qrqc - Quick Read Quality Control

qrqc and all supporting documentation Copyright (c) Vince Buffalo, 2011-2012

Contact: Vince Buffalo vsbuffaloAAAAA@gmail.com (with the poly-A tail removed)

If you wish to report a bug, please open an issue on Github (http://github.com/vsbuffalo/qrqc/issues) or post it on the Bioconductor support site (https://support.bioconductor.org/). You can contact me personally as well, but please open an issue first.

About

qrqc (short for "Quick Read Quality Control") is a fast and extensible package that reports basic quality and summary statistics on FASTQ and FASTA files, including base and quality distribution by position, sequence length distribution, and common sequences.

License

GNU General Public License, version 2.

FAQ

Why ggplot2?

I've had some feature requests for qrqc since its release, mostly related to customizing the graphics. Since data accessibility and custom graphics were the reason I created qrqc, I initially rewrote qrqc to provide more graphics options through lattice. However, all the graphics parameters I added led to functions with large numbers of arguments and high complexity. This rewrite uses ggplot2 instead, which is a good fit because any graphics object it produces can be further manipulated.

Why do you use Monte Carlo simulations to generate the smooth curve?

qrqc is fast because readSeqFile bins the quality scores of bases by position, so the data is already summarized when it is read. To fit a smooth curve, the function needs individual data points rather than binned data, so I simulate them via Monte Carlo draws from the quality distribution at each position. This is an approximation, but it yields a smooth curve that is a useful visual aid for assessing quality drops.
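The idea of drawing points from binned data can be sketched outside of R. The following Python snippet is a minimal illustration only, not qrqc's implementation: the function name `simulate_qualities`, the layout of the `binned` dictionary, and the example counts are all assumptions made up for this sketch.

```python
import random


def simulate_qualities(binned, n=100, seed=0):
    """Draw n simulated quality scores per position from binned counts.

    binned maps position -> {quality score: count}. This layout is a
    hypothetical stand-in for a per-position quality summary.
    """
    rng = random.Random(seed)
    simulated = {}
    for position, counts in sorted(binned.items()):
        qualities = list(counts)
        weights = [counts[q] for q in qualities]
        # Sample with replacement, weighting each score by its observed count.
        simulated[position] = rng.choices(qualities, weights=weights, k=n)
    return simulated


# Made-up binned data: quality drops toward the end of the read.
binned = {
    1: {38: 90, 30: 10},
    2: {36: 70, 25: 30},
    3: {30: 40, 12: 60},
}

draws = simulate_qualities(binned, n=200)

# Flatten into (position, quality) points that a smoother could be fitted to.
points = [(pos, q) for pos, qs in draws.items() for q in qs]
```

The simulated points only ever take values that were observed at that position, so the smoother sees the same distribution as the binned summary, up to sampling noise.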

What do I do about bad quality regions?

Illumina reads often have poor 3'-end qualities. I've noticed that HiSeq machines also produce poor quality 5'-ends. For increased mapping rates and better assemblies, it is generally advisable to trim off these poor quality regions. Nik Joshi's sickle tool can do this; you can get it here: http://github.com/najoshi/sickle.
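To make the notion of trimming concrete, here is a deliberately naive Python sketch that removes bases from both ends of a read while their quality is below a threshold. It is not sickle's algorithm and not part of qrqc; the function name `trim_low_quality_ends`, the threshold of 20, and the example read are assumptions chosen for illustration.

```python
def trim_low_quality_ends(sequence, qualities, threshold=20):
    """Trim bases from both ends while their quality is below threshold.

    A naive per-base rule for illustration; dedicated trimmers use more
    robust criteria.
    """
    if len(sequence) != len(qualities):
        raise ValueError("sequence and qualities must have the same length")
    start = 0
    end = len(qualities)
    # Walk in from the 5'-end past low-quality bases.
    while start < end and qualities[start] < threshold:
        start += 1
    # Walk in from the 3'-end past low-quality bases.
    while end > start and qualities[end - 1] < threshold:
        end -= 1
    return sequence[start:end], qualities[start:end]


# Made-up read with poor quality at both ends.
seq = "ACGTACGTAC"
quals = [5, 12, 30, 35, 38, 36, 31, 18, 9, 2]

trimmed_seq, trimmed_quals = trim_low_quality_ends(seq, quals)
# trimmed_seq is "GTACG"; trimmed_quals is [30, 35, 38, 36, 31]
```

For real data, use a dedicated trimmer such as sickle rather than a rule like this one.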

3'-end adapter contamination can be difficult to recognize (and thus remove) due to poor quality and likely incorrect bases. I've developed a tool called scythe that removes

Version

1.26.0

License

GPL (>=2)

Maintainer

Vince Buffalo

Last Published

February 15th, 2017

Functions in qrqc (1.26.0)

qualPlot-methods: Plot a Base Quality Boxplot by Position
list2df: Apply a function to items in list and combine into data frame
getGC-methods: Get a Data Frame of GC Content from a SequenceSummary object
geom_qlinerange: Use Line Segments and Points to Plot Quality Statistics by Position in the Read
getBase-methods: Get a Data Frame of Base Frequency Data from a SequenceSummary Object
getKmer-methods: Get a Data Frame of k-mer Frequency by Position from a SequenceSummary Object
plotBases-methods: Plot Bases by Position
kmerKLPlot-methods: Plot K-L Divergence Components for a Subset of k-mers to Inspect for Contamination
getQual-methods: Get a Data Frame of Quality Data from a FASTQSummary object
plotSeqLengths-methods: Plot Histogram of Sequence Lengths
plotGC-methods: Plot per Base GC Content by Position
getSeqlen-methods: Get a Data Frame of Sequence Lengths from a SequenceSummary object
seqlenPlot-methods: Plot a Histogram of Sequence Lengths
scale_color_dna: Set the color scheme to biovizBase's for DNA
readSeqFile: Read and Summarize a Sequence (FASTA or FASTQ) File
plotQuals-methods: Plot a Base Quality Boxplot by Position
makeReport-methods: Make an HTML report from a FASTASummary or FASTQSummary object
gcPlot-methods: Plot GC Content by Position
kmerEntropyPlot-methods: Plot Entropy of k-mers by Position
getMCQual-methods: Get a Data Frame of Simulated Qualities from a FASTQSummary object
getBaseProp-methods: Get a Data Frame of Base Proportion Data from a SequenceSummary object
calcKL-methods: Calculate the Kullback-Leibler Divergence Between the k-mer Distribution by Position and the k-mer Distribution Across All Positions
FASTASummary-class: FASTASummary class representing the summaries of a FASTA file
FASTQSummary-class: FASTQSummary class representing the summaries of a FASTQ file
SequenceSummary-class: SequenceSummary class representing the summaries of a sequence file
basePlot-methods: Plot Base Frequency or Proportion by Position
scale_color_iupac: Set the color scheme to biovizBase's for IUPAC codes
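
The calcKL entry describes a Kullback-Leibler divergence between the k-mer distribution at each position and the k-mer distribution pooled across all positions. The Python sketch below illustrates that quantity on made-up counts. It is not qrqc's code: the function name `kl_terms`, the use of log base 2, and the example k-mer counts are assumptions for illustration.

```python
import math
from collections import Counter


def kl_terms(position_counts, overall_counts):
    """Return the per-k-mer terms p * log2(p / q) of the K-L divergence.

    p is the k-mer proportion at one position and q is the proportion across
    all positions. Assumes every k-mer seen at the position also appears in
    overall_counts, which holds when the latter is the pooled count.
    """
    p_total = sum(position_counts.values())
    q_total = sum(overall_counts.values())
    terms = {}
    for kmer, count in position_counts.items():
        if count == 0:
            continue  # a zero proportion contributes nothing to the sum
        p = count / p_total
        q = overall_counts[kmer] / q_total
        terms[kmer] = p * math.log2(p / q)
    return terms


# Made-up 3-mer counts at two positions; "ACG" is over-represented at position 1.
kmers_by_position = {
    1: Counter({"ACG": 8, "CGT": 1, "GTA": 1}),
    2: Counter({"ACG": 3, "CGT": 4, "GTA": 3}),
}

# Pool the counts across positions to get the overall distribution.
overall = Counter()
for counts in kmers_by_position.values():
    overall.update(counts)

# The divergence at a position is the sum of its per-k-mer terms.
divergence = {
    pos: sum(kl_terms(counts, overall).values())
    for pos, counts in kmers_by_position.items()
}
```

Looking at the individual terms rather than only their sum shows which k-mers drive the divergence at a position, which is the kind of view that helps when inspecting for contamination.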