fastqPairedFilter filters and trims a pair of forward and reverse fastq files. Functions in the ShortRead package are leveraged to do this filtering. The filtered forward/reverse reads remain identically ordered.
fastqPairedFilter(fn, fout, maxN = c(0, 0), truncQ = c(2, 2), truncLen = c(0, 0), trimLeft = c(0, 0), minQ = c(0, 0), maxEE = c(Inf, Inf), rm.phix = c(FALSE, FALSE), matchIDs = FALSE, id.sep = "\\s", id.field = NULL, n = 1e+06, compress = TRUE, verbose = FALSE, ...)
fn: A character(2) naming the paths to the (forward,reverse) fastq files.
fout: A character(2) naming the paths to the (forward,reverse) output files. Note that by default (compress=TRUE) the output fastq files are gzipped.

FILTERING AND TRIMMING ARGUMENTS that follow can be provided as length 1 or length 2 vectors. If a length 1 vector is provided, the same parameter value is used for the forward and reverse sequence files. If a length 2 vector is provided, the first value is used for the forward reads, and the second for the reverse reads.
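As an illustration of the length 1 versus length 2 rule above, the following base R sketch expands a parameter value into a (forward, reverse) pair. The helper name expand_pair is hypothetical and is not part of dada2; the package's own internals may differ.

```r
# Hypothetical helper (not part of dada2): expand a filtering parameter
# into one value per read direction.
expand_pair <- function(x) {
  # Only length 1 or length 2 vectors are meaningful here
  stopifnot(length(x) %in% c(1L, 2L))
  # A length 1 value is reused for both directions;
  # a length 2 value is kept as (forward, reverse)
  setNames(rep(x, length.out = 2L), c("forward", "reverse"))
}

expand_pair(2)            # same maxEE for forward and reverse reads
expand_pair(c(240, 200))  # truncLen of 240 (forward) and 200 (reverse)
```

Under this reading, maxEE=2 and maxEE=c(2, 2) would describe the same filter.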
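maxN: Default 0. Reads with more than maxN Ns will be discarded. Note that dada currently does not allow Ns.
truncQ: Default 2. Truncate reads at the first instance of a quality score less than or equal to truncQ. The default value of 2 is a special quality score indicating the end of good quality sequence in Illumina 1.8+.
truncLen: Default 0 (no truncation). Truncate reads after truncLen bases. Reads shorter than this are discarded. Note that dada currently requires all sequences to be the same length.
trimLeft: Default 0. The number of nucleotides to remove from the start of each read. If both truncLen and trimLeft are provided, filtered reads will have length truncLen-trimLeft.
minQ: Default 0. After truncation, reads containing a quality score below minQ will be discarded.
maxEE: Default Inf (no EE filtering). After truncation, reads with higher than maxEE "expected errors" will be discarded. Expected errors are calculated from the nominal definition of the quality score: EE = sum(10^(-Q/10))
rm.phix: Default FALSE. If TRUE, discard reads that match against the phiX genome, as determined by isPhiX.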
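The length rule (truncLen-trimLeft) and the expected-errors definition (EE = sum(10^(-Q/10))) described above can be checked by hand with a small base R sketch. The helpers trim_read and expected_errors are hypothetical names used for illustration only, and the sketch assumes Phred+33 quality encoding and that truncation to truncLen happens before the first trimLeft bases are removed. It is not the dada2 implementation.

```r
# Hypothetical helper (not dada2 code): truncate, then left-trim, one read.
# Assumption: truncLen is applied to the original read, then the first
# trimLeft bases are dropped, so the result has length truncLen - trimLeft.
trim_read <- function(seq, trimLeft = 0L, truncLen = 0L) {
  if (truncLen > 0L) {
    # Reads shorter than truncLen are discarded (represented here as NA)
    if (nchar(seq) < truncLen) return(NA_character_)
    seq <- substr(seq, 1L, truncLen)
  }
  if (trimLeft > 0L) seq <- substr(seq, trimLeft + 1L, nchar(seq))
  seq
}

# Hypothetical helper: expected errors of one quality string.
# Assumption: qualities are ASCII-encoded with an offset of 33 (Phred+33).
expected_errors <- function(qual, offset = 33L) {
  q <- utf8ToInt(qual) - offset  # per-base quality scores Q
  sum(10^(-q / 10))              # EE = sum(10^(-Q/10))
}

read <- strrep("A", 250)
nchar(trim_read(read, trimLeft = 10, truncLen = 240))  # 230, i.e. truncLen - trimLeft

# Four Q40 bases, four Q20 bases and two Q2 bases; compare against maxEE = 2
expected_errors("IIII5555##") <= 2
```

Low quality scores dominate the sum, which is why truncating a poor-quality tail before computing EE can rescue reads that would otherwise exceed maxEE.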

ID MATCHING ARGUMENTS that follow enforce matching between the sequence identification strings in the forward and reverse reads. The function can automatically detect and match ID fields in Illumina format, e.g.: EAS139:136:FC706VJ:2:2104:15343:197393
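matchIDs: Default FALSE. Whether to enforce matching between the ID-line sequence identifiers of the forward and reverse fastq files. Note: matchIDs=FALSE essentially assumes matching order between forward and reverse reads. If that matched order is not present, future processing steps may break (in particular mergePairs).
id.sep: Default "\\s" (whitespace). The separator between fields in the ID line of the input fastq files. Passed to strsplit.
id.field: Default NULL (automatic detection). The field of the ID line containing the sequence identifier.
n: The number of records (reads) to read in and filter at any one time. Default is 1e6, one-million reads. See FastqStreamer for details.
compress: Default TRUE. Whether the output fastq files should be gzip compressed.
verbose: Default FALSE. Whether to output status messages.
...: Further arguments passed on to isPhiX.

See also: fastqFilter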
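To make the ID matching arguments more concrete, the base R sketch below splits fastq ID lines on whitespace with strsplit and keeps only the reads whose identifier field appears in both directions. The helper get_id_field, the read-direction suffixes (such as "1:N:0:ATCACG") and the second and third identifiers are made up for illustration; this is a simplified picture of the idea, not the package's actual matching code.

```r
# Hypothetical helper (not dada2 code): extract one field from fastq ID lines.
# sep and field mirror the roles of id.sep and id.field described above.
get_id_field <- function(id_lines, sep = "\\s", field = 1L) {
  vapply(strsplit(id_lines, sep), function(parts) parts[field], character(1))
}

# Made-up ID lines; only the first identifier comes from the text above
fwd_ids <- c("EAS139:136:FC706VJ:2:2104:15343:197393 1:N:0:ATCACG",
             "EAS139:136:FC706VJ:2:2104:15343:197394 1:N:0:ATCACG")
rev_ids <- c("EAS139:136:FC706VJ:2:2104:15343:197393 2:N:0:ATCACG",
             "EAS139:136:FC706VJ:2:2104:15343:197395 2:N:0:ATCACG")

fwd_key <- get_id_field(fwd_ids)
rev_key <- get_id_field(rev_ids)

# Keep only reads whose identifier is present in both files
keep_fwd <- fwd_key %in% rev_key
keep_rev <- rev_key %in% fwd_key
fwd_ids[keep_fwd]
rev_ids[keep_rev]
```

Here the full ID lines differ between directions, so the comparison only works once the shared identifier field has been isolated.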
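# Paths to the example paired fastq files shipped with dada2
testFastqF <- system.file("extdata", "sam1F.fastq.gz", package="dada2")
testFastqR <- system.file("extdata", "sam1R.fastq.gz", package="dada2")
# Temporary gzipped output files for the filtered reads
filtFastqF <- tempfile(fileext=".fastq.gz")
filtFastqR <- tempfile(fileext=".fastq.gz")
# Same maxN and maxEE for the forward and reverse reads
fastqPairedFilter(c(testFastqF, testFastqR), c(filtFastqF, filtFastqR), maxN=0, maxEE=2)
# Direction-specific trimming and truncation, plus phiX removal
fastqPairedFilter(c(testFastqF, testFastqR), c(filtFastqF, filtFastqR), trimLeft=c(10, 20),
                  truncLen=c(240, 200), maxEE=2, rm.phix=TRUE, verbose=TRUE)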