Functions from the ShortRead package are leveraged to do this filtering.
fastqFilter(fn, fout, truncQ = 2, truncLen = 0, trimLeft = 0, maxN = 0, minQ = 0, maxEE = Inf, rm.phix = FALSE, n = 1e+06, compress = TRUE, verbose = FALSE, ...)
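fout
The path to the output file. By default (compress=TRUE) the output fastq file is gzipped.
truncQ
Default 2. Truncate reads at the first instance of a quality score less than or equal to truncQ. The default value of 2 is a special quality score indicating the end of good quality sequence in Illumina 1.8+.
truncLen
Truncate reads after truncLen bases. Reads shorter than this are discarded. Note that dada currently requires all sequences to be the same length.
trimLeft
The number of bases to remove from the start of each read. If both truncLen and trimLeft are provided, filtered reads will have length truncLen-trimLeft.
maxN
After truncation, sequences with more than maxN Ns will be discarded. Note that dada currently does not allow Ns.
maxEE
Default Inf (no EE filtering). After truncation, reads with more than maxEE "expected errors" will be discarded. Expected errors are calculated from the nominal definition of the quality score: EE = sum(10^(-Q/10)).
rm.phix
Default FALSE. If TRUE, reads that match the phiX genome, as determined by isPhiX, are discarded.
n
The number of reads to read in and filter at a time. Default is 1e6, one-million reads. See FastqStreamer for details.
...
Arguments passed on to isPhiX.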
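To make the maxEE criterion concrete, the following base R sketch computes expected errors for a single read from its quality string. It assumes Phred+33 quality encoding (the Illumina 1.8+ convention); the quality string and variable names are made up for illustration, and this is not the code dada2 itself runs.

```r
# Hypothetical quality string for one read (Phred+33 encoding assumed)
qual <- "IIII?5+"

# Decode: ASCII code minus 33 gives the Phred score Q of each base
Q <- utf8ToInt(qual) - 33L

# Nominal error probability per base, summed to give expected errors
EE <- sum(10^(-Q / 10))

# Mirror the filter rule: discard reads with more than maxEE expected errors
maxEE <- 2
keep <- EE <= maxEE
```

For this toy read the scores decode to 40, 40, 40, 40, 30, 20, 10, giving EE of about 0.11, so the read would pass a maxEE=2 filter.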
fastqFilter replicates most of the functionality of the fastq_filter command in usearch (http://www.drive5.com/usearch/manual/cmd_fastq_filter.html). It adds the ability to remove contaminating phiX sequences as part of the filtering process.
See also: fastqPairedFilter
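# Path to an example fastq file shipped with dada2
testFastq <- system.file("extdata", "sam1F.fastq.gz", package="dada2")
# Temporary gzipped output file
filtFastq <- tempfile(fileext=".fastq.gz")
# Discard reads containing Ns or with more than 2 expected errors
fastqFilter(testFastq, filtFastq, maxN=0, maxEE=2)
# Also trim 10 bases from the start and truncate at 200 bases (final length 190)
fastqFilter(testFastq, filtFastq, trimLeft=10, truncLen=200, maxEE=2, verbose=TRUE)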
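The second call combines trimLeft=10 with truncLen=200, so surviving reads should be 190 bases long. The base R sketch below illustrates that arithmetic on a made-up read. It assumes truncation applies at position truncLen of the original read and that the first trimLeft bases are then dropped, which is one reading of the argument descriptions rather than a copy of the dada2 implementation.

```r
# Made-up 250-base read (illustration only)
read <- paste(rep("ACGTA", 50), collapse = "")

trimLeft <- 10
truncLen <- 200

# Reads shorter than truncLen would be discarded by the filter
long_enough <- nchar(read) >= truncLen

# Keep positions trimLeft+1 .. truncLen of the original read
filtered <- substr(read, trimLeft + 1, truncLen)

# Resulting length equals truncLen - trimLeft
filtered_len <- nchar(filtered)
```

Because every retained read ends up with the same length, this combination also satisfies the requirement that all sequences passed on to dada have equal length.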