dada(derep, err, errorEstimationFunction = loessErrfun, selfConsist = FALSE, pool = FALSE, ...)
derep: A derep-class object, the output of derepFastq. A list of such objects can be provided, in which case each will be denoised with a shared error model.

err: The matrix of estimated rates for each possible nucleotide transition (from sample nucleotide to read nucleotide).
Rows correspond to the 16 possible transitions (t_ij), indexed such that 1: A->A, 2: A->C, ..., 16: T->T.
Columns correspond to quality scores. Typically there are 41 columns, for the quality scores 0-40.
However, if USE_QUALS = FALSE, the matrix must have only one column.
If selfConsist = TRUE, err can be set to NULL, and an initial error matrix will be estimated from the data by assuming that all reads are errors away from one true sequence.
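The row order listed for err (1: A->A, 2: A->C, ..., 16: T->T) corresponds to enumerating the sample nucleotide in the outer position and the read nucleotide in the inner position. The base-R sketch below makes that mapping explicit, assuming the nucleotide order A, C, G, T and one column per quality score 0-40. The names transition_index, row_labels, and err_skeleton are hypothetical and not part of dada2.

```r
# Hypothetical helpers (not part of dada2), assuming nucleotide order A, C, G, T
# with the sample ("from") nucleotide varying slowest across the 16 rows.
nts <- c("A", "C", "G", "T")

# Row of the 16-row error matrix for a (sample nt -> read nt) transition.
transition_index <- function(from, to) {
  4L * (match(from, nts) - 1L) + match(to, nts)
}

# Illustrative row labels in the same order: "A2A", "A2C", ..., "T2T".
row_labels <- as.vector(t(outer(nts, nts, paste, sep = "2")))

# Empty skeleton with one column per quality score 0-40 (rates left as NA).
err_skeleton <- matrix(NA_real_, nrow = 16, ncol = 41,
                       dimnames = list(row_labels, as.character(0:40)))

transition_index("A", "C")              # 2
transition_index("T", "T")              # 16
row_labels[transition_index("G", "A")]  # "G2A"
```

With this layout, err[transition_index(s, r), q + 1] would be the rate for sample nucleotide s being read as r at quality score q.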
errorEstimationFunction: Default loessErrfun. If USE_QUALS = TRUE, errorEstimationFunction(dada()$trans_out) is computed after sample inference, and the return value is used as the new estimate of the err matrix in $err_out.
If USE_QUALS = FALSE, this argument is ignored, and transition rates are estimated by maximum likelihood (t_ij = n_ij/n_i, where n_ij is the number of observed i->j transitions and n_i is the total number of observations of sample nucleotide i).
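The maximum-likelihood estimate t_ij = n_ij/n_i simply normalizes the transition counts within each sample nucleotide. The base-R sketch below assumes the counts arrive as a 16-row matrix in the row order described for err; the function ml_err and the toy counts are hypothetical and only illustrate the formula, not dada2's internal code.

```r
# Hypothetical helper (not part of dada2): maximum-likelihood rates
# t_ij = n_ij / n_i, where n_ij counts observed i -> j transitions and
# n_i = sum over j of n_ij. Rows are assumed ordered A->A, A->C, ..., T->T.
ml_err <- function(counts) {
  stopifnot(nrow(counts) == 16)
  from <- rep(1:4, each = 4)           # sample ("from") nucleotide of each row
  n_i <- rowsum(counts, from)          # 4 x ncol matrix of totals per "from" nt
  counts / n_i[from, , drop = FALSE]   # divide each row by its group total
}

# Toy counts (one column, as when USE_QUALS = FALSE); each block sums to 1000.
counts <- matrix(c(990,   4,   4,   2,   # A->A, A->C, A->G, A->T
                     3, 994,   1,   2,   # C->A, C->C, C->G, C->T
                     5,   1, 992,   2,   # G->A, G->C, G->G, G->T
                     1,   2,   2, 995),  # T->A, T->C, T->G, T->T
                 ncol = 1)
rates <- ml_err(counts)
# rates[1:4, 1] holds 0.990, 0.004, 0.004, 0.002
```

Because the rates within each block of four rows sum to 1, each block is a probability distribution over read nucleotides for one sample nucleotide.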
selfConsist: logical(1). Default FALSE. If selfConsist = TRUE, the algorithm alternates between sample inference and error rate estimation until convergence. Error rate estimation is performed by errorEstimationFunction.
If selfConsist = FALSE, the algorithm performs one round of sample inference based on the provided err matrix.
pool: logical(1). Default FALSE. If pool = TRUE, the algorithm pools all samples together prior to sample inference.
If pool = FALSE, sample inference is performed on each sample individually.
This argument has no effect if only one sample is provided. pool does not affect error rates, which are always estimated from pooled observations across samples.
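Estimating error rates from pooled observations amounts to summing the per-sample transition counts before normalizing. The base-R sketch below assumes per-sample count matrices with 16 rows and 41 quality columns, filled with simulated Poisson counts; the variable names are hypothetical, and the raw ratios it computes are not what loessErrfun returns (loessErrfun fits a smoothed model rather than using the ratios directly).

```r
# Hypothetical illustration (not dada2 internals): transition counts from
# each sample (16 rows, one column per quality score 0-40) are summed before
# rates are estimated, so all samples share one error model.
set.seed(1)
sample_counts <- list(
  sam1 = matrix(rpois(16 * 41, 50), nrow = 16),  # simulated counts, sample 1
  sam2 = matrix(rpois(16 * 41, 50), nrow = 16)   # simulated counts, sample 2
)

pooled <- Reduce(`+`, sample_counts)   # element-wise sum across samples

# Raw t_ij = n_ij / n_i within each "from" nucleotide block of 4 rows
# (A->*, C->*, G->*, T->*), separately for every quality column.
from <- rep(1:4, each = 4)
pooled_rates <- pooled / rowsum(pooled, from)[from, , drop = FALSE]
```

Whether samples are pooled for inference (pool = TRUE) or not, the counts feeding the error estimate are combined in this way.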
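...: Additional algorithm options (for example BAND_SIZE and OMEGA_A, as used in the examples) can be passed to dada() as named arguments. See setDadaOpt for a full list and description of these options.

dada returns a dada-class object, or a list of such objects if a list of derep-class objects was provided.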
dada implements a statistical test for the notion that a specific sequence was seen too many times to have been caused by amplicon errors from the currently inferred sample sequences. Overly abundant sequences are used as the seeds of new clusters of sequencing reads, and the final set of clusters is taken to represent the denoised composition of the sample. A more detailed explanation of the algorithm is found in two publications.
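The idea of a sequence being "seen too many times" can be illustrated with a Poisson model: if an inferred sample sequence has n reads and each read turns into the candidate sequence through errors with probability lambda, error-derived copies number about n * lambda, and observing a copies is surprising when a is far above that. The sketch below assumes this Poisson model, conditioned on the candidate having been observed at least once; abundance_pval and omega_a are hypothetical names (omega_a only plays the role of the OMEGA_A option), and the numbers are toy values, not dada2 defaults or output.

```r
# Hypothetical illustration (not dada2's implementation): p-value that a
# sequence seen `a` times could arise purely from errors, given an inferred
# sample sequence with `n` reads and per-read error probability `lambda`.
# Error-derived copies are modeled as Poisson(n * lambda), conditioned on
# the sequence having been observed at least once.
abundance_pval <- function(a, n, lambda) {
  mu <- n * lambda
  # P(X >= a) / P(X >= 1); lower.tail = FALSE gives P(X > q)
  ppois(a - 1, mu, lower.tail = FALSE) / ppois(0, mu, lower.tail = FALSE)
}

omega_a <- 1e-40   # illustrative significance threshold
p_rare   <- abundance_pval(a = 2,   n = 10000, lambda = 1e-4)  # ~1 copy expected
p_excess <- abundance_pval(a = 100, n = 10000, lambda = 1e-4)
c(p_rare < omega_a, p_excess < omega_a)   # FALSE TRUE
```

In this toy case, two copies are entirely consistent with errors, while 100 copies would make the sequence a candidate seed for a new cluster.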
dada depends on a parametric error model of substitutions, so the quality of its sample inference is affected by the accuracy of the estimated error rates. selfConsist mode allows these error rates to be inferred from the data.
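Under a substitution-only error model, the probability that a read arose from a given sample sequence can be written as a product of per-position transition rates looked up in the err matrix by transition row and quality column. The base-R sketch below assumes already-aligned sequences of equal length (no indels), the row order described for err, and a toy error matrix with a uniform 0.1% substitution rate; read_prob and the toy values are hypothetical, not dada2 functions or fitted rates.

```r
# Hypothetical illustration (not dada2's implementation): under an
# independent-substitution error model, the probability that `read` was
# produced from `sample_seq` is the product, over aligned positions, of the
# rate for each (sample nt -> read nt) transition at that position's quality.
nts <- c("A", "C", "G", "T")

read_prob <- function(sample_seq, read, quals, err) {
  s <- strsplit(sample_seq, "")[[1]]
  r <- strsplit(read, "")[[1]]
  stopifnot(length(s) == length(r), length(r) == length(quals))
  rows <- 4L * (match(s, nts) - 1L) + match(r, nts)  # A->A = 1 ... T->T = 16
  cols <- quals + 1L                                 # quality 0 is column 1
  prod(err[cbind(rows, cols)])
}

# Toy error matrix: 0.1% per substitution at every quality, so correct calls
# have rate 1 - 3 * 0.001 = 0.997.
err <- matrix(0.001, nrow = 16, ncol = 41)
err[c(1, 6, 11, 16), ] <- 0.997          # A->A, C->C, G->G, T->T

p_same <- read_prob("ACGT", "ACGT", quals = c(30, 30, 30, 30), err = err)
p_one  <- read_prob("ACGT", "ACTT", quals = c(30, 30, 30, 30), err = err)
```

If the estimated rates are off, every such product is off too, which is why the accuracy of err matters for sample inference.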
All comparisons between sequences performed by dada depend on pairwise alignments. This step is the most computationally intensive part of the algorithm, and two alignment heuristics have been implemented for speed: a k-mer distance screen and banded Needleman-Wunsch alignment. See setDadaOpt for the options that control them.
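A k-mer distance screen avoids aligning pairs whose k-mer content is clearly different. The base-R sketch below assumes a distance of one minus the fraction of shared k-mers (counting each shared k-mer at its smaller multiplicity), k = 5, and a screening threshold of 0.5; kmer_counts, kmer_dist, and max_kdist are hypothetical names, and the exact distance and cutoff used by dada are configured through setDadaOpt and may differ.

```r
# Hypothetical illustration (not dada2's implementation): a k-mer distance
# screen. Pairs whose k-mer profiles differ too much are skipped before the
# more expensive alignment step.
kmer_counts <- function(seq, k) {
  n <- nchar(seq) - k + 1
  table(substring(seq, 1:n, k:nchar(seq)))  # counts of each length-k substring
}

kmer_dist <- function(s1, s2, k = 5) {
  c1 <- kmer_counts(s1, k)
  c2 <- kmer_counts(s2, k)
  shared <- intersect(names(c1), names(c2))
  # Shared k-mers, each counted at its smaller multiplicity
  overlap <- sum(pmin(c1[shared], c2[shared]))
  1 - overlap / (min(nchar(s1), nchar(s2)) - k + 1)
}

s1 <- "ACGTACGTTAGCCATG"
s2 <- "ACGTACGTTAGCGATG"   # one substitution relative to s1
s3 <- "TTTTGGGGCCCCAAAA"   # unrelated sequence

max_kdist <- 0.5                               # hypothetical cutoff
align_s2 <- kmer_dist(s1, s2) <= max_kdist     # TRUE: pass to alignment
align_s3 <- kmer_dist(s1, s3) <= max_kdist     # FALSE: screened out
```

A single substitution disrupts at most k overlapping k-mers, so closely related sequences keep a small distance while unrelated ones are rejected without running an alignment.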
See also: derepFastq, setDadaOpt
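library(dada2)
# Dereplicate two example forward-read FASTQ files shipped with dada2
derep1 = derepFastq(system.file("extdata", "sam1F.fastq.gz", package="dada2"))
derep2 = derepFastq(system.file("extdata", "sam2F.fastq.gz", package="dada2"))
# Denoise one sample using the error matrix tperr1
dada(derep1, err=tperr1)
# Denoise two samples with a shared error model, re-estimating error rates until convergence
dada(list(sam1=derep1, sam2=derep2), err=tperr1, selfConsist=TRUE)
# Inflated error rates plus custom alignment band and abundance threshold options
dada(derep1, err=inflateErr(tperr1,3), BAND_SIZE=32, OMEGA_A=1e-20)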