rdpClassify: Classifying with the RDP classifier

Description

Classifies sequences using a trained presence/absence K-mer model.

Usage

rdpClassify(sequence, trained.model, post.prob = FALSE, prior = FALSE)

Arguments

sequence

Character vector of sequences to classify.

trained.model

A list containing a trained model, see rdpTrain.

post.prob

Logical indicating whether posterior log-probabilities should be returned.

prior

Logical indicating whether classification should use flat priors (prior = FALSE, the default) or empirical priors (prior = TRUE).

Value

A character vector with the predicted taxa, one for each sequence.

Details

The classification step of the presence/absence method known as the RDP classifier (Wang et al., 2007) determines which K-mers are present in each sequence and computes the posterior probability of each taxon from the trained model under a naive Bayes assumption. Each sequence is assigned the taxon with the maximum posterior probability.
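The scoring described above can be illustrated with a small base-R sketch. This is not the package's implementation: the helper names (all_kmers, kmers_present, sketch_train, sketch_classify), the short word length K = 3, and the toy sequences are all hypothetical, and the smoothing follows the formulas given in Wang et al. (2007) as an assumption about how a presence/absence model can be scored.

```r
# Hypothetical helpers for illustration only; not part of the package.

# All 4^K possible K-mers over the DNA alphabet.
all_kmers <- function(K) {
  grid <- expand.grid(rep(list(c("A", "C", "G", "T")), K),
                      stringsAsFactors = FALSE)
  sort(do.call(paste0, as.list(grid)))
}

# The distinct K-mers found in one sequence (presence/absence only).
kmers_present <- function(s, K) {
  n <- nchar(s)
  if (n < K) return(character(0))
  unique(substring(s, 1:(n - K + 1), K:n))
}

# Train: smoothed log P(word present | taxon) and empirical log-priors.
sketch_train <- function(seqs, taxa, K = 3) {
  vocab <- all_kmers(K)
  # One row per sequence, one column per K-mer; 1 = present, 0 = absent.
  X <- t(vapply(seqs, function(s) as.numeric(vocab %in% kmers_present(s, K)),
                numeric(length(vocab)), USE.NAMES = FALSE))
  colnames(X) <- vocab
  p_w <- (colSums(X) + 0.5) / (nrow(X) + 1)  # overall word frequency
  m <- rowsum(X, group = taxa)               # per-taxon presence counts
  M <- as.vector(table(taxa)[rownames(m)])   # sequences per taxon
  list(K = K,
       log_cond = log(sweep(m, 2, p_w, "+") / (M + 1)),
       log_prior = log(M / sum(M)))
}

# Classify: sum log-probabilities over the K-mers present in each query and
# pick the taxon with the maximum score; flat priors unless prior = TRUE.
sketch_classify <- function(seqs, model, prior = FALSE) {
  vapply(seqs, function(s) {
    w <- intersect(kmers_present(s, model$K), colnames(model$log_cond))
    score <- rowSums(model$log_cond[, w, drop = FALSE])
    if (prior) score <- score + model$log_prior
    names(which.max(score))
  }, character(1), USE.NAMES = FALSE)
}

# Toy data: two made-up taxa with two training sequences each.
train_seqs <- c("ACGTACGTAC", "ACGTACGTTC", "GGGTTTGGGTTT", "GGGTTTGGCTTT")
train_taxa <- c("A", "A", "B", "B")
model <- sketch_train(train_seqs, train_taxa, K = 3)
queries <- c("ACGTACG", "GGTTTGG")
pred <- sketch_classify(queries, model)
print(pred)
```

With flat priors every taxon starts from the same score, so only the K-mer evidence decides the prediction. With prior = TRUE the log-prior favours taxa that have more training sequences, which matters only when the taxa are unevenly represented.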

The classification is parallelized through RcppParallel employing Intel TBB and TinyThread. By default all available processing cores are used. This can be changed using the function setParallel.

References

Wang, Q, Garrity, GM, Tiedje, JM, Cole, JR (2007). Naive Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy. Applied and Environmental Microbiology, 73: 5261-5267.

Examples

# Load the example data; the taxon label is the second field of each header
data("small.16S")
seq <- small.16S$Sequence
tax <- sapply(strsplit(small.16S$Header, split = " "), function(x) x[2])

# Train the model, extract amplicons with the 515f/806rB primer pair,
# and classify the non-empty reads
trn <- rdpTrain(seq, tax)
primer.515f <- "GTGYCAGCMGCCGCGGTAA"
primer.806rB <- "GGACTACNVGGGTWTCTAAT"
reads <- amplicon(seq, primer.515f, primer.806rB)
predicted <- rdpClassify(unlist(reads[nchar(reads) > 0]), trn)
print(predicted)