
microclass (version 1.2)

multinomClassify: Classifying with a Multinomial model

Description

Classifying sequences with a trained multinomial model.

Usage

multinomClassify(sequence, trained.model, post.prob = FALSE, prior = FALSE)

Arguments

sequence

Character vector of 16S sequences to classify.

trained.model

A list with a trained model, see multinomTrain.

post.prob

Logical indicating if posterior log-probabilities should be returned.

prior

Logical indicating whether classification should use flat priors (prior=FALSE, the default) or empirical priors (prior=TRUE).
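To illustrate the difference between the two settings, the sketch below builds a log-prior vector from a set of training labels. It assumes that empirical priors are the relative frequencies of the taxa in the training data; the helper `log_prior` and the example labels are hypothetical and not part of microclass. A flat prior adds the same constant to every taxon's log-posterior, so only the empirical setting can change which taxon is predicted.

```r
# Hypothetical helper (not a microclass function): log-prior per taxon.
# Assumption: empirical priors are the relative taxon frequencies
# in the training labels; flat priors weight every taxon equally.
log_prior <- function(taxa, empirical = FALSE) {
  counts <- table(taxa)
  if (empirical) {
    p <- as.numeric(counts) / sum(counts)
  } else {
    p <- rep(1 / length(counts), length(counts))
  }
  setNames(log(p), names(counts))
}

train.taxa <- c("Bacillus", "Bacillus", "Bacillus", "Escherichia")
log_prior(train.taxa)                    # flat: log(0.5) for both taxa
log_prior(train.taxa, empirical = TRUE)  # log(0.75) and log(0.25)
```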

Value

If post.prob=FALSE, a character vector of predicted taxa is returned.

If post.prob=TRUE, a data.frame with three columns is returned. Taxon is the vector of predicted taxa, one for each element of sequence. Post.prob.1 and Post.prob.2 contain the largest and second-largest posterior log-probabilities for each sequence.

Details

The classification step of the multinomial method (Vinje et al., 2015) consists of counting the K-mers in every sequence and computing the posterior probability of each taxon in the trained model. The predicted taxon for each input sequence is the one with the largest posterior probability for that sequence.
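The classification step can be made concrete with a small, self-contained base R sketch of a multinomial K-mer classifier under flat priors. This is not the microclass implementation: the helper names (`all_kmers`, `kmer_count`, `count_matrix`, `train_multinom`, `classify_multinom`), the small K and the add-one pseudo-count used to avoid zero probabilities are all assumptions made for illustration. Each taxon gets a vector of K-mer probabilities. The log-likelihood of a sequence under a taxon is the sum of its K-mer counts times those log-probabilities, and with flat priors the taxon with the largest log-likelihood also has the largest posterior.

```r
# All 4^K possible K-mers over the DNA alphabet.
all_kmers <- function(K) {
  grid <- expand.grid(rep(list(c("A", "C", "G", "T")), K),
                      stringsAsFactors = FALSE)
  apply(grid, 1, paste, collapse = "")
}

# Counts of the overlapping K-mers in one sequence, in the order of 'vocab'.
# K-mers containing other symbols (e.g. N) are ignored.
kmer_count <- function(s, K, vocab) {
  n <- nchar(s) - K + 1
  if (n < 1) return(numeric(length(vocab)))
  words <- substring(s, 1:n, K:(n + K - 1))
  tabulate(match(words, vocab), nbins = length(vocab))
}

# Matrix of K-mer counts, one row per sequence.
count_matrix <- function(seqs, K, vocab) {
  t(vapply(seqs, kmer_count, numeric(length(vocab)),
           K = K, vocab = vocab, USE.NAMES = FALSE))
}

# Training (assumed scheme): per-taxon K-mer probabilities with an
# add-one pseudo-count so that unseen K-mers do not get probability zero.
train_multinom <- function(seqs, taxa, K = 3) {
  vocab <- all_kmers(K)
  taxon.counts <- rowsum(count_matrix(seqs, K, vocab), taxa) + 1
  probs <- taxon.counts / rowSums(taxon.counts)
  list(K = K, vocab = vocab, log.prob = log(probs), taxa = rownames(probs))
}

# Classification with flat priors: arg max of the log-likelihood.
# The multinomial coefficient is identical for every taxon and is dropped.
classify_multinom <- function(seqs, model) {
  X <- count_matrix(seqs, model$K, model$vocab)
  loglik <- X %*% t(model$log.prob)  # sequences x taxa
  model$taxa[max.col(loglik, ties.method = "first")]
}

seqs <- c("ACGTACGTAC", "ACGTACGAAC", "TTTTGGGGCC", "TTTGGGGCCC")
taxa <- c("A", "A", "B", "B")
model <- train_multinom(seqs, taxa, K = 3)
classify_multinom(c("ACGTACGT", "TTGGGGCC"), model)  # "A" "B"
```

In this sketch, adding a vector of log-priors to each row of `loglik` before taking the maximum would turn the flat-prior rule into an empirical-prior one.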

Setting post.prob=TRUE also returns the posterior log-probabilities of the best and second-best taxon for each sequence. These can be used to evaluate the certainty of the classifications; see taxMachine.
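The two log-probability columns can be illustrated on a made-up matrix of log-posteriors, with sequences as rows and taxa as columns. The helper `top_two` is hypothetical, not a microclass function; it only mirrors the column names described under Value. The difference Post.prob.1 - Post.prob.2 is the log of the ratio between the best and second-best posterior probabilities, so a small gap means two taxa are nearly equally likely for that sequence.

```r
# Hypothetical helper: best taxon plus the largest and second-largest
# log-posterior for each row of a sequences-by-taxa matrix.
top_two <- function(logpost) {
  stopifnot(ncol(logpost) >= 2)
  best <- max.col(logpost, ties.method = "first")
  second <- apply(logpost, 1, function(r) sort(r, decreasing = TRUE)[2])
  data.frame(Taxon = colnames(logpost)[best],
             Post.prob.1 = logpost[cbind(seq_len(nrow(logpost)), best)],
             Post.prob.2 = unname(second),
             stringsAsFactors = FALSE)
}

# Made-up log-posteriors for two sequences and three taxa.
logpost <- matrix(c(-10, -12, -30,
                    -50, -20, -21),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(NULL, c("A", "B", "C")))
res <- top_two(logpost)
res$Post.prob.1 - res$Post.prob.2  # gaps of 2 and 1 log-units
```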

The classification is parallelized with RcppParallel, which employs Intel TBB and TinyThread. By default all available processor cores are used; this can be changed with the function setParallel.

References

Vinje, H, Liland, KH, Almøy, T, Snipen, L. (2015). Comparing K-mer based methods for improved classification of 16S sequences. BMC Bioinformatics, 16:205.

See Also

KmerCount, multinomTrain.

Examples

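# NOT RUN {
library(microclass)

# Example 16S sequences; the taxon label is the second word of each header
data("small.16S")
seq <- small.16S$Sequence
tax <- sapply(strsplit(small.16S$Header, split = " "), function(x){x[2]})

# Train the multinomial model on the full-length sequences
trn <- multinomTrain(seq, tax)

# Extract amplicons with the 515F/806RB primer pair and classify the non-empty ones
primer.515f <- "GTGYCAGCMGCCGCGGTAA"
primer.806rB <- "GGACTACNVGGGTWTCTAAT"
reads <- amplicon(seq, primer.515f, primer.806rB)
predicted <- multinomClassify(unlist(reads[nchar(reads) > 0]), trn)
print(predicted)
# }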
