microclass (version 1.2)

blastClassify16S: Classifying using BLAST

Description

Classifies 16S sequences by searching a BLAST database.

Usage

blastClassify16S(sequence, bdb)

Arguments

sequence

Character vector of 16S sequences to classify.

bdb

Name of the BLAST database, see blastDbase16S.

Value

A data.frame with two columns: Taxon is the predicted taxon for each sequence and Identity is the corresponding identity value. If a sequence produces no BLAST hit, it is labelled "unclassified".

Details

A vector of 16S sequences (DNA) is classified by first running BLAST (blastn) against a database of 16S DNA sequences, and then assigning each query to a taxon according to the nearest-neighbour principle. The nearest neighbour of a query sequence is the hit with the largest bitscore. The blast+ software (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastDocs&DOC_TYPE=Download) must be installed on the system. To check, type system("blastn -help") in the Console window; a sensible help text should appear.
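The nearest-neighbour step described above can be sketched in base R. This is only an illustration, not the package's internal code: the hits table, its column names (Query, Taxon, Bitscore) and the read names are hypothetical stand-ins for parsed blastn output.

```r
# Hypothetical table of BLAST hits: one row per (query, database sequence) alignment.
hits <- data.frame(
  Query    = c("read1", "read1", "read2", "read2", "read2"),
  Taxon    = c("Escherichia", "Bacillus", "Bacillus", "Bacillus", "Escherichia"),
  Bitscore = c(820, 610, 700, 790, 500),
  stringsAsFactors = FALSE
)

# All query names, including one ("read3") that produced no hit at all.
queries <- c("read1", "read2", "read3")

# Sort so that the highest bitscore comes first within each query,
# then keep only the first row per query: its nearest neighbour.
ordered <- hits[order(hits$Query, -hits$Bitscore), ]
best <- ordered[!duplicated(ordered$Query), ]

# Look up the nearest neighbour's taxon for every query;
# queries without any hit are labelled "unclassified".
taxon <- best$Taxon[match(queries, best$Query)]
taxon[is.na(taxon)] <- "unclassified"
```

Here read1 and read2 take the taxon of their highest-scoring hit, while read3 has no hit and falls back to "unclassified".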

The database must contain 16S sequences where the Header starts with a token specifying the taxon. More specifically, the tokens must look like:

<taxon>_1

<taxon>_2

...etc

where <taxon> is some proper taxon name. Use blastDbase16S to make such databases.
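To check whether the headers of an existing collection follow this convention, something like the following base R sketch may help. The example headers are made up, and the pattern assumes the suffix is an underscore followed by digits at the end of the first whitespace-delimited token.

```r
# Hypothetical FASTA headers; the last one does not follow the convention.
headers <- c(
  "Escherichia_1 16S ribosomal RNA",
  "Escherichia_2",
  "Bacillus_3 partial sequence",
  "unlabelled sequence"
)

# The first token is everything before the first whitespace character.
token <- sub("[[:space:]].*$", "", headers)

# A valid token is a taxon name followed by an underscore and a number.
valid <- grepl("^.+_[0-9]+$", token)

# Strip the numeric suffix to recover the taxon; invalid headers give NA.
taxon <- ifelse(valid, sub("_[0-9]+$", "", token), NA_character_)
```

Headers flagged as invalid would need to be relabelled, for instance by rebuilding the database with blastDbase16S.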

The identity of each alignment is also computed. This should be close to 1.0 for a classification to be trusted. Identity values below 0.95 could indicate uncertain classifications, but this will vary between taxa.
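One way to act on this is to flag low-identity results for manual inspection. The sketch below uses a made-up result table with the two documented columns; the 0.95 cut-off comes from the text above, and treating a missing Identity as uncertain is an assumption, since the value returned for unclassified sequences is not stated here.

```r
# Hypothetical output shaped like the documented return value.
result <- data.frame(
  Taxon    = c("Escherichia", "Bacillus", "unclassified"),
  Identity = c(0.99, 0.93, NA),
  stringsAsFactors = FALSE
)

# Cut-off suggested in the text; the appropriate value may differ between taxa.
threshold <- 0.95

# Flag classifications whose identity is missing or below the cut-off.
result$Uncertain <- is.na(result$Identity) | result$Identity < threshold
```

Rows with Uncertain set to TRUE are candidates for a closer look rather than automatic rejection.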

See Also

blastDbase16S.

Examples

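# Requires the blast+ software to be installed on the system.
library(microclass)
library(stringr)  # word(), str_sub()
library(dplyr)    # %>%, bind_cols()

data("small.16S")

# Build a BLAST database named "test"; the second word of each Header
# is passed as the taxon label.
dbase <- blastDbase16S("test", small.16S$Sequence, word(small.16S$Header, 2, 2))

# Use positions 100 to 550 of each sequence as the query reads.
reads <- str_sub(small.16S$Sequence, 100, 550)

# Classify the reads and attach the results to the original data.
tbl <- blastClassify16S(reads, dbase) %>%
  bind_cols(small.16S)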