A vector of 16S sequences (DNA) is classified by first running BLAST (blastn) against
a database of 16S DNA sequences, and then classifying according to the nearest-neighbour principle.
The nearest neighbour of a query sequence is the hit with the largest bitscore. The BLAST+
software (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastDocs&DOC_TYPE=Download)
must be installed on the system. Type system("blastn -help") in the Console window,
and a sensible help text should appear.
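As a complement to the system("blastn -help") check, the following minimal sketch tests from R whether a blastn executable can be found on the search path. It uses only base R; the messages are illustrative, and finding the executable does not guarantee that the installation works.

```r
# Look up the blastn executable on the PATH.
# Sys.which() returns an empty string if nothing is found.
blastn_path <- Sys.which("blastn")

if (nzchar(blastn_path)) {
  message("blastn found at: ", blastn_path)
} else {
  message("blastn not found; install BLAST+ and make sure it is on the PATH.")
}
```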
The database must contain 16S sequences where the header starts with a token specifying the taxon.
More specifically, the tokens must look like:
<taxon>_1
<taxon>_2
... etc.
where <taxon> is some proper taxon name. Use blastDbase16S to make such databases.
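The token format and the nearest-neighbour rule can be illustrated with a small sketch in base R. The table of hits, its column names (Query, Hit, Bitscore) and all values are made up for illustration; they are not the actual output of the function or of blastn. The taxon is recovered by stripping the trailing "_<number>" from the token, and each query is assigned the taxon of its hit with the largest bitscore.

```r
# Hypothetical BLAST hits: one row per (query, database hit) pair.
hits <- data.frame(
  Query    = c("q1", "q1", "q1", "q2", "q2"),
  Hit      = c("Escherichia_1", "Escherichia_2", "Salmonella_1",
               "Bacillus_3", "Listeria_1"),
  Bitscore = c(850, 910, 700, 640, 655),
  stringsAsFactors = FALSE
)

# The taxon is the token with its trailing "_<number>" removed.
hits$Taxon <- sub("_[0-9]+$", "", hits$Hit)

# For each query, keep the hit with the largest bitscore (the nearest neighbour).
# which.max() returns the first maximum, so ties go to the first listed hit.
best <- do.call(rbind, lapply(split(hits, hits$Query), function(h) {
  h[which.max(h$Bitscore), ]
}))

print(best[, c("Query", "Taxon", "Bitscore")])
```

In this toy example q1 is assigned to Escherichia and q2 to Listeria. How ties are resolved by the actual function is not specified here; taking the first maximum is an assumption of the sketch.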
The identity of each alignment is also computed. This should be close to 1.0 for a classification
to be trusted. Identity values below 0.95 could indicate uncertain classifications, but this will
vary between taxa.
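One way to act on the identity values is to flag classifications that fall below a chosen cut-off. The sketch below assumes a result table with hypothetical columns Query, Taxon and Identity and made-up values; the cut-off of 0.95 is the rule of thumb mentioned above and may need adjusting for specific taxa.

```r
# Hypothetical classification result with one identity value per query.
result <- data.frame(
  Query    = c("q1", "q2", "q3"),
  Taxon    = c("Escherichia", "Listeria", "Bacillus"),
  Identity = c(0.99, 0.93, 0.96),
  stringsAsFactors = FALSE
)

# Flag classifications whose identity is below the chosen cut-off.
threshold <- 0.95
result$Uncertain <- result$Identity < threshold

print(result)
```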