This function maps taxonomic names to the RefSeq (NCBI) or GTDB taxonomy.
taxacounts should be a data frame generated by either read_RDP or ps_taxacounts.
Input names are made by combining the taxonomic rank and name with an underscore separator (e.g. genus_Escherichia/Shigella).
Input names are then matched to the taxa listed in taxon_AA.csv.xz found under extdata/RefSeq or extdata/GTDB.
The protein and organism columns in these files hold, respectively, the rank and taxon name extracted from the RefSeq or GTDB database.
Only exactly matching names are automatically mapped.
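
The name construction and exact matching described above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the lineage and reference data frames are made-up toy data, and only the protein and organism column names are taken from the description of taxon_AA.csv.xz.

```r
# Toy classifier output: one row per taxon (hypothetical data)
lineage <- data.frame(
  rank = c("genus", "genus", "class"),
  name = c("Bacillus", "Escherichia/Shigella", "Clostridia")
)

# Toy stand-in for taxon_AA.csv.xz:
# 'protein' holds the rank and 'organism' holds the taxon name
reference <- data.frame(
  protein = c("genus", "genus", "class"),
  organism = c("Bacillus", "Escherichia", "Clostridia")
)

# Combine rank and name with an underscore separator
input_names <- paste(lineage$rank, lineage$name, sep = "_")
ref_names <- paste(reference$protein, reference$organism, sep = "_")

# Exact matching: match() gives the row index in the reference,
# or NA when no identical name exists
imap <- match(input_names, ref_names)
```

In this toy case genus_Escherichia/Shigella has no identical name in the reference, so its entry in imap is NA and it remains unmapped by exact matching alone.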
For mapping to the RefSeq (NCBI) taxonomy, some group names are manually mapped as follows (see Dick and Tan, 2023):
Input (i.e., RDP) | RefSeq |
genus_Escherichia/Shigella | genus_Escherichia |
phylum_Cyanobacteria/Chloroplast | phylum_Cyanobacteria |
genus_Marinimicrobia_genera_incertae_sedis | species_Candidatus Marinimicrobia bacterium |
class_Cyanobacteria | phylum_Cyanobacteria |
genus_Spartobacteria_genera_incertae_sedis | species_Spartobacteria bacterium LR76 |
class_Planctomycetacia | class_Planctomycetia |
class_Actinobacteria | phylum_Actinobacteria |
order_Rhizobiales | order_Hyphomicrobiales |
genus_Gp1 | genus_Acidobacterium |
genus_Gp6 | genus_Luteitalea |
genus_GpI | genus_Nostoc |
genus_GpIIa | genus_Synechococcus |
genus_GpVI | genus_Pseudanabaena |
family_Family II | family_Synechococcaceae |
genus_Subdivision3_genera_incertae_sedis | family_Verrucomicrobia subdivision 3 |
order_Clostridiales | order_Eubacteriales |
family_Ruminococcaceae | family_Oscillospiraceae |
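
One way to picture the manual mapping is as a lookup applied to the input names before exact matching. The sketch below holds three rows of the table above in a named character vector; the variable names (manual_map, input_names, mapped_names) are hypothetical, and the package's actual implementation may differ.

```r
# Three rows of the table as a named vector:
# names are the RDP input names, values are the RefSeq names
manual_map <- c(
  "genus_Escherichia/Shigella" = "genus_Escherichia",
  "order_Rhizobiales" = "order_Hyphomicrobiales",
  "genus_Gp1" = "genus_Acidobacterium"
)

# Hypothetical input names built from rank and taxon name
input_names <- c("genus_Escherichia/Shigella", "genus_Bacillus", "order_Rhizobiales")

# Replace only the names that have a manual mapping; leave the rest unchanged
mapped_names <- input_names
has_manual <- input_names %in% names(manual_map)
mapped_names[has_manual] <- unname(manual_map[input_names[has_manual]])
```

Names without an entry in the lookup (here genus_Bacillus) pass through unchanged and are still subject to exact matching against the reference taxa.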
To avoid manual mapping, GTDB can be used for both taxonomic assignments and reference proteomes.
Taxonomic assignments can be made using the RDP Classifier with this GTDB SSU training set (doi:10.5281/zenodo.7633100) or DADA2 with this GTDB training set (doi:10.5281/zenodo.6655692).
Example files created using the RDP Classifier are provided under extdata/RDP-GTDB.
An example dataset created with DADA2 can be loaded with data(mouse.GTDB); this is a phyloseq-class object that can be processed with the functions described at physeq.
Set quiet to TRUE to suppress printing of messages about manual mappings, the most abundant unmapped groups, and the overall percentage of mapped names.