get_metric_byrank: Chemical metrics for taxa aggregated to a given rank

Description

Calculates a single chemical metric for taxa in each sample aggregated to a specified rank.

Usage

get_metric_byrank(RDP, map, refdb = "GTDB_220", taxon_AA = NULL,
    groups = NULL, zero_AA = NULL, metric = "Zc", rank = "genus")

Value

A data frame of numeric values with row names corresponding to samples and column names corresponding to taxa.

Arguments

RDP: data frame, taxonomic abundances produced by read_RDP or ps_taxacounts
map: data frame, taxonomic mapping produced by map_taxa
refdb: character, name of reference database (GTDB_220 or RefSeq_206)
taxon_AA: data frame, amino acid compositions of taxa, used to bypass refdb specification
groups: list of indexing vectors, samples to be aggregated into groups
zero_AA: character, three-letter abbreviation(s) of amino acid(s) to assign zero counts for calculating chemical metrics
metric: character, chemical metric to calculate
rank: character, taxonomic rank to which the amino acid compositions of all lower-ranking taxa (down to genus) are aggregated
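The effect of zero_AA can be illustrated by computing the carbon oxidation state (Zc) by hand. The R sketch below is not the chem16S implementation (see calc_metrics for that). It assumes the usual definition Zc = (Z - nH + 2(nO + nS) + 3nN) / nC with net charge Z = 0, and it uses a hypothetical toy composition and a hypothetical helper, zc_from_aa(). Setting the Glu count to zero mimics zero_AA = "Glu".

```r
# Elemental formulas (C, H, N, O, S) of the free amino acids used in the toy
# composition. Losing H2O on polymerization changes neither nC nor
# (-nH + 2 * nO), so free-acid formulas give the same Zc as residues.
aa_formula <- rbind(
  Gly = c(2, 5, 1, 2, 0), Ala = c(3, 7, 1, 2, 0), Leu = c(6, 13, 1, 2, 0),
  Glu = c(5, 9, 1, 4, 0), Lys = c(6, 14, 2, 2, 0)
)
colnames(aa_formula) <- c("C", "H", "N", "O", "S")

# Hypothetical helper: Zc of a composition given as a named vector of
# amino acid counts, assuming a net charge of zero
zc_from_aa <- function(aa) {
  # Elemental totals: each formula row is multiplied by its count, then summed
  f <- colSums(aa_formula[names(aa), , drop = FALSE] * aa)
  (-f[["H"]] + 2 * (f[["O"]] + f[["S"]]) + 3 * f[["N"]]) / f[["C"]]
}

# Hypothetical amino acid counts for one taxon
aa <- c(Gly = 10, Ala = 20, Leu = 15, Glu = 12, Lys = 8)
# Mimic zero_AA = "Glu" by setting the glutamic acid count to zero
aa_noGlu <- replace(aa, "Glu", 0)

zc_from_aa(aa)        # about -0.281
zc_from_aa(aa_noGlu)  # about -0.468
```

Because glutamic acid (Zc = 0.4) is more oxidized than this composition as a whole, dropping it lowers the calculated Zc.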

Details

This function sums the amino acid compositions of taxa up to the specified rank and returns a data frame with samples on the rows and taxa on the columns. Because the amino acid compositions of genera have been precomputed from species-level genomes in a reference database, chemical metrics for genera are constant (i.e., the same in every sample). In contrast, chemical metrics for higher-level taxa are variable because they depend not only on the reference genomes but also on the relative abundances of child taxa.

The value of rank should be one of rootrank, domain, phylum, class, order, family, or genus. For all ranks other than genus, the amino acid compositions of all lower-ranking taxa are weighted by taxonomic abundance and summed to calculate the chemical metric at the specified rank. If the rank is genus, no aggregation is done (because genus is the lowest-level rank available in the classifications), and the values of the metric for all genera in each sample are returned. If the rank is rootrank, the results are equivalent to the metrics for community reference proteomes (i.e., those calculated by get_metrics).
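The abundance weighting described above can be sketched in a few lines of R. This is a minimal illustration under stated assumptions, not the chem16S code: the two genera, their amino acid compositions, and their read counts are hypothetical; zc_from_aa() is a hypothetical helper using Zc = (Z - nH + 2(nO + nS) + 3nN) / nC with Z = 0; and each genus's composition is assumed to be used as given (e.g., as a precomputed value from reference genomes).

```r
# Elemental formulas (C, H, N, O, S) of the amino acids in the toy compositions
aa_formula <- rbind(
  Gly = c(2, 5, 1, 2, 0), Ala = c(3, 7, 1, 2, 0), Leu = c(6, 13, 1, 2, 0),
  Glu = c(5, 9, 1, 4, 0), Lys = c(6, 14, 2, 2, 0)
)
colnames(aa_formula) <- c("C", "H", "N", "O", "S")

# Hypothetical helper: Zc from named amino acid counts (net charge zero)
zc_from_aa <- function(aa) {
  f <- colSums(aa_formula[names(aa), , drop = FALSE] * aa)
  (-f[["H"]] + 2 * (f[["O"]] + f[["S"]]) + 3 * f[["N"]]) / f[["C"]]
}

# Hypothetical amino acid compositions of two genera in one phylum
genus_aa <- rbind(
  genusA = c(Gly = 10, Ala = 20, Leu = 15, Glu = 12, Lys = 8),
  genusB = c(Gly = 14, Ala = 12, Leu = 6, Glu = 18, Lys = 4)
)
# Hypothetical abundances (read counts) of the genera in two samples
abund <- rbind(
  sample1 = c(genusA = 90, genusB = 10),
  sample2 = c(genusA = 10, genusB = 90)
)

# Genus-level Zc depends only on composition, so it is the same in every sample
genus_Zc <- apply(genus_aa, 1, zc_from_aa)
# Phylum-level composition per sample: abundance-weighted sum over genera
phylum_aa <- abund %*% genus_aa[colnames(abund), ]
# Phylum-level Zc therefore changes with the relative abundances of the genera
phylum_Zc <- apply(phylum_aa, 1, zc_from_aa)

round(genus_Zc, 3)   # genusA -0.281, genusB 0.056
round(phylum_Zc, 3)  # sample1 -0.254, sample2 0.014
```

Summing over every genus in a sample, rather than only those within one phylum, corresponds to the rootrank case described above.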

The RDP, map, refdb, and groups arguments are the same as described in get_metrics. See calc_metrics for available metrics.

References

Dick JM, Shock E. 2013. A metastable equilibrium model for the relative abundances of microbial phyla in a hot spring. PLOS One 8: e72395. doi:10.1371/journal.pone.0072395

Examples


# Plot similar to Fig. 1 in Dick and Shock (2013)
# Read example dataset
RDPfile <- system.file("extdata/RDP-GTDB_220/SMS+12.tab.xz", package = "chem16S")
RDP <- read_RDP(RDPfile)
# Get mapping to reference database
map <- map_taxa(RDP)
# Calculate phylum-level Zc
phylum_Zc <- get_metric_byrank(RDP, map, rank = "phylum")
# Keep phyla present in more than two samples
n_values <- colSums(!sapply(phylum_Zc, is.na))
phylum_Zc <- phylum_Zc[n_values > 2]
# Swap first two samples to get them in the right location
# (MG-RAST accession numbers for these samples are not in spatial order)
phylum_Zc <- phylum_Zc[c(2, 1, 3, 4, 5), ]
matplot(phylum_Zc, type = "b", xlab = "Sampling site (hot -> cool)", ylab = "Zc")
title("Phylum-level Zc at Bison Pool hot spring")
