get_metrics: Calculate chemical metrics of community reference proteomes

Description

Combines taxonomic classifications with reference proteomes for taxa to get amino acid compositions of community reference proteomes. Amino acid compositions are used to calculate chemical metrics.

Usage

get_metrics(RDP, map, refdb = "GTDB_220", taxon_AA = NULL, groups = NULL,
    return_AA = FALSE, zero_AA = NULL, metrics = c("Zc", "nO2", "nH2O"))

Value

A data frame with one row for each sample; the samples correspond to columns 5 and above of RDP. The first column holds the sample names and is named Run by default, or group if the groups argument is provided. The remaining columns are numeric and are named for the calculated metrics.

Arguments

RDP: data frame, taxonomic abundances produced by read_RDP or ps_taxacounts
map: data frame, taxonomic mapping produced by map_taxa
refdb: character, name of reference database (GTDB_220 or RefSeq_206)
taxon_AA: data frame, amino acid compositions of taxa, used to bypass refdb specification
groups: list of indexing vectors, samples to be aggregated into groups
return_AA: logical, return the amino acid composition for each sample instead of the chemical metrics?
zero_AA: character, three-letter abbreviation(s) of amino acid(s) to assign zero counts for calculating chemical metrics
metrics: character, the chemical metrics to calculate

Details

get_metrics calculates selected chemical metrics for the community reference proteome in each sample. The community reference proteome is computed from the amino acid compositions of reference proteomes for taxa (obtained from the reference database named in refdb), multiplied by the taxonomic abundances given in RDP. RDP may hold classifications from the RDP Classifier (read using read_RDP) or abundances derived from the OTU table of a phyloseq-class object (see ps_taxacounts). map defines the taxonomic mapping between RDP and refdb. Chemical metrics are then calculated from the amino acid composition of the community reference proteome. The default chemical metrics are carbon oxidation state (Z_C), stoichiometric oxidation state (nO₂), and stoichiometric hydration state (nH₂O). See calc_metrics for other available metrics.
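
The abundance weighting described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the taxa, samples, and amino acid counts are made-up toy values, only three amino acids are included, and the names counts, taxon_AA, community_AA, and AA_formula are hypothetical. The sketch assumes that contributions are summed over taxa, and it computes Z_C from the elemental formula C_c H_h N_n O_o S_s as (-h + 3n + 2o + 2s) / c. Because adding or removing H2O does not change this ratio, formulas of the free amino acids are used here for simplicity.

```r
# Toy inputs (hypothetical values, not taken from any reference database)
# Taxonomic abundances: rows are taxa, columns are samples
counts <- matrix(c(10, 0, 5, 5), nrow = 2,
  dimnames = list(c("T1", "T2"), c("S1", "S2")))
# Amino acid compositions of the reference proteomes for the taxa
taxon_AA <- matrix(c(2, 0, 1, 1, 0, 1), nrow = 2,
  dimnames = list(c("T1", "T2"), c("Gly", "Ala", "Cys")))

# Community amino acid composition: abundance-weighted sum over taxa
# (rows are samples, columns are amino acids)
community_AA <- t(counts) %*% taxon_AA

# Elemental formulas of the free amino acids
AA_formula <- rbind(
  Gly = c(C = 2, H = 5, N = 1, O = 2, S = 0),
  Ala = c(C = 3, H = 7, N = 1, O = 2, S = 0),
  Cys = c(C = 3, H = 7, N = 1, O = 2, S = 1)
)
# Elemental composition of each toy community proteome
el <- community_AA %*% AA_formula
# Carbon oxidation state for each sample
Zc <- (-el[, "H"] + 3 * el[, "N"] + 2 * el[, "O"] + 2 * el[, "S"]) / el[, "C"]
round(Zc, 4)
```

With these toy values, sample S1 gives 40/70 (about 0.5714) and S2 gives 30/65 (about 0.4615).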

groups, if given, is a list of one or more indexing vectors (with logical or numeric values) corresponding to samples whose taxonomic classifications are aggregated into groups before calculating amino acid compositions and chemical metrics.
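
As a rough illustration of how such indexing vectors select samples, the base R sketch below aggregates a toy abundance matrix by group. It is only a sketch of the idea under stated assumptions, not the code used by get_metrics: the matrix counts and the result grouped are hypothetical, and aggregation is assumed to mean summing abundances over the samples in each group.

```r
# Toy taxonomic abundances: rows are taxa, columns are samples (hypothetical)
counts <- matrix(c(10, 0, 5, 5, 1, 2), nrow = 2,
  dimnames = list(c("T1", "T2"), c("S1", "S2", "S3")))

# One logical and one numeric indexing vector, as allowed for 'groups'
groups <- list(A = c(TRUE, TRUE, FALSE), B = 3)

# Sum the abundances of the samples selected by each indexing vector;
# drop = FALSE keeps a matrix even when a group has a single sample
grouped <- sapply(groups, function(i) rowSums(counts[, i, drop = FALSE]))
grouped
```

The result has one column per group (A and B), which parallels the one-row-per-group output described under Value.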

Examples

## First two examples are for RDP Classifier with default training set
## and mapping to NCBI taxonomy with RefSeq reference proteomes

# Get chemical metrics for all samples in a dataset
RDPfile <- system.file("extdata/RDP/BGPF13.tab.xz", package = "chem16S")
RDP <- read_RDP(RDPfile)
map <- map_taxa(RDP, refdb = "RefSeq_206")
# This is a data frame with 14 rows and Run, Zc, nO2, and nH2O columns
(metrics <- get_metrics(RDP, map, refdb = "RefSeq_206"))

# Read the metadata file
mdatfile <- system.file("extdata/metadata/BGPF13.csv", package = "chem16S")
# Create list with metadata and metrics in same sample order
mdat <- get_metadata(mdatfile, metrics)
# Calculate metrics for aggregated samples of Archaea and Bacteria
groups <- list(A = mdat$metadata$domain == "Archaea",
  B = mdat$metadata$domain == "Bacteria")
# This is a data frame with 2 rows and group, Zc, nO2, and nH2O columns
get_metrics(RDP, map, refdb = "RefSeq_206", groups = groups)

# Classifications were made using the RDP Classifier retrained with GTDB r220
RDPfile.GTDB <- system.file("extdata/RDP-GTDB_220/BGPF13.tab.xz", package = "chem16S")
RDP.GTDB <- read_RDP(RDPfile.GTDB)
# These use the default option of refdb = "GTDB_220"
map.GTDB <- map_taxa(RDP.GTDB)
metrics.GTDB <- get_metrics(RDP.GTDB, map.GTDB)

# Plot Zc from GTDB vs RefSeq
xylim <- range(metrics$Zc, metrics.GTDB$Zc)
plot(metrics$Zc, metrics.GTDB$Zc, xlim = xylim, ylim = xylim, type = "n")
lines(xylim, xylim, lty = 2, col = 8)
points(metrics$Zc, metrics.GTDB$Zc, pch = mdat$metadata$pch, bg = mdat$metadata$col)
md.leg <- mdat$metadata[1:2, ]
legend("bottomright", md.leg$domain, pch = md.leg$pch, pt.bg = md.leg$col)
title(quote(italic(Z)[C]~"from GTDB vs RefSeq"))

# To exclude tryptophan, tyrosine, and phenylalanine
# from the calculation of chemical metrics
get_metrics(RDP.GTDB, map.GTDB, zero_AA = c("Trp", "Tyr", "Phe"))
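
The effect of zero_AA can be pictured with a small base R sketch. This is an assumption about the general idea (counts of the named amino acids are set to zero before metrics are calculated), not a copy of the package internals; the matrix community_AA and its values are hypothetical.

```r
# Toy community amino acid compositions: rows are samples (hypothetical values)
community_AA <- matrix(c(20, 10, 10, 10, 3, 5), nrow = 2,
  dimnames = list(c("S1", "S2"), c("Gly", "Ala", "Trp")))

# Amino acids to exclude, given by three-letter abbreviation
zero_AA <- c("Trp")

# Assign zero counts to the excluded amino acids; other columns are unchanged
AA_zeroed <- community_AA
AA_zeroed[, zero_AA] <- 0
AA_zeroed
```

Any metric computed from AA_zeroed would then ignore the contribution of the excluded amino acids.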