revisit: Diversity Calculations for Chemical Species

Description

Calculate species richness, standard deviation, coefficient of variation, or Shannon diversity index of activities or logarithms of activities of chemical species, and plot the results.

Usage

revisit(d, target = "cv", loga.ref = NULL,
    do.plot = NULL, col = par("fg"), yline = 2, ylim = NULL, 
    ispecies = NULL, add = FALSE, cex = par("cex"), lwd = par("lwd"), 
    mar = NULL, side = 1:4, xlim = NULL, labcex = 0.6, pch = 1, 
    legend = "", legend.x = NULL, lpch = NULL, main = NULL, 
    lograt.ref = NULL, plot.ext = TRUE)
  extremes(z, target)
  where.extreme(z, target, do.sat = FALSE)

Arguments

d

list, output from diagram, or list of logarithms of activities of species.

target

character, what statistic to calculate.

loga.ref

numeric, logarithms of activities for comparison statistics.

do.plot

logical, make a plot?

col

character, color to use for points or lines.

yline

numeric, margin line for y-axis label.

ylim

numeric, limits of y axis.

ispecies

numeric, which species to consider.

add

logical, add to an existing plot?

cex

numeric, character expansion factor.

lwd

numeric, line width.

mar

numeric, plot margin specifications.

side

numeric, which sides of plot to draw axes.

xlim

numeric, limits of x axis.

labcex

numeric, character expansion factor for species labels.

pch

numeric, plotting symbol(s) to use for points.

legend

character, text to use for legend.

legend.x

character, placement of legend.

lpch

numeric, plotting symbol(s) to use in legend.

main

character, main title for plot.

lograt.ref

numeric, log10 of reference abundance ratios.

plot.ext

logical, show the location of the extreme value(s)?

z

numeric, matrix of values.

do.sat

logical, identify multiple extreme values?

Value

revisit returns a list containing at least an element named H giving the calculated values for the target statistic. This has the same dimensions as a single element of d (or d$logact, if d was the output from diagram). For calculations as a function of one or two variables, the output also contains the elements ix (location of the extremum in the first direction), x (x-value at the extremum), and extval (extreme value). For calculations as a function of two variables, the output also contains the elements iy (location of the extremum in the second direction) and y (y-value at the extremum).
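As a rough illustration of the one-variable case, the base R sketch below assembles a list with the element names described above. The values in `xs` and `H` are made up, and `which.min` assumes a target whose extremum is a minimum (such as cv); this is not the code used inside revisit.

```r
# Hypothetical x-axis values (e.g. logfO2) and statistic values at each one
xs <- seq(-90, -60, length.out = 7)
H <- c(0.9, 0.6, 0.4, 0.3, 0.5, 0.8, 1.1)

# For a minimized target, the extremum is the smallest value of H
ix <- which.min(H)

# Element names follow the description of the return value
out <- list(H = H, ix = ix, x = xs[ix], extval = H[ix])
str(out)
```

For a two-variable calculation, H would be a matrix and the list would also carry iy and y for the second direction.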

Details

The purpose of revisit is to calculate and visualize summary statistics for logarithms of activities of chemical species. For most uses, supply the output of diagram as the value for d. Alternatively, d can be a list of logarithms of activities; the list elements each correspond to a different species and can be vectors, matrices, or higher-dimensional arrays, but they must all have the same dimensions. (This is always the case for d$logact if d is the output from diagram; the dimensionality is determined by the number of variables used in the calculations of affinity.) The type of statistic to be calculated is indicated by target, as summarized in the following table.

target     description                                                 extremum   additional arguments
sd         standard deviation                                          min        none
cv         coefficient of variation                                    min        none
shannon    Shannon diversity index                                     max        none
qqr        correlation coefficient on q-q plot (normal distribution)   max        none
richness   species richness                                            max        loga.ref
cvrmsd     coefficient of variation of RMSD                            min        loga.ref
spearman   Spearman correlation coefficient                            max        loga.ref
pearson    Pearson correlation coefficient                             max        loga.ref

sd, cv, shannon and qqr all operate on just the sample values. richness counts the number of species whose logarithms of activities are above loga.ref. cvrmsd, spearman and pearson are comparison statistics where loga.ref represents the observed values. ratio determines the correlation coefficient of a predicted change in loga ratios (d$logact vs. loga.ref) plotted against observed changes in loga ratios (e.g., from changes in protein expression deduced from microarray experiments; given in loga.target).
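To make the statistics above concrete, here is a minimal base R sketch that evaluates several of them on made-up data. It does not call revisit. The input values, the threshold, and the observed values are hypothetical, and the exact definitions (sd on logarithms of activities, cv on activities, Shannon index with natural logarithms on activity proportions) are assumptions chosen for illustration rather than a statement of how revisit computes them.

```r
# Hypothetical log10 activities of three species at four conditions
loga <- list(
  A = c(-3.1, -3.2, -3.5, -4.0),
  B = c(-3.0, -3.1, -3.3, -3.6),
  C = c(-3.2, -3.4, -3.9, -4.6)
)
# Rows are conditions, columns are species
m <- do.call(cbind, loga)
act <- 10^m

# Sample statistics, one value per condition
sd_vals <- apply(m, 1, sd)                      # assumed: sd of log activities
cv_vals <- apply(act, 1, sd) / rowMeans(act)    # assumed: sd / mean of activities
p <- act / rowSums(act)                         # proportions of total activity
shannon_vals <- -rowSums(p * log(p))            # assumed: natural logarithm

# Richness: number of species above a threshold (hypothetical value)
threshold <- -3.5
richness_vals <- rowSums(m > threshold)

# Comparison statistics against hypothetical observed log activities
observed <- c(A = -3.4, B = -3.2, C = -3.8)
pearson_vals <- apply(m, 1, function(row) cor(row, observed, method = "pearson"))
spearman_vals <- apply(m, 1, function(row) cor(row, observed, method = "spearman"))

# Condition that optimizes each statistic (min for sd, max for shannon)
which.min(sd_vals)
which.max(shannon_vals)
```

In this toy data set the species are most similar at the first condition, so both the minimum standard deviation and the maximum Shannon index fall there.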

If do.plot is TRUE, d is the output from diagram, and the number of variables is 1 or 2, the results are plotted -- a line diagram in 1 dimension or a contour plot in 2 dimensions.

The value of extremum in the table shows whether the extreme value that optimizes the system is the minimum (sd, cv, cvrmsd) or the maximum (all the others). On plots the location of the extreme value is indicated (by a dashed vertical line on a 1-D plot or a point marked by an asterisk on a 2-D plot). On 2-D plots the valleys (or ridges) leading to the location of the extremum are plotted. The ridges or valleys are plotted as dashed lines and are colored green for the x values returned by extremes and blue for the y values returned by extremes.

The location of the extreme value in a matrix or vector z is calculated using where.extreme. Whether the extreme is the minimum or the maximum value depends on the value of target. For matrices, if do.sat is TRUE and the extreme value is repeated, the row and column numbers for all instances are returned. Given a matrix of numeric values in z, extremes locates the maximum or minimum values in both dimensions. That is, the x values that are returned are the column numbers where the extreme is found for each row, and the y values that are returned are the row numbers where the extreme is found for each column.
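The row and column bookkeeping described above can be sketched in base R as follows. This is an illustration on a made-up matrix for a minimized target, not the source of where.extreme or extremes; the variable names are hypothetical.

```r
# Hypothetical matrix of statistic values for a minimized target
z <- matrix(c(5, 3, 4,
              2, 1, 6,
              1, 8, 7), nrow = 3, byrow = TRUE)

# One location of the minimum, as (row, column)
one_min <- arrayInd(which.min(z), dim(z))

# All locations of a repeated minimum (the situation that do.sat addresses)
all_min <- which(z == min(z), arr.ind = TRUE)

# x values: column number of the minimum within each row
x <- apply(z, 1, which.min)
# y values: row number of the minimum within each column
y <- apply(z, 2, which.min)
```

For a maximized target such as shannon, `which.max` and `max` would take the place of `which.min` and `min`.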

If lograt.ref is provided, these values are the reference values for the logarithms of abundance ratios.

The function name was changed from diversity to revisit in CHNOSZ-0.9 because there is a function named diversity in the vegan package. Note that while diversity takes a matrix with species on the columns, revisit takes a list with species as the elements of the list.
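For data already arranged as a matrix with species on the columns, a conversion to the list layout might look like the sketch below. The matrix `m` and its values are hypothetical, and only base R is used.

```r
# Hypothetical matrix of log activities: samples on rows, species on columns
m <- matrix(c(-3.1, -3.0, -3.2,
              -3.2, -3.1, -3.4), nrow = 2, byrow = TRUE,
            dimnames = list(NULL, c("A", "B", "C")))

# One list element per species, each holding that species' values
d <- lapply(seq_len(ncol(m)), function(j) m[, j])
names(d) <- colnames(m)
str(d)
```

Every element of the resulting list has the same length, which satisfies the requirement that all elements of d share the same dimensions.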

Examples

data(thermo)
### using grep.file, read.fasta, add.protein
# calculations for Pelagibacter ubique
f <- system.file("extdata/fasta/HTCC1062.faa.xz", package="CHNOSZ")
# what proteins to select (set to "" for all proteins)
w <- "ribosomal"
# locate entries whose names contain w
j <- grep.file(f, w)
# get the amino acid compositions of these proteins
p <- read.fasta(f, j)
# add these proteins to CHNOSZ's inventory
i <- add.protein(p)
# set up the chemical system
basis("CHNOS+")
# calculate affinities of formation in logfO2 space
a <- affinity(O2=c(-90, -60), iprotein=i)
# show the equilibrium activities
d <- diagram(a, cex=1.5, logact=0)
# make a title
expr <- as.expression(substitute(x~y~"proteins in"~
  italic("P. ubique"), list(x=length(j), y=w)))
mtitle(c("Equilibrium activities of", expr), cex=1.5)
# show the coefficient of variation (target "cv" as listed in the table)
revisit(d, "cv", cex=1.5)
mtitle(c("CV of equilibrium activities of", expr), cex=1.5)
# calculate affinities in logfO2-logaH2O space
a <- affinity(O2=c(-90, -60), H2O=c(-20, 0), iprotein=i)
# calculate the equilibrium activities
d <- diagram(a, do.plot=FALSE, mam=FALSE, logact=0)
# show the coefficient of variation
revisit(d, "cv", cex=1.5)
mtitle(c("CV of equilibrium activities of", expr), cex=1.5)
