util.formula: Functions to Work with Chemical Formulas

Description

Calculate the standard molal entropy of elements in a compound; calculate the standard molal Gibbs energy or enthalpy of formation, or standard molal entropy, from the other two; list coefficients of selected elements in a chemical formula; calculate the average oxidation number of carbon. Also, create a matrix having the chemical formulas of amino acid residues in proteins and calculate the chemical formulas of proteins from their amino acid composition.

Usage

GHS(species = NULL, DG = NA, DH = NA, S = NA, T = thermo$opt$Tr)
  element(compound, property = c("mass","entropy"))
  expand.formula(elements, makeup)
  ZC(x)
  residue.formula()
  protein.formula(proteins, as.residue = FALSE)

Arguments

species

character, formula of a compound from which to calculate entropies of the elements.

DG

numeric, standard molal Gibbs energy of formation.

DH

numeric, standard molal enthalpy of formation.

S

numeric, standard molal entropy.

T

numeric, temperature in Kelvin.

compound

character, name of element(s) or compound(s).

property

character, name(s) of thermodynamic properties.

elements

character, name(s) of elements.

makeup

dataframe, elemental composition of a compound returned by makeup.

x

character, object representing chemical formula.

proteins

dataframe, amino acid composition of one or more proteins in the same format as thermo$protein.

as.residue

logical, return the per-residue formula of the protein(s)?

Value

GHS and ZC return numeric values. expand.formula returns a numeric vector.

Details

GHS computes one of the standard molal Gibbs energy or enthalpy of formation from the elements (DG, DH) or entropy (S) at 298.15 K and 1 bar from values of the other two. If the species argument is present, it is used to calculate the entropies of the elements (Se) using element; otherwise Se is set to zero. The equation in effect can be written as ${\Delta}G^{\circ}={\Delta}H^{\circ}-T{\Delta}S^{\circ}$, where ${\Delta}S^{\circ}=S-S_e$ and $T$ denotes the reference temperature of 298.15 K. If two of DG, DH, and S are provided, the value of the third is returned. If all three are provided, the value of DG in the arguments is ignored and the calculated value of DG is returned. If none of DG, DH, or S is provided, the value of Se is returned. If only one of the values is provided, an error results. Units of cal mol$^{-1}$ (DG, DH) and cal K$^{-1}$ mol$^{-1}$ (S) are assumed. If T is provided, it is used instead of the reference temperature.
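The rules described for GHS can be sketched in base R. This is an illustration only, not the CHNOSZ implementation: the function name ghs_sketch is hypothetical, Se is passed in directly instead of being computed from a formula, and the value Se = 55.7 used below is a placeholder chosen for the example.

```r
# Hypothetical helper (not part of CHNOSZ): solves
#   DG = DH - T * (S - Se)
# for whichever of DG, DH, S is missing.
ghs_sketch <- function(DG = NA, DH = NA, S = NA, Se = 0, T = 298.15) {
  n_given <- sum(!is.na(c(DG, DH, S)))
  if (n_given == 0) return(Se)   # nothing given: return entropy of the elements
  if (n_given == 1) stop("provide at least two of DG, DH, S")
  if (is.na(DH)) return(DG + T * (S - Se))    # solve for DH
  if (is.na(S)) return(Se + (DH - DG) / T)    # solve for S
  # DG missing, or all three given (a supplied DG is ignored)
  DH - T * (S - Se)
}

# Round trip with a placeholder Se (assumed value, for illustration only)
dg <- ghs_sketch(DH = -68316.76, S = 16.7123, Se = 55.7)
ghs_sketch(DG = dg, S = 16.7123, Se = 55.7)   # recovers DH
```

Passing Se as an argument keeps the sketch free of any element database; in GHS itself that value comes from element when species is supplied.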

element returns a dataframe of the mass and entropy of one or more elements or formulas given in compound. The property can be mass and/or entropy.

expand.formula converts a 1-column dataframe representing the elemental composition of a compound (see makeup) to a numeric vector with one value per entry of elements, in the same order; each value is the coefficient of the corresponding element. If an element is not present in the makeup dataframe, its coefficient is set to zero. An element with a non-zero coefficient in the makeup dataframe does not appear in the output unless it is also listed in elements.
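The lookup behaviour described for expand.formula can be imitated in base R as follows. The function expand_sketch and the hand-built dataframe m are hypothetical stand-ins, assuming that the makeup dataframe has element symbols as row names and a single column of counts.

```r
# Hypothetical stand-in for expand.formula (not the CHNOSZ code).
# Assumes `makeup` is a 1-column data frame whose row names are element symbols.
expand_sketch <- function(elements, makeup) {
  idx <- match(elements, rownames(makeup))  # exact match; NA if absent
  counts <- makeup[[1]][idx]
  counts[is.na(idx)] <- 0                   # absent elements get a zero coefficient
  counts
}

# Hand-built composition of H2O (assumed layout of a makeup result)
m <- data.frame(count = c(2, 1), row.names = c("H", "O"))
expand_sketch(c("H", "O"), m)       # 2 1
expand_sketch(c("C", "H", "S"), m)  # 0 2 0
```

match is used instead of indexing the dataframe by row name because character row indexing of a dataframe allows partial matches, which could confuse symbols such as C and Cl.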

ZC returns the nominal carbon oxidation state for the chemical formula represented by x. (For discussion of nominal carbon oxidation state, see Hendrickson et al., 1970; Buvet, 1983.) If carbon is not present in the formula the result is NaN.
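As an illustration, the values shown in the examples for ZC are reproduced by the expression $Z_C=(Z-n_H+3n_N+2n_O+2n_S)/n_C$, which assumes nominal oxidation states of +1 for H, -3 for N, and -2 for O and S. The function zc_sketch below is a hypothetical base R sketch under that assumption; it takes a named vector of element counts and handles only the elements C, H, N, O, S and the charge Z.

```r
# Hypothetical sketch (not the CHNOSZ code): nominal carbon oxidation state
# from a named vector of element counts, assuming H = +1, N = -3, O = -2, S = -2.
zc_sketch <- function(counts) {
  get <- function(el) if (el %in% names(counts)) counts[[el]] else 0
  nC <- get("C")
  if (nC == 0) return(NaN)  # no carbon in the formula
  (get("Z") - get("H") + 3 * get("N") + 2 * get("O") + 2 * get("S")) / nC
}

zc_sketch(c(C = 1, O = 2))                                # 4
zc_sketch(c(C = 1, H = 4))                                # -4
zc_sketch(c(C = 1, H = 1, N = 1, O = 1, S = 1, Z = 1))    # 7
zc_sketch(c(H = 2, O = 1))                                # NaN
```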

protein.formula exists to quickly compute the chemical formulas of many proteins. The proteins argument contains the amino acid compositions of the proteins in the same format as the thermo$protein dataframe. residue.formula is called to calculate the chemical formulas of each of the 20 common amino acid residues (and the terminal H- and -OH). The amino acid compositions of the proteins and the output of residue.formula are multiplied using matrix multiplication to generate the result.
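The matrix multiplication step can be illustrated with a toy example in base R. The matrices below are hand-built for two residues and two dipeptides; they are assumptions for illustration and do not reproduce the layout of thermo$protein or the output of residue.formula.

```r
# Toy residue matrix: rows are groups, columns are elements.
# H2O stands for the terminal H- and -OH taken together.
residues <- rbind(
  H2O = c(C = 0, H = 2, N = 0, O = 1),
  Gly = c(C = 2, H = 3, N = 1, O = 1),
  Ala = c(C = 3, H = 5, N = 1, O = 1)
)

# Toy composition matrix: rows are peptides, columns are counts of each group.
comp <- rbind(
  GlyAla = c(H2O = 1, Gly = 1, Ala = 1),
  GlyGly = c(H2O = 1, Gly = 2, Ala = 0)
)

# One matrix product gives the formulas of all peptides at once.
formulas <- comp %*% residues
formulas["GlyAla", ]  # C 5, H 10, N 2, O 3
```

Because every protein is handled by the same product, no explicit loop over proteins is needed.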

References

Buvet, R. (1983) General criteria for the fulfillment of redox reactions, in Bioelectrochemistry I: Biological Redox Reactions, Milazzo, G. and Blank, M., eds., Plenum Press, New York, p. 15-50. http://www.worldcat.org/oclc/9282370

Hendrickson, J. B., Cram, D. J., and Hammond, G. S. (1970) Organic Chemistry, 3rd ed., McGraw-Hill, New York, 1279 p. http://www.worldcat.org/oclc/78308

Examples


data(thermo)
  ## converting among Gibbs, enthalpy, entropy
  GHS("H") # entropy of H (element)
  # calculate enthalpy of formation of arsenopyrite 
  GHS("FeAsS",DG=-33843,S=68.5) 
  # return the value of DG calculated from DH and S
  # cf. -56687.71 from subcrt("water")
  GHS("H2O",DH=-68316.76,S=16.7123)  

  ## mass and entropy of elements or compounds
  element("CH4")
  element(c("CH4","H2O"),"mass")
  element("Z")   # charge
  # same mass, opposite entropy as charge
  element("Z-1") # i.e., electron
 
  ## count selected elements in a formula
  m <- makeup("H2O")
  expand.formula(c("H","O"),m)
  expand.formula(c("C","H","S"),m)

  ## calculate the average chemical formula of all of 
  ## the proteins in CHNOSZ' database
  ## this is much faster than a for-loop
  pf <- protein.formula(thermo$protein)
  colSums(pf)/nrow(pf)

  ## nominal carbon oxidation states
  ZC("CO2")  # 4
  ZC("CH4")  # -4
  ZC("CHNOSZ") # 7
  si <- info(info("LYSC_CHICK"))
  ZC(si$formula)  # 0.01631

  ## plot ZC of reference protein sequence
  ## for different organisms
  file <- system.file("extdata/refseq/protein_refseq.csv.xz",package="CHNOSZ")
  ip <- add.protein(file)
  # only use those organisms with a certain
  # number of sequenced bases
  ip <- ip[as.numeric(thermo$protein$abbrv[ip])>100000]
  pf <- protein.formula(thermo$protein[ip,])
  zc <- ZC(pf)
  # the organism names we search for
  # "" matches all organisms
  terms <- c("Streptomyces","Pseudomonas","Salmonella",
    "Escherichia","Vibrio","Bacteroides","Lactobacillus",
    "Staphylococcus","Streptococcus","Methano","Bacillus","Thermo","")
  tps <- thermo$protein$ref[ip]
  plot(0,0,xlim=c(1,13),ylim=c(-0.3,-0.05),pch="",
    ylab="average oxidation state of carbon in proteins",
    xlab="",xaxt="n",mar=c(6,3,1,1))
  for(i in 1:length(terms)) {
    it <- grep(terms[i],tps)
    zct <- zc[it]
    points(jitter(rep(i,length(zct))),zct,pch=20)
  }
  terms[13] <- paste("all organisms")
  axis(1,1:13,terms,las=2)
  # put the two parts of the title on separate lines
  title(main=paste("Average Oxidation State of Carbon:",
    "Total Protein per taxID in NCBI RefSeq",sep="\n"))
