CHNOSZ (version 0.9-7)

get.protein: Proteins from Model Organisms

Description

Calculate the amino acid compositions of one or more proteins from Escherichia coli or Saccharomyces cerevisiae.

Usage

get.protein(protein, organism, abundance = NULL, pname = NULL, 
    average = TRUE, digits = 1) 
yeastgfp(location, exclusive = TRUE)

Arguments

protein
character, name of protein or stress response experiment.
organism
character, organism (ECO, SGD) or YeastGFP.
abundance
numeric, stoichiometry of proteins applied to sums of compositions.
pname
character, names of proteins.
average
logical, return an average composition of the proteins?
digits
numeric, number of decimal places to round the amino acid counts.
location
character, name of subcellular location (compartment).
exclusive
logical, report only proteins exclusively localized to a compartment?

Value

For get.protein, the amino acid composition(s) of the specified protein(s), or a single overall composition if abundance is not NULL. For yeastgfp, a list with elements yORF and abundance; if location is NULL, the function instead returns (invisibly) the names of all locations.

Details

When protein contains one or more Ordered Locus Names (OLN) or Open Reading Frame names (ORF), get.protein retrieves the amino acid compositions of the respective proteins in Escherichia coli or Saccharomyces cerevisiae (for organism equal to ECO or SGD, respectively). The calculation depends on the presence of the objects thermo$ECO and thermo$SGD, which contain the amino acid compositions of proteins in these organisms. If protein is instead the name of one of the stress response experiments contained in thermo$stress, e.g. low.C or heat.up, the function returns the amino acid compositions of the proteins listed for that experiment.
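The name lookup described above amounts to matching the requested names against a table with one row per protein. The sketch below is plain base R, not the CHNOSZ source: aa.table is a hypothetical stand-in for thermo$SGD with made-up ORF names and counts, and lookup.aa is an illustrative helper name.

```r
# Hypothetical lookup table standing in for thermo$SGD: one row per protein,
# one column per amino acid (toy names and counts; only three residues shown)
aa.table <- data.frame(
  protein = c("ORF_A", "ORF_B", "ORF_C"),
  Ala = c(10, 4, 7),
  Cys = c(1, 0, 2),
  Gly = c(8, 3, 5),
  stringsAsFactors = FALSE
)

# Return the rows of aa.table matching the requested names, in the requested
# order; names that are not found trigger a warning and are dropped
lookup.aa <- function(protein, aa.table) {
  irow <- match(protein, aa.table$protein)
  if (anyNA(irow)) {
    warning("not found: ", paste(protein[is.na(irow)], collapse = ", "))
  }
  out <- aa.table[irow[!is.na(irow)], , drop = FALSE]
  rownames(out) <- NULL
  out
}

lookup.aa(c("ORF_B", "ORF_A"), aa.table)  # rows for ORF_B, then ORF_A
```

Using match() keeps the output in the order the names were requested, which matters when the rows are later paired with a vector of abundances.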

If the abundances of the proteins are given in abundance, the individual protein compositions are multiplied by these values and then summed into an overall composition; if average is TRUE, the average is taken instead of the sum; finally, the amino acid counts are rounded to the number of decimal places specified in digits. Unless names for the new proteins are given in pname, they are generated from the values in protein.
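A base-R sketch of the weighting step, under stated assumptions: the compositions are toy values for two hypothetical proteins, a scalar abundance is recycled to every protein, and average = TRUE is taken to mean dividing the weighted sum by the total abundance. combine.aa is an illustrative name, not a CHNOSZ function.

```r
# Toy compositions (hypothetical): rows are proteins, columns amino acids
comp <- rbind(
  PROT_A = c(Ala = 2, Gly = 4, Ser = 1),
  PROT_B = c(Ala = 4, Gly = 0, Ser = 3)
)

# Weighted sum of the rows of comp; assumption: a scalar abundance is
# recycled to every protein and average = TRUE divides by the total abundance
combine.aa <- function(comp, abundance, average = TRUE, digits = 1) {
  abundance <- rep(abundance, length.out = nrow(comp))
  # comp * abundance scales row i by abundance[i] (column-wise recycling)
  total <- colSums(comp * abundance)
  if (average) total <- total / sum(abundance)
  round(total, digits)
}

combine.aa(comp, abundance = 1)                         # Ala 3, Gly 2, Ser 2
combine.aa(comp, abundance = c(1, 0.5), average = FALSE) # Ala 4, Gly 4, Ser 2.5
```

The two calls mirror the first get.protein examples: equal amounts averaged together, and "1 of one and 1/2 of the other" summed without averaging.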

The yeastgfp function returns the identities and abundances of proteins with the requested subcellular localization (specified in location), using data from the YeastGFP project stored in extdata/abundance/yeastgfp.csv.xz. If exclusive is FALSE, the function returns all proteins that are localized to a compartment, even if they are also localized to other compartments. If exclusive is TRUE (the default), only proteins localized exclusively to the requested compartment are identified; if there are no such proteins (as is the case for the bud localization), the non-exclusive localizations are used instead. The values returned by yeastgfp can be passed to get.protein to obtain the amino acid compositions of the proteins.
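The exclusive/non-exclusive selection and its fallback can be sketched in base R as follows. The localization table, its comma-separated format and the helper select.location are hypothetical; the actual layout of yeastgfp.csv.xz may differ.

```r
# Toy localization table (hypothetical, not YeastGFP data): each ORF lists
# its compartments as a comma-separated string
loc.table <- data.frame(
  ORF = c("ORF_A", "ORF_B", "ORF_C", "ORF_D"),
  location = c("cytoplasm", "cytoplasm,nucleus", "nucleus", "bud,cytoplasm"),
  stringsAsFactors = FALSE
)

select.location <- function(location, loc.table, exclusive = TRUE) {
  locs <- strsplit(loc.table$location, ",", fixed = TRUE)
  # proteins found in the compartment, possibly alongside others
  anywhere <- vapply(locs, function(x) location %in% x, logical(1))
  # proteins found only in the compartment
  only <- vapply(locs, function(x) identical(x, location), logical(1))
  # fall back to non-exclusive matches when no exclusive ones exist
  keep <- if (exclusive && any(only)) only else anywhere
  loc.table$ORF[keep]
}

select.location("cytoplasm", loc.table)                    # "ORF_A"
select.location("cytoplasm", loc.table, exclusive = FALSE) # "ORF_A" "ORF_B" "ORF_D"
select.location("bud", loc.table)                          # "ORF_D" (fallback)
```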

References

Boer, V. M., de Winde, J. H., Pronk, J. T. and Piper, M. D. W. (2003) The genome-wide transcriptional responses of Saccharomyces cerevisiae grown on glucose in aerobic chemostat cultures limited for carbon, nitrogen, phosphorus, or sulfur. J. Biol. Chem. 278, 3265–3274. http://dx.doi.org/10.1074/jbc.M209759200

Dick, J. M. (2009) Calculation of the relative metastabilities of proteins in subcellular compartments of Saccharomyces cerevisiae. BMC Syst. Biol. 3:75. http://dx.doi.org/10.1186/1752-0509-3-75

Richmond, C. S., Glasner, J. D., Mau, R., Jin, H. F. and Blattner, F. R. (1999) Genome-wide expression profiling in Escherichia coli K-12. Nucleic Acids Res. 27, 3821–3835. http://nar.oxfordjournals.org/cgi/content/abstract/27/19/3821

Tai, S. L., Boer, V. M., Daran-Lapujade, P., Walsh, M. C., de Winde, J. H., Daran, J.-M. and Pronk, J. T. (2005) Two-dimensional transcriptome analysis in chemostat cultures: Combinatorial effects of oxygen availability and macronutrient limitation in Saccharomyces cerevisiae. J. Biol. Chem. 280, 437–447. http://dx.doi.org/10.1074/jbc.M410573200

See Also

The output of get.protein can be used as input to add.protein to add the proteins to the thermo$protein data frame in preparation for further calculations (see examples below).

Examples

data(thermo)
  ## basic examples of get.protein
  # amino acid composition of two proteins
  get.protein(c("YML020W","YBR051W"),"SGD")
  # average composition of proteins
  get.protein(c("YML020W","YBR051W"),"SGD",
    abundance=1,pname="PROT1_NEW")
  # 1 of one and 1/2 of the other
  get.protein(c("YML020W","YBR051W"),"SGD",
    abundance=c(1,0.5),average=FALSE,pname="PROT2_NEW")
  # compositions of proteins induced in carbon limitation 
  get.protein("low.C","SGD")

  ## overall composition of proteins exclusively localized 
  ## to cytoplasm of S. cerevisiae with reported expression levels
  y <- yeastgfp("cytoplasm")
  p <- get.protein(y$yORF,"SGD",y$abundance,"cytoplasm")
  # add the protein and calculate its properties
  i <- add.protein(p)
  protein(i)

  ## speciation diagram for ER.to.Golgi proteins (COPII coat 
  ## proteins) as a function of logfO2, after Dick, 2009
  y <- yeastgfp("ER.to.Golgi")
  # take out proteins with NA experimental abundance
  # (logical indexing also works when no abundances are NA)
  ok <- !is.na(y$abundance)
  y$yORF <- y$yORF[ok]
  y$abundance <- y$abundance[ok]
  # get the amino acid compositions of the proteins
  p <- get.protein(y$yORF,"SGD")
  ip <- add.protein(p)
  # use logarithms of activities of proteins such
  # that total activity of residues is unity
  pl <- protein.length(-ip)
  logact <- unitize(rep(1,length(ip)),pl)
  # set up the basis species
  basis("CHNOS+")
  a <- affinity(O2=c(-80,-73),iprotein=ip,loga.protein=logact)
  # make a speciation diagram
  diagram(a,ylim=c(-4.9,-2.9))
  # where we are closest to experimental log activity
  logfO2 <- rep(-78,length(ip))
  abline(v=logfO2[1],lty=3)
  # scale experimental abundances such that
  # total activity of residues is unity
  logact.expt <- unitize(log10(y$abundance),pl)
  # plot experimental log activity
  points(logfO2,logact.expt,pch=16)
  text(logfO2+0.5,logact.expt,y$yORF)
  # add title
  title(main=paste("ER.to.Golgi; points - relative abundances",
    "from YeastGFP. Figure after Dick, 2009"))

  ## Chemical activities of model subcellular proteins
  # speciation diagram as a function of logfO2, after Dick, 2009
  basis("CHNOS+")
  names <- yeastgfp()
  # calculate amino acid compositions using "get.protein" function 
  for(i in 1:length(names)) {
    y <- yeastgfp(names[i])
    p <- get.protein(y$yORF,"SGD",y$abundance,names[i])
    add.protein(p)
  }
  species(names,"SGD")
  # set unit activity of residues
  pl <- protein.length(thermo$species$name)
  species(NULL,unitize(thermo$species$logact,pl))
  res <- 200
  a <- affinity(O2=c(-82,-65,res))
  mycolor <- topo.colors(6)[1:4]
  mycolor <- rep(mycolor,times=rep(6,4))
  logact <- diagram(a,balance="PBB",names=names,ylim=c(-5,-3),legend.x=NULL,
    col=mycolor,lwd=2)$logact
  # so far good, but how about labels on the plot?
  for(i in 1:length(logact)) {
    myloga <- as.numeric(logact[[i]])
    # don't take values that lie above the plot (vacuole in this example)
    myloga[myloga > -3.1] <- -999
    imax <- which.max(myloga)
    adj <- 0.5
    if(imax > 180) adj <- 1
    if(imax < 20) adj <- 0
    text(seq(-82,-65,length.out=res)[imax],logact[[i]][imax],
      labels=names[i],adj=adj)
  }
  title(main=paste("Subcellular proteins of S. cerevisiae, after Dick, 2009",
    describe(thermo$basis[-5,]),sep="\n"),col.main=par("fg"),cex.main=0.9)

  ## Oxygen fugacity - activity of H2O predominance 
  ## diagrams for proteologs for 23 YeastGFP localizations
  # arranged by decreasing metastability:
  # order of this list of locations is based on the 
  # (dis)appearance of species on the current set of diagrams
  names <- c("vacuole","early.Golgi","ER","lipid.particle",
    "cell.periphery","ambiguous","Golgi","mitochondrion",
    "bud","actin","cytoplasm","late.Golgi",
    "endosome","nucleus","vacuolar.membrane","punctate.composite",
    "peroxisome","ER.to.Golgi","nucleolus","spindle.pole",
    "nuclear.periphery","bud.neck","microtubule")
  nloc <- c(4,5,3,4,4,3)
  inames <- 1:length(names)
  # define the system
  basis("CHNOS+")
  # calculate amino acid compositions using "get.protein" function 
  for(i in 1:length(names)) {
    y <- yeastgfp(names[i])
    p <- get.protein(y$yORF,"SGD",y$abundance,names[i])
    add.protein(p)
  }
  species(names,"SGD")
  a <- affinity(H2O=c(-5,0,256),O2=c(-80,-66,256))
  # setup the plot
  layout(matrix(c(1,1,2:7),byrow=TRUE,nrow=4),heights=c(0.7,3,3,3))
  par(mar=c(0,0,0,0))
  plot.new()
  text(0.5,0.5,paste("Subcellular proteins of S. cerevisiae,",
   "after Dick, 2009\n",describe(thermo$basis[-c(2,5),])),cex=1.5)
  opar <- par(mar=c(3,4,1,1),xpd=TRUE)
  for(i in 1:length(nloc)) {
    cex.axis <- 0.75
    # uncomment the following and dev.off() below to generate png files
    #png(paste(i,"png",sep="."),width=300,height=250); cex.axis <- 1
    diagram(a,balance="PBB",names=names[inames],
      ispecies=inames,cex.axis=cex.axis)
    label.plot(letters[i])
    title(main=paste(length(inames),"locations"))
    #dev.off()
    # take out the stable species
    inames <- inames[-(1:nloc[i])]
  }
  # make an animated gif from png files (with ImageMagick convert tool)
  #system(paste("convert -delay 100 1.png 1.png 1.png 2.png",
  #  "3.png 4.png 5.png 6.png 6.png 6.png yeast.gif"))
  # return to plot defaults
  layout(matrix(1))
  par(opar)

  ## Compare calculated and experimental relative abundances
  ## of proteins in a subcellular location, after Dick, 2009
  # get the amino acid composition of the proteins
  loc <- "vacuolar.membrane"
  y <- yeastgfp(loc)
  # keep only proteins with a reported abundance
  # (logical indexing also works when no abundances are NA)
  ok <- !is.na(y$abundance)
  p <- get.protein(y$yORF[ok],"SGD")
  add.protein(p)
  # set up the system
  basis("CHNOS+")
  # this is the logfO2 value that gives the best fit (see paper)
  basis("O2",-74)
  is <- species(p$protein,p$organism)
  np <- length(is)
  pl <- protein.length(species()$name)
  # we use unitize so total activity of residues is unity
  loga <- rep(0,np)
  species(1:np,unitize(loga,pl))
  a <- affinity()
  d <- diagram(a,do.plot=FALSE)
  calc.loga <- as.numeric(d$logact)
  expt.loga <- unitize(log10(y$abundance[ok]),pl)
  # which ones are outliers
  rmsd <- sqrt(sum((expt.loga-calc.loga)^2)/np)
  residuals <- abs(expt.loga - calc.loga)
  iout <- which(residuals > rmsd)
  pch <- rep(16,length(is))
  pch[iout] <- 1
  # the colors reflect average oxidation number of carbon
  # corrects misassigned colors in Figs. 5 and 6 of Dick 2009
  ZC <- ZC(thermo$obigt$formula[species()$ispecies])
  col <- rgb(0.15-ZC,0,0.35+ZC,maxColorValue=0.5)
  # there is a color-plotting error on line 567 of the plot.R file 
  # of Dick, 2009 that can be reproduced with
  #col <- rep(col,length.out=9)
  xlim <- ylim <- extendrange(c(calc.loga,expt.loga))
  thermo.plot.new(xlim=xlim,ylim=ylim,xlab=expression(list("log"*italic(a),
    "calc")),ylab=expression(list("log"*italic(a),"expt")))
  points(calc.loga,expt.loga,pch=pch,col=col)
  lines(xlim,ylim+rmsd,lty=2)
  lines(xlim,ylim-rmsd,lty=2)
  title(main=paste("Calculated and experimental relative abundances of\n",
    "proteins in ",loc,", after Dick, 2009",sep=""),cex.main=0.95)
  
  ### examples for stress response experiments

  ## predominance fields for overall protein compositions induced by
  ## carbon, sulfur and nitrogen limitation
  ## (experimental data from Boer et al., 2003)
  expt <- c("low.C","low.N","low.S")
  for(i in 1:length(expt)) {
    p <- get.protein(expt[i],"SGD",abundance=1)
    add.protein(p)
  }
  basis("CHNOS+") 
  basis("O2",-75.29)
  species(expt,"SGD")
  a <- affinity(CO2=c(-5,0),H2S=c(-10,0))
  diagram(a,balance="PBB",names=expt,color=NULL)
  title(main=paste("Proteins induced by",
    "carbon, sulfur and nitrogen limitation",sep="\n"))

  ## predominance fields for overall protein compositions 
  ## induced and repressed in an/aerobic carbon limitation
  ## (experiments of Tai et al., 2005)
  # the activities of glucose, ammonium and sulfate
  # are similar to the non-growth-limiting concentrations
  # used by Boer et al., 2003
  basis(c("glucose","H2O","NH4+","hydrogen","SO4-2","H+"),
    c(-1,0,-1.3,999,-1.4,-7))
  # the names of the experiments in thermo$stress
  expt <- c("Clim.aerobic.down","Clim.aerobic.up",
    "Clim.anaerobic.down","Clim.anaerobic.up")
  # here we use abundance to indicate that the protein
  # compositions should be summed together in equal amounts
  for(i in 1:length(expt)) {
    p <- get.protein(expt[i],"SGD",abundance=1)
    add.protein(p)
  }
  species(expt,"SGD")
  a <- affinity(C6H12O6=c(-35,-20),H2=c(-20,0))
  diagram(a,color=NULL,as.residue=TRUE)
  title(main=paste("Average protein residue composition in",
    "an/aerobic carbon limitation in yeast",sep="\n"))

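Several of the examples call unitize() so that the total activity of residues is unity. Assuming it works by adding one constant to every log activity (which preserves the relative activities), the constant follows from requiring sum(nres * 10^logact) = 10^logact.tot, where nres is the number of residues in each protein. unitize.sketch below is a hypothetical base-R re-implementation for illustration, not the CHNOSZ function itself.

```r
# Hypothetical sketch of residue-activity normalization: shift every log
# activity by one constant so that sum(nres * 10^logact) == 10^logact.tot
unitize.sketch <- function(logact, nres, logact.tot = 0) {
  nres <- rep(nres, length.out = length(logact))
  # current total activity of residues
  act.tot <- sum(nres * 10^logact)
  # subtracting log10(act.tot) scales the total to 1; adding logact.tot sets the target
  logact + logact.tot - log10(act.tot)
}

# three hypothetical proteins of 100, 200 and 300 residues, equal activities
logact <- unitize.sketch(rep(0, 3), c(100, 200, 300))
sum(c(100, 200, 300) * 10^logact)  # 1, up to floating-point rounding
```

Because the shift is the same for every protein, differences between log activities (for example, log10 of experimental abundances) are unchanged, which is why the examples can compare unitized calculated and experimental values directly.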