pdat_: Get Protein Expression Data

Description

Get data for protein expression and chemical composition.

Usage

pdat_breast(dataset = 2020, basis = "rQEC")
pdat_colorectal(dataset = 2020, basis = "rQEC")
pdat_liver(dataset = 2020, basis = "rQEC")
pdat_lung(dataset = 2020, basis = "rQEC")
pdat_pancreatic(dataset = 2020, basis = "rQEC")
pdat_prostate(dataset = 2020, basis = "rQEC")
pdat_hypoxia(dataset = 2020, basis = "rQEC")
pdat_secreted(dataset = 2020, basis = "rQEC")
pdat_3D(dataset = 2020, basis = "rQEC")
pdat_glucose(dataset = 2020, basis = "rQEC")
pdat_osmotic_bact(dataset = 2020, basis = "rQEC")
pdat_osmotic_euk(dataset = 2020, basis = "rQEC")
pdat_osmotic_halo(dataset = 2020, basis = "rQEC")
.pdat_multi(dataset = 2020, basis = "rQEC")
.pdat_osmotic(dataset = 2017, basis = "rQEC")

Arguments

dataset

character, dataset name; or numeric, the year of a compilation (2020 or 2017) whose dataset names should be listed

basis

character, keyword for basis species to use

Value

A list consisting of:

dataset: the name of the dataset
basis: basis species used for the calculations
description: descriptive text for the dataset
pcomp: compositional data generated by protcomp
up2: logical vector with length equal to the number of proteins; TRUE for up-regulated proteins and FALSE for down-regulated proteins
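
The structure described above can be illustrated with a hand-made stand-in. This is only a sketch: the dataset name, description, and up2 values are invented for illustration, and pcomp is left as an empty placeholder rather than real protcomp output.

```r
# Hypothetical stand-in for the list returned by a pdat_ function.
# All values are invented; pcomp is an empty placeholder.
pdat <- list(
  dataset = "XYZ20_example",
  basis = "rQEC",
  description = "illustrative dataset",
  pcomp = list(),
  up2 = c(TRUE, FALSE, TRUE, TRUE, FALSE)
)

# Count up- and down-regulated proteins from the logical vector
n_up <- sum(pdat$up2)
n_down <- sum(!pdat$up2)
```

Because up2 has one element per protein, it can also be used to index any per-protein vector of the same length.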

Details

The pdat_ functions assemble lists of up- and down-regulated proteins and calculate chemical compositions using protcomp. After this, use get_comptab to make a table of compositional metrics that can be plotted with diffplot.
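
The workflow in the preceding paragraph might look like the sketch below. It assumes that the functions are provided by the canprot package (this page does not name the package), that the installed release still exports the pdat_ functions, and that get_comptab and diffplot work with their default arguments. The guard makes the code do nothing when those assumptions do not hold.

```r
# Sketch only: runs when an installed canprot release exports pdat_colorectal
# (an assumption; the package name is not stated on this page).
if (requireNamespace("canprot", quietly = TRUE) &&
    exists("pdat_colorectal", envir = asNamespace("canprot"), inherits = FALSE)) {
  library(canprot)
  # Assemble up- and down-regulated proteins and their compositions
  pdat <- pdat_colorectal("JKMF10")
  # Make a table of compositional metrics (default arguments assumed)
  comptab <- get_comptab(pdat)
  # Plot the differences
  diffplot(comptab)
}
```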

If dataset is 2020 (the default) or 2017, the function returns the names of all datasets in the compilation for the respective year.

Each dataset name starts with a reference key indicating the study (publication) where the data were reported. The reference keys are made by combining the first characters of the authors' family names with the 2-digit year of publication. For multiple datasets from one study, the reference key is followed by an underscore and descriptive text for the particular dataset.
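
The naming convention can be sketched in base R. The helper make_key is hypothetical (it is not part of the package), the author names are invented, and the sketch assumes one initial per author with no limit on the number of authors.

```r
# Hypothetical helper, not part of the package: build a dataset name
# from family names, publication year, and optional descriptive text.
make_key <- function(family_names, year, suffix = NULL) {
  # First character of each family name (assumed: one letter per author)
  initials <- paste(substr(family_names, 1, 1), collapse = "")
  # Append the 2-digit year of publication
  key <- paste0(initials, sprintf("%02d", year %% 100))
  # Optional descriptive text follows an underscore
  if (!is.null(suffix)) key <- paste(key, suffix, sep = "_")
  key
}

key1 <- make_key(c("Adams", "Baker", "Clark"), 2015)
key2 <- make_key(c("Adams", "Baker", "Clark"), 2015, "secreted")

# Recover the reference key from a dataset name
ref <- strsplit(key2, "_", fixed = TRUE)[[1]][1]
```

Here key1 is "ABC15", key2 is "ABC15_secreted", and ref recovers "ABC15" from the longer name.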

Provide one of the dataset names as the dataset argument to retrieve the data. The functions get protein expression data from the CSV files stored in extdata/expression/, under the subdirectory corresponding to the name of the pdat_ function. Some of the functions also read amino acid compositions (for non-human proteins) from the files in extdata/aa/.

Descriptions for each function:

pdat_colorectal, pdat_pancreatic, pdat_breast, pdat_lung, pdat_prostate, and pdat_liver retrieve data for protein expression in different cancer types.
pdat_hypoxia gets data for cellular extracts in hypoxia and pdat_secreted gets data for secreted proteins (e.g. exosomes) in hypoxia.
pdat_3D retrieves data for 3D (e.g. tumor spheroids and aggregates) compared to 2D (monolayer) cell culture.
.pdat_osmotic retrieves data for hyperosmotic stress, for the 2017 compilation only. In 2020, this compilation was expanded and split into pdat_osmotic_bact (bacteria), pdat_osmotic_euk (eukaryotic cells) and pdat_osmotic_halo (halophilic bacteria and archaea).
pdat_glucose gets data for high-glucose experiments in eukaryotic cells.
.pdat_multi retrieves data for studies that have multiple types of datasets (e.g. both cellular and secreted proteins in hypoxia), and is used internally by the specific functions (e.g. pdat_hypoxia and pdat_secreted).

Examples

# NOT RUN {
library(CHNOSZ)
# list datasets in the 2017 compilation for colorectal cancer
pdat_colorectal(2017)
# process one dataset
pdat_colorectal("JKMF10")
# }
