proportionsInAdmixture: Estimate the proportion of pure populations in an admixed population based on marker expression values.

Description

Assume that datE.Admixture provides the expression values from a mixture of cell types (admixed population) and you want to estimate the proportion of each pure cell type in the mixed samples (rows of datE.Admixture). The function allows you to do this as long as you provide a data frame MarkerMeansPure that reports the mean expression values of markers in each of the pure cell types.

Usage

proportionsInAdmixture(
  MarkerMeansPure, 
  datE.Admixture, 
  calculateConditionNumber = FALSE, 
  coefToProportion = TRUE)

Arguments

MarkerMeansPure

is a data frame whose first column reports the name of the marker and the remaining columns report the mean values of the markers in each of the pure populations. The function will estimate the proportion of pure cells which correspond to columns 2 thro

datE.Admixture

is a data frame of expression data, e.g. the columns of datE.Admixture could correspond to thousands of genes. The rows of datE.Admixture correspond to the admixed samples for which the function estimates the proportions of pu

calculateConditionNumber

logical. Default is FALSE. If set to TRUE then it uses the kappa function to calculate the condition number of the matrix MarkerMeansPure[,-1]. This allows one to determine whether the linear model for estimating the proport

coefToProportion

logical. By default, it is set to TRUE. When estimating the proportions the function fits a multivariate linear model. Ideally, the coefficients of the linear model correspond to the proportions in the admixed samples. But sometimes the coefficients tak
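
The condition number check described under calculateConditionNumber can be tried out in base R. The sketch below is illustrative only: the marker names, cell type names, and expression values are made up, and it calls kappa directly rather than going through proportionsInAdmixture. A large condition number generally signals that the columns of pure means are nearly collinear, so the regression coefficients would be unstable.

```r
# Hypothetical marker means for three pure populations (illustrative values).
MarkerMeansPure <- data.frame(
  marker    = c("gene1", "gene2", "gene3", "gene4"),
  cellTypeA = c(900, 120, 150, 400),
  cellTypeB = c(110, 850, 140, 420),
  cellTypeC = c(130, 100, 800, 410)
)

# Condition number (estimate) of the numeric part of the table.
kappaWell <- kappa(as.matrix(MarkerMeansPure[, -1]))

# For comparison, a nearly collinear table: cellTypeC is almost cellTypeA.
nearlyCollinear <- MarkerMeansPure
nearlyCollinear$cellTypeC <- nearlyCollinear$cellTypeA + c(1, -1, 1, -1)
kappaIll <- kappa(as.matrix(nearlyCollinear[, -1]))

# The nearly collinear table yields the much larger condition number.
c(wellConditioned = kappaWell, nearlyCollinear = kappaIll)
```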

Value

A list with the following components:

PredictedProportions

data frame that contains the predicted proportions. The rows of PredictedProportions correspond to the admixed samples, i.e. the rows of datE.Admixture. The columns of PredictedProportions correspond to the pure populations, i.e. the columns of MarkerMeansPure[,-1].

datCoef

data frame of numbers that is analogous to PredictedProportions. In general, datCoef will only be different from PredictedProportions if coefToProportion=TRUE. See the description of coefToProportion.

conditionNumber

the condition number resulting from the kappa function. See the description of calculateConditionNumber.

markersUsed

vector of character strings that contains the subset of marker names (specified in the first column of MarkerMeansPure) that match column names of datE.Admixture and that contain non-missing pure mean values.

Details

The methods implemented in this function were motivated by the gene expression deconvolution approach described by Abbas et al (2009), Lu et al (2003), and Wang et al (2006). This approach can be used to predict the proportions of (pure) cells in a complex tissue, e.g. the proportion of blood cell types in whole blood.

To define the markers, you may need expression data from pure populations. You can then select markers based on a significant t-test or ANOVA across the pure populations, and use the pure population data to estimate the corresponding mean expression values. The array platforms and normalization methods used for the admixed data and for MarkerMeansPure should be comparable. When dealing with Affymetrix data: we have successfully used the function on untransformed MAS5 data.

For statisticians: To estimate the proportions, we use the coefficients of a linear model. Specifically:

datCoef = t(lm(datE.MarkersAdmixtureTranspose ~ MarkerMeansPure[,-1])$coefficients[-1,])

where datCoef is a matrix whose rows correspond to the mixed samples (rows of datE.Admixture) and whose columns correspond to the pure populations (e.g. cell types), i.e. the columns of MarkerMeansPure[,-1]. More details can be found in Abbas et al (2009).
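
The linear model above can be reproduced on simulated data with base R alone. This is a minimal sketch, not the function's actual code. The marker names, cell type names, and simulated values are hypothetical, and datE.MarkersAdmixtureTranspose is assumed to be the marker columns of datE.Admixture transposed so that markers are in rows.

```r
# Illustrative sketch only: simulated data, base R, no WGCNA required.
set.seed(1)
nMarkers <- 30
nSamples <- 5

# Hypothetical pure-population means: the first column holds marker names,
# the remaining columns hold the mean expression in each pure cell type.
MarkerMeansPure <- data.frame(
  marker    = paste0("gene", seq_len(nMarkers)),
  cellTypeA = runif(nMarkers, 100, 1000),
  cellTypeB = runif(nMarkers, 100, 1000),
  cellTypeC = runif(nMarkers, 100, 1000)
)
pureMeans <- as.matrix(MarkerMeansPure[, -1])

# Hypothetical true proportions (rows = admixed samples, each row sums to 1).
trueProp <- matrix(runif(nSamples * 3), nrow = nSamples)
trueProp <- trueProp / rowSums(trueProp)
colnames(trueProp) <- colnames(pureMeans)

# Admixed expression: samples in rows, markers in columns, plus small noise.
datE.Admixture <- trueProp %*% t(pureMeans) +
  matrix(rnorm(nSamples * nMarkers, sd = 1), nrow = nSamples)
colnames(datE.Admixture) <- MarkerMeansPure$marker

# Assumed orientation: markers in rows, admixed samples in columns.
datE.MarkersAdmixtureTranspose <- t(datE.Admixture)

# One regression per admixed sample; drop the intercept row, then transpose
# so that rows are samples and columns are pure populations.
fit <- lm(datE.MarkersAdmixtureTranspose ~ pureMeans)
datCoef <- t(fit$coefficients[-1, ])
colnames(datCoef) <- colnames(pureMeans)

# Largest absolute difference between estimated and simulated proportions.
max(abs(datCoef - trueProp))
```

With low noise and well-separated marker means, as simulated here, the coefficients land close to the simulated proportions. On real data they can fall outside the range of valid proportions, which is the situation the coefToProportion argument addresses.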

References

Abbas AR, Wolslegel K, Seshasayee D, Modrusan Z, Clark HF (2009) Deconvolution of Blood Microarray Data Identifies Cellular Activation Patterns in Systemic Lupus Erythematosus. PLoS ONE 4(7): e6098. doi:10.1371/journal.pone.0006098

Lu P, Nakorchevskiy A, Marcotte EM (2003) Expression deconvolution: a reinterpretation of DNA microarray data reveals dynamic changes in cell populations. Proc Natl Acad Sci U S A 100: 10370-10375.

Wang M, Master SR, Chodosh LA (2006) Computational expression deconvolution in a complex mammalian organ. BMC Bioinformatics 7: 328.
