The function TFA.estimate
estimates the transcription factor activities from gene
expression data and ChIP data using the PLS multivariate regression approach described
in Boulesteix and Strimmer (2005).
TFA.estimate(CONNECdata, GEdata, ncomp=NULL, nruncv=0, alpha=2/3, unit.weights=TRUE)
A list with the following components:
a (p x m) matrix containing the estimated transcription factor activities for the p transcription factors and the m samples.
a (m x ncomp
) matrix containing the metafactors for the m
samples. Each row corresponds to a sample, each column to a metafactor.
the number of latent components used in the PLS regression.
a (n x p) matrix containing the ChIP data for the n genes and the
p predictors. The n genes must be the same as the n genes of GEdata
and the ordering
of the genes must also be the same. Each row of ChIPdata
corresponds to a gene, each
column to a transcription factor. CONNECdata
might have either binary (e.g. 0-1) or
numeric entries.
a (n x m) matrix containing the gene expression levels of the n
considered genes for m samples. Each row of GEdata
corresponds to a gene, each
column to a sample.
if nruncv=0
, ncomp
is the number of latent
components to be constructed. If nruncv>0
, the number of
latent components to be used for PLS
regression is chosen from 1,...,ncomp
using the cross-validation procedure described
in Boulesteix and Strimmer (2005). If ncomp=NULL
, ncomp
is set to min(n,p).
the number of cross-validation iterations to be performed for the choice of
the number of latent components. If nruncv=0
, cross-validation is not performed and
ncomp
latent components are used.
the proportion of genes to be included in the training set for the cross-validation procedure.
If TRUE
then the latent components
will be constructed from weight vectors that are standardized to length 1,
otherwise the weight vectors do not have length 1 but the latent components have
norm 1.
Anne-Laure Boulesteix (https://www.ibe.med.uni-muenchen.de/mitarbeiter/professoren/boulesteix/index.html) and Korbinian Strimmer (https://strimmerlab.github.io/korbinian.html).
The gene expression data as well as the ChIP data are assumed to have been
properly normalized. However, they do not have to be centered or scaled, since
centering and scaling are performed by the function TFA.estimate
as a
preliminary step.
The matrix ChIPdata
containing the ChIP data for the n genes and p transcription
factors might be replaced by any 'connectivity' matrix whose entries give the strength
of the interactions between the genes and transcription factors. For instance, a connectivity
matrix obtained by aggregating qualitative information from various genomic databases
might be used as argument in place of ChIP data.
A. L. Boulesteix and K. Strimmer (2005). Predicting Transcription Factor Activities from Combined Analysis of Microarray and ChIP Data: A Partial Least Squares Approach.
A. L. Boulesteix, K. Strimmer (2007). Partial least squares: a versatile tool for the analysis of high-dimensional genomic data. Briefings in Bioinformatics 7:32-44.
S. de Jong (1993). SIMPLS: an alternative approach to partial least squares regression, Chemometrics Intell. Lab. Syst. 18, 251--263.
pls.regression
, pls.regression.cv
.
# load plsgenomics library
library(plsgenomics)
# load Ecoli data
data(Ecoli)
# estimate TFAs based on 3 latent components
TFA.estimate(Ecoli$CONNECdata,Ecoli$GEdata,ncomp=3,nruncv=0)
# estimate TFAs and determine the best number of latent components simultaneously
TFA.estimate(Ecoli$CONNECdata,Ecoli$GEdata,ncomp=1:5,nruncv=20)
Run the code above in your browser using DataLab