targetG: Computation of target G ('knowledge-based constant correlation model').

Description

The $p \times p$ target G is computed from the $n \times p$ data matrix. It is defined as follows ($i,j = 1,...,p$): $$t_{ij}=\left\{ \begin {array} {ll} s_{ii}\;&\mbox{if}\;i=j\\ \bar{r}\sqrt{s_{ii}s_{jj}}\;&\mbox{if}\;i\neq j, i\sim j\\ 0\;&\mbox{otherwise} \end{array} \right.$$ where $\bar{r}$ is the average of sample correlations and $s_{ij}$ denotes the entry of the unbiased covariance matrix in row $i$, column $j$. The notation $i \sim j$ means that genes $i$ and $j$ are connected, i.e. genes $i$ and $j$ are in the same gene functional group.
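The definition above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the code used by the package: `target_g_sketch` is a hypothetical helper, $\bar{r}$ is taken here as the mean of all off-diagonal sample correlations (the package may average over a different set of pairs), and two genes are treated as connected when they share at least one pathway name.

```r
# Hypothetical helper illustrating the definition; not the SHIP implementation.
target_g_sketch <- function(x, genegroups) {
  p <- ncol(x)
  s <- cov(x)                     # unbiased sample covariance (divisor n - 1)
  r <- cov2cor(s)                 # sample correlations
  r_bar <- mean(r[upper.tri(r)])  # assumption: average over all pairs i < j

  # Flatten each entry to a character vector; NA means "no functional group".
  groups <- lapply(genegroups, function(g) {
    g <- unlist(g)
    g[!is.na(g)]
  })

  # i ~ j (assumption): genes i and j share at least one pathway name.
  connected <- matrix(FALSE, p, p)
  for (i in seq_len(p - 1)) {
    for (j in (i + 1):p) {
      shared <- length(intersect(groups[[i]], groups[[j]])) > 0
      connected[i, j] <- connected[j, i] <- shared
    }
  }

  sd_prod <- sqrt(outer(diag(s), diag(s)))  # sqrt(s_ii * s_jj)
  tar <- ifelse(connected, r_bar * sd_prod, 0)
  diag(tar) <- diag(s)                      # t_ii = s_ii
  tar
}

# Toy input: genes 1 and 2 share "pathA", genes 2 and 3 share "pathB",
# gene 4 belongs to no group.
set.seed(1)
x <- matrix(rnorm(20 * 4), nrow = 20, ncol = 4)
genegroups <- list(list("pathA"), list("pathA", "pathB"), list("pathB"), NA)
tar <- target_g_sketch(x, genegroups)
```

In this toy input only the entries for the pairs (1,2) and (2,3) are non-zero off the diagonal; every other off-diagonal entry is set to zero because the genes share no group.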

Usage

targetG(x, genegroups)

Arguments

x

An $n \times p$ data matrix.

genegroups

A list of genes obtained using the KEGG database, where each entry is itself a list of the pathway names this gene belongs to. If a gene does not belong to any gene functional group, the entry is NA.

Value

A $p \times p$ matrix.

References

J. Schaefer and K. Strimmer, 2005. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol. 4:32.
M. Jelizarow, V. Guillemot, A. Tenenhaus, K. Strimmer, A.-L. Boulesteix, 2010. Over-optimism in bioinformatics: an illustration. Bioinformatics. Accepted.

Examples

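# A short example on a toy dataset
library(SHIP)
data(expl)
# expl holds the data matrix x and the gene group list genegroups
tar <- targetG(expl$x, expl$genegroups)
# Positions of the non-zero off-diagonal entries: only a few are non-zero
which(tar[upper.tri(tar)] != 0)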
