deds.stat: Differential Expression via Distance Summary of Multiple Statistics

Description

deds.stat integrates different statistics of differential expression (DE) to rank and select a set of DE genes.

Usage

deds.stat(X, L, B = 1000, testfun = list(t = comp.t(L), fc = comp.FC(L),
sam = comp.SAM(L)), tail = c("abs", "lower", "higher"), distance =
c("weuclid", "euclid"), adj = c("fdr", "adjp"), nsig = nrow(X))

Arguments

X

A matrix, with $m$ rows corresponding to variables (hypotheses) and $n$ columns corresponding to observations. In the case of gene expression data, rows correspond to genes and columns to mRNA samples. The data can be read using read.table.

L

A vector of integers corresponding to observation (column) class labels. For $k$ classes, the labels must be integers between 0 and $k-1$.

B

The number of permutations. For a complete enumeration, B should be 0 (zero) or any number not less than the total number of permutations.

testfun

A list of functions specifying the statistics to be used to test the null hypothesis of no association between the variables and the class labels. The default uses t, fold change and SAM. The input can also be generated using the function deds.chooseTest.

tail

A character string specifying the type of rejection region. If tail="abs" (two-tailed test), the null hypothesis is rejected for large absolute values of the test statistic. If tail="higher" (one-tailed test), the null hypothesis is rejected for large values of the test statistic. If tail="lower" (one-tailed test), the null hypothesis is rejected for small values of the test statistic.

distance

A character string specifying the distance measure used to calculate the distance to the extreme point (E). If distance="weuclid", the weighted Euclidean distance is used, where the weight for statistic $t$ is $1/MAD(t)$. If distance="euclid", the unweighted Euclidean distance is used.

adj

A character string specifying the type of multiple testing adjustment. If adj="fdr", the False Discovery Rate is controlled and $q$ values are returned. If adj="adjp", adjusted $p$ values that control the family-wise type I error rate are returned.

nsig

If adj = "fdr", nsig specifies the number of top differentially expressed genes for which $q$ values will be calculated. We recommend setting nsig < m, because computing $q$ values is computationally intensive. The $q$ values for the remaining genes are approximated as 1. If adj = "adjp", adjusted $p$ values are calculated for the whole dataset.
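As a rough guide to choosing B relative to a complete enumeration, the number of distinct relabelings can be counted in base R. This is a sketch only: it assumes that "the total number of permutations" means the number of distinct assignments of class labels to columns, and the helper name n.relabelings is hypothetical and not part of DEDS.

```r
# Hypothetical helper (not part of DEDS): count the distinct assignments of
# class labels to columns, i.e. the multinomial coefficient n! / prod(n_k!).
n.relabelings <- function(L) {
  counts <- as.vector(table(L))
  # Work on the log scale to avoid overflow, then round off floating error
  round(exp(lfactorial(length(L)) - sum(lfactorial(counts))))
}

L <- rep(0:1, c(5, 5))
n.total <- n.relabelings(L)   # 10! / (5! * 5!) = 252
```

Under this assumption, B = 0 or any B of at least n.total would request the complete enumeration for this design.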

Value

An object of class DEDS. See DEDS-class.

Details

deds.stat summarizes multiple statistical measures of the evidence for DE. The DEDS methodology treats each gene as a point whose coordinates are that gene's vector of DE measures. An "extreme origin" is defined as the maxima of all statistics, and the distance from every point to this extreme is computed. Genes are ranked for DE by their closeness to the extreme. To determine a cutoff for declaring DE, null reference distributions are generated by permuting the data matrix.
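The ranking step described above can be sketched in base R without the package. This is an illustration under stated assumptions, not the DEDS implementation: the two statistics are simple stand-ins for comp.t and comp.FC, the weights are applied to the coordinate differences before squaring, and the MAD is taken over the absolute statistics. The permutation step that sets the cutoff is omitted.

```r
set.seed(1)
X <- matrix(rnorm(1000, 0, 0.5), ncol = 10)   # 100 genes, 10 samples
L <- rep(0:1, c(5, 5))
X[1:10, 6:10] <- X[1:10, 6:10] + 1            # genes 1-10 are DE

# Two per-gene DE measures (stand-ins for comp.t and comp.FC)
fc    <- rowMeans(X[, L == 1]) - rowMeans(X[, L == 0])
tstat <- apply(X, 1, function(x) t.test(x[L == 1], x[L == 0])$statistic)

# tail = "abs": large absolute values count as evidence of DE
S <- abs(cbind(t = tstat, fc = fc))

# Extreme point E: the maximum of each statistic over all genes
E <- apply(S, 2, max)

# Weights for distance = "weuclid" (assumed form); use rep(1, ncol(S))
# instead to mimic distance = "euclid"
w <- 1 / apply(S, 2, mad)

# Distance of every gene to E; a smaller distance means stronger evidence
d   <- sqrt(rowSums(sweep(sweep(S, 2, E, "-"), 2, w, "*")^2))
top <- order(d)[1:10]                         # ten genes closest to E
```

Without the weights, a statistic on a larger scale (here typically the t statistic) would dominate the distance, which is the motivation for scaling each statistic by its MAD.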

Statistical measures currently in the DEDS package include t statistics (comp.t), fold changes (comp.FC), F statistics (comp.F), SAM (comp.SAM), moderated t statistics (comp.modt), moderated F statistics (comp.modF), and B statistics (comp.B). Users can also supply their own function for a statistic not listed above, provided it is written in the same format as these functions.
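A user-supplied statistic might look like the sketch below. The name comp.meddiff is hypothetical and not part of DEDS. The structure (a function of L that returns a function of X) is inferred from how comp.t(L) is passed in the Usage section; that the inner function should return one numeric value per row is an assumption to check against the existing comp.* functions.

```r
# Hypothetical user-defined statistic (not part of DEDS): difference in
# group medians for a two-class design. It takes the class labels L and
# returns a function that computes the statistic for a data matrix X.
comp.meddiff <- function(L) {
  function(X) {
    apply(X, 1, function(x) median(x[L == 1]) - median(x[L == 0]))
  }
}

L <- rep(0:1, c(5, 5))
X <- matrix(rnorm(1000, 0, 0.5), ncol = 10)
stat <- comp.meddiff(L)(X)   # one value per row (gene)
```

If that contract holds, the function could be added to the list given to testfun, for example as med = comp.meddiff(L), alongside the built-in statistics.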

The function deds.stat can be slow when the data matrix or the number of permutations is large. We therefore recommend using deds.stat.linkC as the default function. deds.stat.linkC interfaces to a C function, which handles a 10,000 by 10 matrix and 1000 permutations in minutes.

DEDS can also summarize $p$ values from different statistical models; see deds.pval.

References

Yang, Y. H., Xiao, Y. and Segal, M. R.: Selecting differentially expressed genes from microarray experiment by sets of statistics. Bioinformatics, 2004, accepted. http://www.biostat.ucsf.edu/jean/Papers/DEDS.pdf

Examples

X <- matrix(rnorm(1000,0,0.5), ncol=10)
L <- rep(0:1,c(5,5))

# genes 1-10 are differentially expressed
X[1:10,6:10]<-X[1:10,6:10]+1

# DEDS summarizing t, sam and fc
deds.X <- deds.stat(X, L, B=200)

# DEDS summarizing t, tmod and sam
## Not run:
deds.X <- deds.stat(X, L, testfun=list(t=comp.t(L),
    tmod=comp.modt(L), sam=comp.SAM(L)))
## End(Not run)

# one can also use:
## Not run:
deds.X <- deds.stat(X, L, testfun=deds.chooseTest(L,
    tests=c("t","modt","fc")))
## End(Not run)