geneSetTest(index, statistics, alternative = "mixed", type = "auto", ranks.only = TRUE, nsim = 9999)
wilcoxGST(index, statistics, ...)
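index: index vector for the gene set. This can be a vector of indices, a logical vector of the same length as statistics or, in general, any vector such that statistics[index] gives the statistic values for the gene set to be tested.
statistics: vector of genewise statistics by which the genes can be ranked (see Details).
alternative: character string specifying the alternative hypothesis; one of "mixed", "either", "up" or "down". "two.sided", "greater" and "less" are also permitted as synonyms for "either", "up" and "down" respectively.
type: character string specifying whether the statistics are signed (t-like, "t") or unsigned (F-like, "f"), or whether the function should make an educated guess ("auto"). If the statistics are unsigned, larger statistics are assumed to be more significant.
ranks.only: logical; if TRUE, only the ranks of the statistics are used.
nsim: number of random gene sets used to compute the p-value by simulation. Not used if ranks.only=TRUE.
...: other arguments passed to geneSetTest.
geneSetTest and wilcoxGST return the p-value of the gene set test.
wilcoxGST is a synonym for geneSetTest with ranks.only=TRUE.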
This version of the test procedure was developed by Michaud et al (2008), who called it mean-rank gene-set enrichment.
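As a rough illustration of the mean-rank idea, the sketch below ranks all genes, computes the mean rank of the genes in the set, and compares the set with the remaining genes using base R's wilcox.test. It is a simplified stand-in, not limma's implementation: the helper name meanRankTest is hypothetical, the data are simulated, and limma may treat ties, missing values and the null distribution differently.

```r
# Hypothetical helper (not part of limma): a mean-rank test of a gene set
# against all other genes, using base R's Wilcoxon rank-sum test.
# index may be integer positions or a logical vector, as for geneSetTest.
meanRankTest <- function(index, statistics, alternative = "greater") {
  inSet <- seq_along(statistics) %in% seq_along(statistics)[index]
  # Mean rank of the set genes among all genes (ties get average ranks)
  meanRank <- mean(rank(statistics)[inSet])
  # Rank-sum test: set genes versus all remaining genes
  test <- wilcox.test(statistics[inSet], statistics[!inSet],
                      alternative = alternative, exact = FALSE)
  list(mean.rank = meanRank, p.value = test$p.value)
}

set.seed(1)
stat <- rnorm(100)
sel <- 1:10
stat[sel] <- stat[sel] + 1  # shift the set upward, as in the Examples
res <- meanRankTest(sel, stat)
res$mean.rank  # the average rank over all 100 genes is 50.5
res$p.value
```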
geneSetTest performs a competitive test, in the sense that genes in the test set are compared with the other genes (Goeman and Buhlmann, 2007).
If statistics is a vector of genewise test statistics for differential expression, then geneSetTest tests whether the genes in the set are more differentially expressed than the genes not in the set.
By contrast, a self-contained gene set test such as roast tests whether the genes in the test set are differentially expressed in an absolute sense, without regard to any other genes on the array.
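The distinction can be illustrated with simulated data in which every gene is shifted upward by the same amount. In the sketch below, a competitive comparison (base R's wilcox.test of the set against the other genes) has no reason to find the set special, while a crude self-contained check (a one-sample t-test of the set's statistics against zero) should. The one-sample t-test is only a stand-in chosen for brevity; roast is a rotation-based test and works quite differently.

```r
# Simulated data: all 100 genes are "differentially expressed",
# because every statistic is shifted by +2.
set.seed(2)
stat <- rnorm(100) + 2
sel <- 1:10

# Competitive: are the set's statistics larger than those of the other genes?
p.competitive <- wilcox.test(stat[sel], stat[-sel],
                             alternative = "greater", exact = FALSE)$p.value

# Crude self-contained stand-in: are the set's statistics larger than zero?
p.selfcontained <- t.test(stat[sel], mu = 0, alternative = "greater")$p.value

c(competitive = p.competitive, self.contained = p.selfcontained)
```

Because all genes share the same shift, the competitive p-value behaves like an ordinary null p-value here, whereas the self-contained p-value should be very small.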
Because it is based on permuting genes, geneSetTest assumes that the different genes (or probes) are statistically independent.
(Strictly speaking, it assumes that the genes in the set are no more correlated on average than randomly chosen genes.)
If inter-gene correlations are present, then a statistically significant result from geneSetTest indicates either that the set is highly ranked or that the genes in the set are positively correlated on average (Wu and Smyth, 2012).
Unless gene sets with positive correlations are of particular interest, it may be advisable to use camera instead, which adjusts the test for inter-gene correlations.
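The effect of inter-gene correlation on a rank-based competitive test can be seen in a small null simulation. The sketch below assumes, for simplicity, that correlation acts directly on the genewise statistics: the 20 set genes share a common random component (correlation rho = 0.5), but every statistic still has mean 0 and variance 1, so no gene is truly differentially expressed. Base R's wilcox.test stands in for the rank-based geneSetTest; the numbers of genes and replicates and the value of rho are arbitrary choices.

```r
# Null simulation: no gene is differentially expressed, but the statistics
# of the 20 set genes are correlated through a shared component.
set.seed(3)
nGenes <- 1000
setSize <- 20
rho <- 0.5
nRep <- 500

pvals <- replicate(nRep, {
  stat <- rnorm(nGenes)
  common <- rnorm(1)
  # Correlated set statistics, each still N(0, 1) marginally
  stat[1:setSize] <- sqrt(rho) * common + sqrt(1 - rho) * rnorm(setSize)
  # Two-sided rank-sum test of the set against all other genes
  wilcox.test(stat[1:setSize], stat[-(1:setSize)], exact = FALSE)$p.value
})

# Proportion of tests rejected at the 5% level
mean(pvals < 0.05)
```

With independent genes this rejection rate would be close to 0.05; with the shared component it is typically far higher, which is the kind of inflation that the camera adjustment is intended to address.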
Inter-gene correlations are likely to be present in differential expression experiments with biologically heterogeneous experimental units.
On the other hand, the assumption of independence between genes should hold when the replicates are purely technical, i.e., when there is no biological variability between the replicate arrays in each experimental condition.
The statistics are usually a set of probewise statistics arising from some comparison in a microarray experiment.
They may be t-statistics, meaning that the genewise null hypotheses would be rejected for large positive or negative values, or they may be F-statistics, meaning that only large values are significant.
Any set of signed statistics, such as log-ratios, M-values or moderated t-statistics, is treated as t-like.
Any set of unsigned statistics, such as F-statistics, posterior probabilities or chi-square statistics, is treated as F-like.
If type="auto", then the statistics are taken to be t-like if they take both positive and negative values, and F-like if they are all of the same sign.
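The rule used by type="auto" can be written as a short function. The helper below, guessStatType, is hypothetical and simply restates the rule in the text; dropping NA values first is an assumption of this sketch, and limma's own code may differ in such details.

```r
# Hypothetical helper restating the type = "auto" rule:
# statistics of both signs are t-like ("t"), otherwise F-like ("f").
guessStatType <- function(statistics) {
  s <- statistics[!is.na(statistics)]  # assumption: ignore missing values
  if (any(s > 0) && any(s < 0)) "t" else "f"
}

guessStatType(c(-2.1, 0.4, 3.3))  # "t": signed, e.g. moderated t-statistics
guessStatType(c(0.2, 5.7, 12.9))  # "f": all positive, e.g. F-statistics
```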
There are four possible alternatives to test for.
alternative="up" means that the genes in the set tend to be up-regulated, with positive t-statistics.
alternative="down" means that the genes in the set tend to be down-regulated, with negative t-statistics.
alternative="either" means that the set is either up- or down-regulated as a whole.
alternative="mixed" tests whether the genes in the set tend to be differentially expressed, without regard for direction.
In this case, the test will be significant if the set contains mostly large test statistics, even if some are positive and some are negative.
The first three alternatives are appropriate if you have a prior expectation that all the genes in the set will react in the same direction.
The "mixed" alternative is appropriate if you know only that the genes are involved in the relevant pathways, possibly in different directions.
"mixed" is the only meaningful alternative with F-like statistics.
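One way to see how the four alternatives differ is to express each as a rank-sum test of the set against the other genes: "up" and "down" as one-sided tests on the signed statistics, "either" as a two-sided test on the signed statistics, and "mixed" as a one-sided test on the absolute values. The sketch below follows that reading with a hypothetical helper, rankSetTest, deterministic toy data and base R's wilcox.test; it illustrates the idea and is not limma's implementation.

```r
# Hypothetical helper: each alternative as a rank-sum test of the set
# versus the remaining genes. index is assumed to hold positive integer
# positions, so that x[-index] selects the genes outside the set.
rankSetTest <- function(index, statistics,
                        alternative = c("mixed", "either", "up", "down")) {
  alternative <- match.arg(alternative)
  x <- statistics
  if (alternative == "mixed") x <- abs(x)  # ignore direction
  side <- switch(alternative,
                 mixed = "greater", up = "greater",
                 down = "less", either = "two.sided")
  wilcox.test(x[index], x[-index], alternative = side, exact = FALSE)$p.value
}

# Toy data: a set of 10 genes (five strongly up, five strongly down)
# followed by 90 unremarkable genes.
statistics <- c(2:6, -(2:6), seq(-1, 1, length.out = 90))
setIndex <- 1:10

sapply(c("mixed", "either", "up", "down"),
       function(a) rankSetTest(setIndex, statistics, a))
```

Because the five up-regulated and five down-regulated genes cancel on the signed scale, "either" gives p = 1, while "mixed", which ignores direction, gives a very small p-value.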
The test statistic used for the gene set test is the mean of the statistics in the set.
If ranks.only is TRUE, then only the ranks of the statistics are used, and the p-value is obtained from a Wilcoxon test.
If ranks.only is FALSE, then the p-value is obtained by simulation using nsim random sets of genes.
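When ranks.only=FALSE the p-value comes from random gene sets. The sketch below shows one way to do this for the "up" alternative: it compares the mean statistic of the set with the means of nsim random sets of the same size. The helper name simSetTest is hypothetical; sampling from all statistics (including those in the set) and the (count + 1)/(nsim + 1) estimate are assumptions of the sketch, not a description of limma's code.

```r
# Hypothetical sketch of a simulation p-value for the "up" alternative.
simSetTest <- function(index, statistics, nsim = 9999) {
  setStats <- statistics[index]
  observed <- mean(setStats)        # test statistic: mean of the set
  n <- length(setStats)
  # Means of nsim random gene sets of the same size
  simulated <- replicate(nsim, mean(sample(statistics, n)))
  # Adding 1 to numerator and denominator keeps the estimate above zero
  (sum(simulated >= observed) + 1) / (nsim + 1)
}

set.seed(4)
stat <- rnorm(100)
sel <- 1:10
stat[sel] <- stat[sel] + 1
simSetTest(sel, stat, nsim = 999)
```

For the "mixed" alternative the same function could be applied to abs(statistics). In this sketch the smallest attainable p-value is 1/(nsim + 1), so nsim = 9999 gives a resolution of 1e-4.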
Goeman, JJ, and Buhlmann, P (2007). Analyzing gene expression data in terms of gene sets: methodological issues. Bioinformatics 23, 980-987.
Michaud, J, Simpson, KM, Escher, R, Buchet-Poyau, K, Beissbarth, T, Carmichael, C, Ritchie, ME, Schutz, F, Cannon, P, Liu, M, Shen, X, Ito, Y, Raskind, WH, Horwitz, MS, Osato, M, Turner, DR, Speed, TP, Kavallaris, M, Smyth, GK, and Scott, HS (2008). Integrative analysis of RUNX1 downstream pathways and target genes. BMC Genomics 9, 363. http://www.biomedcentral.com/1471-2164/9/363
See also: camera, roast, romer, wilcox.test, barcodeplot.
There is a topic page on 10.GeneSetTests.
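library(limma)
# Simulate 100 genewise statistics and shift the first 10 upward
stat <- rnorm(100)
sel <- 1:10; stat[sel] <- stat[sel] + 1
# Mean-rank test of the first 10 genes against all other genes
wilcoxGST(sel, stat)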