runGSA(geneLevelStats, directions=NULL, geneSetStat="mean", signifMethod="geneSampling", adjMethod="fdr", gsc, gsSizeLim=c(1,Inf), permStats=NULL, permDirections=NULL, nPerm=1e4, gseaParam=1, ncpus=1, verbose=TRUE)
geneSetStat: one of "fisher", "stouffer", "reporter", "tailStrength", "wilcoxon", "mean", "median", "sum", "maxmean", "gsea" or "page". See below for details.
signifMethod: one of "geneSampling", "samplePermutation" or "nullDist".
adjMethod: any method supported by p.adjust, i.e. "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr" or "none". The exception is geneSetStat="gsea", where only the options "fdr" and "none" can be used.
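The adjustment methods listed above are those of base R's p.adjust. As a minimal sketch of what the choice means, the snippet below applies a few of them to a made-up vector of gene set p-values. The vector gsP and its names are hypothetical, and the snippet calls p.adjust directly rather than going through runGSA.

```r
# Hypothetical gene set p-values, for illustration only.
gsP <- c(setA = 0.001, setB = 0.02, setC = 0.04, setD = 0.30)

# In p.adjust, "fdr" is an alias for "BH", so both give the same result.
adjFdr <- p.adjust(gsP, method = "fdr")
adjBH <- p.adjust(gsP, method = "BH")

# "none" returns the p-values unchanged.
adjNone <- p.adjust(gsP, method = "none")

print(adjFdr)
```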
gsc: an object of class GSC, as returned by the loadGSC function.
gsSizeLim: defaults to c(1,Inf).
permStats: used with signifMethod="samplePermutation".
permDirections: similar to permStats, but should instead contain fold-change-like values for the related permuted statistics. This is mainly useful if the statistics are p-values or F-values. The values should be positive or negative, but only the sign information is used, so the actual magnitude does not matter. This argument is optional and is only used if signifMethod="samplePermutation". Note, however, that if directions is given then permDirections is also required, and vice versa.
nPerm: used when signifMethod="geneSampling". The original Reporter features algorithm (geneSetStat="reporter" and signifMethod="nullDist") also uses a permutation step, which is controlled by nPerm.
ncpus: should be set so that nPerm/ncpus is a positive integer.
An object of class GSAres.

The names of geneLevelStats and directions should be identical and match the names of the members of the gene sets in gsc. If geneSetStat is set to "fisher", "stouffer", "reporter" or "tailStrength", only p-values are allowed as geneLevelStats. If geneSetStat is set to "maxmean", "gsea" or "page", only t-like geneLevelStats are allowed (e.g. t-values, fold-changes).
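The naming and value constraints above can be checked before calling runGSA. The following is a rough sketch in base R only. The vectors pvals, directions and geneSetMembers are hypothetical stand-ins for real inputs, and the checks are not taken from the package itself.

```r
# Hypothetical inputs, for illustration only.
pvals <- c(g1 = 0.01, g2 = 0.20, g3 = 0.03, g4 = 0.50)
directions <- c(g1 = 1.8, g2 = -0.4, g3 = -2.1, g4 = 0.2)
geneSetMembers <- c("g1", "g3", "g4", "g9")  # genes appearing in some gene set

# The names of the statistics and of the directions should be identical.
namesMatch <- identical(names(pvals), names(directions))

# p-value based methods ("fisher", "stouffer", "reporter", "tailStrength")
# need values in [0, 1].
validPvals <- all(pvals >= 0 & pvals <= 1)

# Gene set members that have no matching name among the statistics.
missingGenes <- setdiff(geneSetMembers, names(pvals))
```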
For geneSetStat set to "fisher", "stouffer", "reporter", "wilcoxon" or "page", the gene set p-values can be calculated from a theoretical null distribution; in this case, set signifMethod="nullDist". For all methods, signifMethod="geneSampling" or signifMethod="samplePermutation" can be used. If signifMethod="geneSampling", gene sampling is used, meaning that the gene labels are randomized nPerm times and the gene set statistics are recalculated, so that a background distribution is obtained for each original gene set. The gene set p-values are calculated from this background distribution. Similarly, if signifMethod="samplePermutation", sample permutation is used. In this case the argument permStats (and optionally permDirections) has to be supplied.
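The gene sampling idea described above can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the package's implementation: the statistics are simulated, the gene set is hypothetical, the gene set statistic is the mean, and the +1 correction in the empirical p-value is a common convention chosen here for the sketch.

```r
set.seed(1)

# Hypothetical t-like gene-level statistics for 200 genes.
geneStats <- rnorm(200)
names(geneStats) <- paste0("g", seq_along(geneStats))

# Hypothetical gene set of 15 genes, shifted upwards so that it stands out.
geneSet <- names(geneStats)[1:15]
geneStats[geneSet] <- geneStats[geneSet] + 1.5

nPerm <- 1000

# Observed gene set statistic (here: the mean of the member statistics).
observed <- mean(geneStats[geneSet])

# Background: the same statistic for nPerm random gene sets of equal size,
# which is equivalent to randomizing the gene labels.
background <- replicate(nPerm, mean(sample(geneStats, length(geneSet))))

# Upper-tail empirical p-value; the +1 terms keep it strictly above zero.
pUp <- (sum(background >= observed) + 1) / (nPerm + 1)
```

Because the background is built separately for each gene set size, larger nPerm values give a finer p-value resolution at the cost of runtime.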
The runGSA function returns p-values for each gene set. Depending on the choice of methods and gene statistics, up to three classes of p-values can be calculated, describing different aspects of regulation directionality. The three directionality classes are Distinct-directional, Mixed-directional and Non-directional.

The non-directional p-values (pNonDirectional) are calculated from the absolute values of the gene statistics (or from p-values without sign information), meaning that gene sets containing a high proportion of significant genes, independent of direction, will turn up significant. That is, gene sets with a low pNonDirectional should be interpreted as significantly affected by gene regulation, but a mix of both up- and down-regulation can be involved.

The mixed-directional p-values (pMixedDirUp and pMixedDirDn) are calculated using the subset of the gene statistics that are up-regulated and down-regulated, respectively. This means that a gene set with a low pMixedDirUp will have a component of significantly up-regulated genes, regardless of the extent of down-regulated genes, and the reverse for pMixedDirDn. This also means that a gene set can be significantly affected by down-regulation and significantly affected by up-regulation at the same time. Note that sample permutation cannot be used to calculate pMixedDirUp and pMixedDirDn, since the subset sizes will differ.

Finally, the distinct-directional p-values (pDistinctDirUp and pDistinctDirDn) are calculated from statistics with sign information (e.g. t-statistics). In this case, if a gene set contains both up- and down-regulated genes, they will cancel each other out. A gene set with a low pDistinctDirUp will be significantly affected by up-regulation, but not by a mix of up- and down-regulation (as in the case of the mixed-directional and non-directional p-values). To be able to calculate distinct-directional gene set p-values while using p-values as gene-level statistics, the gene-level p-values are transformed as follows: the p-values of the up-regulated genes are divided by 2 (scaled to range between 0 and 0.5) and the p-values of the down-regulated genes are set to 1-p/2 (scaled to range between 0.5 and 1). This means that a significantly down-regulated gene will get a p-value close to 1. These new p-values are used as input to the gene set analysis procedure to get pDistinctDirUp. Similarly, the opposite is done, so that the up-regulated portion is scaled between 0.5 and 1 and the down-regulated portion between 0 and 0.5, to get pDistinctDirDn.
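The p-value transformation described above can be written out directly. The snippet below is a minimal base R sketch with hypothetical gene-level p-values and directions. It assumes that a positive direction means up-regulation, and that a direction of exactly zero (not covered by the text) falls on the down-regulated side.

```r
# Hypothetical gene-level p-values and directions (only the sign matters).
pvals <- c(g1 = 0.01, g2 = 0.04, g3 = 0.50, g4 = 0.02)
directions <- c(g1 = 1, g2 = -1, g3 = 1, g4 = -1)

up <- directions > 0

# Input for pDistinctDirUp: up-regulated genes get p/2 (range 0 to 0.5),
# down-regulated genes get 1 - p/2 (range 0.5 to 1).
pForUp <- ifelse(up, pvals / 2, 1 - pvals / 2)

# Input for pDistinctDirDn: the opposite scaling.
pForDn <- ifelse(up, 1 - pvals / 2, pvals / 2)
```

In this sketch the two transformed values of any gene sum to 1, so a gene that looks strongly significant in one direction looks strongly non-significant in the other.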
Stouffer, S., Suchman, E., Devinney, L., Star, S., and Williams Jr, R. The American soldier: adjustment during army life. Princeton University Press, Oxford, England, (1949).
Patil, K. and Nielsen, J. Uncovering transcriptional regulation of metabolism by using metabolic network topology. Proceedings of the National Academy of Sciences of the United States of America 102(8), 2685 (2005).
Oliveira, A., Patil, K., and Nielsen, J. Architecture of transcriptional regulatory circuits is knitted over the topology of bio-molecular interaction networks. BMC Systems Biology 2(1), 17 (2008).
Kim, S. and Volsky, D. PAGE: parametric analysis of gene set enrichment. BMC Bioinformatics 6(1), 144 (2005).
Taylor, J. and Tibshirani, R. A tail strength measure for assessing the overall univariate significance in a dataset. Biostatistics 7(2), 167-181 (2006).
Mootha, V., Lindgren, C., Eriksson, K., Subramanian, A., Sihag, S., Lehar, J., Puigserver, P., Carlsson, E., Ridderstrale, M., Laurila, E., et al. PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nature Genetics 34(3), 267-273 (2003).
Subramanian, A., Tamayo, P., Mootha, V., Mukherjee, S., Ebert, B., Gillette, M., Paulovich, A., Pomeroy, S., Golub, T., Lander, E., et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America 102(43), 15545-15550 (2005).
Efron, B. and Tibshirani, R. On testing the significance of sets of genes. The Annals of Applied Statistics 1, 107-129 (2007).
loadGSC, GSAsummaryTable, geneSetSummary, networkPlot, HTSanalyzeR-package, PGSEA, samr, limma, GSA
# Load the piano package, which provides runGSA and loadGSC:
library(piano)
# Load example input data to GSA:
data("gsa_input")
# Load gene set collection:
gsc <- loadGSC(gsa_input$gsc)
# Run gene set analysis (nPerm is kept low here to limit runtime):
gsares <- runGSA(geneLevelStats=gsa_input$pvals, directions=gsa_input$directions,
                 gsc=gsc, nPerm=500)