The function gsea can perform several different gene set enrichment analyses. The general procedure is to obtain
single marker statistics (e.g. summary statistics), from which it is possible to compute and evaluate a test statistic
for a set of genetic markers that measures a joint degree of association between the marker set and the phenotype.
The marker set is defined by a genomic feature such as genes, biological pathways, gene interactions,
or gene expression profiles.
Currently, four types of gene set enrichment analyses can be conducted with gsea: sum-based, count-based,
score-based, and our own method, the covariance association test (CVAT). For details and comparisons of
the test statistics, consult doi:10.1534/genetics.116.189498.
The sum test is based on the sum of all marker summary statistics located within the feature set. The single marker
summary statistics can be obtained from linear model analyses (from PLINK or using the qgg lma approximation),
or from single or multiple component REML analyses (GBLUP or GFBLUP) from the greml function. The sum test is powerful
if the genomic feature harbors many genetic markers that have small to moderate effects.
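
As a minimal sketch of the sum test described above (not the gsea implementation itself), the statistic for one feature is the sum of the single marker statistics of the markers that fall inside it. The function name sum_statistic and the example values are hypothetical and chosen only for illustration.

```python
def sum_statistic(stats, feature_idx):
    """Sum the single marker statistics of the markers in one genomic feature."""
    return sum(stats[i] for i in feature_idx)


# Hypothetical single marker statistics for six markers (illustrative values only).
stats = [0.5, 2.0, 1.5, 0.1, 3.0, 0.4]
# Hypothetical indices of the markers located within the genomic feature.
feature = [1, 2, 4]

t_sum = sum_statistic(stats, feature)
print(t_sum)
```

Because every marker in the feature contributes to the sum, many small to moderate effects can add up to a large statistic even when no single marker stands out.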
The count-based method counts the number of markers within a genomic feature that show association with the
phenotype (i.e. that have a single marker p-value below a chosen threshold). Under the null hypothesis, the
associated markers are picked at random from the total set of markers, so that no genomic feature is enriched for
associated markers, and the observed count statistic is assumed to be a realization from a hypergeometric distribution.
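
The hypergeometric null for the count-based test can be illustrated with a short sketch. It assumes a one-sided enrichment test, i.e. the probability of seeing at least the observed number of associated markers in the feature; the function name and the argument names (k, total, associated, in_feature) are hypothetical and do not correspond to gsea arguments.

```python
from math import comb


def hypergeom_upper_tail(k, total, associated, in_feature):
    """P(X >= k) when in_feature markers are drawn at random from total markers,
    of which `associated` are below the p-value threshold."""
    upper = min(associated, in_feature)
    favourable = sum(
        comb(associated, x) * comb(total - associated, in_feature - x)
        for x in range(k, upper + 1)
    )
    return favourable / comb(total, in_feature)


# Hypothetical example: 10 markers in total, 3 associated, a feature of 4 markers
# of which 2 are associated.
p_value = hypergeom_upper_tail(k=2, total=10, associated=3, in_feature=4)
print(p_value)
```

A small tail probability indicates that the feature contains more associated markers than expected if associated markers were spread at random over the genome.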
The score-based approach is based on the product between the scaled genotypes in a genomic feature and the residuals
from the linear mixed model (obtained from greml).
The covariance association test (CVAT) is derived from the fit object from greml (GBLUP or GFBLUP), and measures
the covariance between the total genomic effects for all markers and the genomic effects of the markers within the
genomic feature.
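
One way to read the CVAT description is as a cross-product between two vectors of genomic effects: one built from all markers and one built from the feature markers only. The sketch below follows that reading under stated assumptions. The genotype matrix W, the marker effects s, and the function name cvat_statistic are hypothetical; in practice the marker effects would be derived from the greml fit.

```python
def cvat_statistic(W, s, feature_idx):
    """Cross-product of total genomic effects and feature genomic effects.

    W: scaled genotypes, one row per individual, one column per marker.
    s: marker effects, one per marker.
    feature_idx: column indices of the markers within the genomic feature.
    """
    # Total genomic effect per individual, using all markers.
    g_total = [sum(w * b for w, b in zip(row, s)) for row in W]
    # Genomic effect per individual, using only the feature markers.
    g_feature = [sum(row[j] * s[j] for j in feature_idx) for row in W]
    return sum(a * b for a, b in zip(g_total, g_feature))


# Hypothetical data: three individuals, three markers, a feature of two markers.
W = [[1.0, 0.0, -1.0], [0.0, 1.0, 1.0], [-1.0, -1.0, 0.0]]
s = [0.5, 0.2, -0.1]
t_cvat = cvat_statistic(W, s, feature_idx=[0, 1])
print(t_cvat)
```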
The distributions of the test statistics obtained from the sum-based, score-based and CVAT approaches are unknown;
therefore, a circular permutation approach is used to obtain an empirical distribution of each test statistic.
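
The circular permutation idea can be sketched as follows, using the sum statistic as the example. The sketch assumes that the marker statistics are ordered by genomic position and treated as a ring, that each permutation shifts the statistics by a random offset while the feature positions stay fixed, and that the empirical p-value is the fraction of permuted statistics at least as large as the observed one. The function name, the n_perm and seed arguments, and these details are assumptions for illustration, not a description of the gsea internals.

```python
import random


def circular_permutation_pvalue(stats, feature_idx, n_perm=1000, seed=1):
    """Empirical p-value for the sum statistic under circular shifts of the statistics."""
    rng = random.Random(seed)
    m = len(stats)
    observed = sum(stats[i] for i in feature_idx)
    exceed = 0
    for _ in range(n_perm):
        # Rotate the ring of statistics by a random non-zero offset.
        shift = rng.randrange(1, m)
        permuted = sum(stats[(i + shift) % m] for i in feature_idx)
        if permuted >= observed:
            exceed += 1
    return observed, exceed / n_perm


# Hypothetical marker statistics in genomic order and a feature of three markers.
stats = [0.5, 2.0, 1.5, 0.1, 3.0, 0.4]
observed, p_value = circular_permutation_pvalue(stats, [1, 2, 4], n_perm=200)
print(observed, p_value)
```

Shifting the whole ring rather than shuffling individual markers keeps neighbouring statistics next to each other, so local correlation among markers is carried over into the empirical null distribution.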