Traditional gene-set analysis (GSA) uses the whole expression matrix to find
significant pathways. GSA methods are implemented via either a univariate
or a multivariate procedure. In univariate analysis, node level statistics are
initially calculated from fold changes or statistical tests (e.g., t-test).
These statistics are then combined into a pathway level statistic by summation or
averaging. Multivariate analysis considers the correlations between genes in the
pathway and calculates the pathway level statistic directly from the expression
value matrix using Hotelling's T^2 test or MANOVA models. This function implements the
univariate procedure of GSA, extended with network centralities.
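
The univariate procedure described above can be sketched in base R. This is a minimal illustration on simulated data, not the package's internal code: the object names (expr, label, pathway_genes) and the choice of an absolute t statistic combined by the mean are assumptions made for the example.

```r
# Toy expression matrix (assumed layout: genes in rows, samples in columns).
set.seed(1)
expr <- matrix(rnorm(6 * 8), nrow = 6,
               dimnames = list(paste0("gene", 1:6), paste0("s", 1:8)))
label <- rep(c("treatment", "control"), each = 4)

# Hypothetical pathway made of three of the genes.
pathway_genes <- c("gene1", "gene2", "gene3")

# Node-level statistic: absolute t statistic for each gene in the pathway.
node_stat <- apply(expr[pathway_genes, , drop = FALSE], 1, function(x) {
  abs(t.test(x[label == "treatment"], x[label == "control"])$statistic)
})

# Pathway-level statistic: average of the node-level statistics.
pathway_stat <- mean(node_stat)
```

Summation instead of averaging would only change the last line to sum(node_stat).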
If the PID.db data is used, all genes must be given as gene symbols.
If the centrality measurement is given as a string, only the pre-defined "equal.weight",
"in.degree", "out.degree", "degree", "betweenness", "in.reach", "out.reach",
"reach", "in.spread", "out.spread" and "spread" are allowed. Other centrality
measurements (such as closeness or the clustering coefficient) can be used by
passing a function instead. A self-defined function should take exactly one
argument, which is an igraph object. We recommend choosing at least two centrality
measurements. The default centralities are "equal.weight",
"in.degree", "out.degree", "betweenness", "in.reach" and "out.reach".
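
As a hedged sketch of a self-defined centrality, the functions below each take a single igraph object and return one value per node. The function names and the toy graph are hypothetical; only closeness(), transitivity() and make_ring() are real igraph calls, and the igraph package is assumed to be installed.

```r
library(igraph)

# Self-defined centralities: one argument, an igraph object.
closeness_centrality <- function(g) {
  closeness(g, mode = "all")
}

clustering_centrality <- function(g) {
  # isolates = "zero" avoids NaN for nodes with fewer than two neighbours.
  transitivity(g, type = "local", isolates = "zero")
}

# Toy directed graph standing in for a pathway (assumption for illustration).
g <- make_ring(5, directed = TRUE)
cl <- closeness_centrality(g)
cc <- clustering_centrality(g)
```

Such functions would be supplied in place of one of the pre-defined centrality strings.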
The node-level statistic can be self-defined. The self-defined function should take
two arguments: a vector of expression values in the treatment class and a vector of
expression values in the control class.
The pathway-level statistic can also be self-defined. The self-defined function should
take only one argument: the vector of node-level statistics.
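
A minimal sketch of self-defined statistics with the signatures described above. The names node_level and pathway_level, and the particular statistics chosen (absolute difference of means, median), are assumptions for illustration only.

```r
# Node-level statistic: two arguments, treatment values and control values.
node_level <- function(x1, x2) {
  abs(mean(x1) - mean(x2))
}

# Pathway-level statistic: one argument, the vector of node-level statistics.
pathway_level <- function(s) {
  median(s, na.rm = TRUE)
}

# Quick check on toy vectors.
d <- node_level(c(2, 4), c(1, 1))
p <- pathway_level(c(1, 3, 2))
```

A median is less sensitive than a mean to a single gene with a very large node-level statistic, which is one reason to prefer it here.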
In most circumstances, however, this function is called through cepa.all.
Currently, only the univariate procedures in GSA are extended. An extension for
the multivariate procedures in GSA is still being worked out.