This is a meta-analysis complement to the functions standardScreeningBinaryTrait and
standardScreeningNumericTrait. Given expression (or other) data from multiple independent
data sets, and the corresponding clinical traits or outcomes, the function calculates multiple screening
statistics in each data set, then calculates meta-analysis Z scores, p-values, and optionally q-values
(False Discovery Rates). Three different ways of calculating the meta-analysis Z scores are provided: the
Stouffer method with equal weights, the weighted Stouffer method, and the Stouffer method with user-specified weights.
metaAnalysis(multiExpr, multiTrait,
binary = NULL,
metaAnalysisWeights = NULL,
corFnc = cor, corOptions = list(use = "p"),
getQvalues = FALSE,
getAreaUnderROC = FALSE,
useRankPvalue = TRUE,
rankPvalueOptions = list(),
setNames = NULL,
kruskalTest = FALSE, var.equal = FALSE,
metaKruskal = kruskalTest, na.action = "na.exclude")
Data frame with the following components:
Identifier of the input genes (or other variables)
Meta-analysis Z statistics obtained using Stouffer's method with equal weights
p-values corresponding to Z.Stouffer.equalWeights
q-values corresponding to p.Stouffer.equalWeights, only present if getQvalues is TRUE.
Meta-analysis Z statistics obtained using Stouffer's method with weights given by the square root of the number of (non-missing) samples in each data set
p-values corresponding to Z.RootDoFWeights
q-values corresponding to p.RootDoFWeights, only present if getQvalues is TRUE.
Meta-analysis Z statistics obtained using Stouffer's method with weights given by the number of (non-missing) samples in each data set
p-values corresponding to Z.DoFWeights
q-values corresponding to p.DoFWeights, only present if getQvalues is TRUE.
Meta-analysis Z statistics obtained using Stouffer's method with user-defined weights. Only present if input metaAnalysisWeights is given.
p-values corresponding to Z.userWeights
q-values corresponding to p.userWeights, only present if getQvalues is TRUE.
The next set of columns is present only if input useRankPvalue is TRUE and contains the output
of the function rankPvalue with the same column weights as the above meta-analysis. Depending
on the input options calculateQvalue and pValueMethod in rankPvalueOptions, some
columns may be missing. The following columns are calculated using equal weights for each data set.
This is the minimum of pValueLowRank and pValueHighRank, i.e. min(pValueLow, pValueHigh)
Asymptotic p-value for observing a consistently low value across the columns of datS based on the rank method.
Asymptotic p-value for observing a consistently high value across the columns of datS based on the rank method.
This is the minimum of pValueLowScale and pValueHighScale, i.e. min(pValueLow, pValueHigh)
Asymptotic p-value for observing a consistently low value across the columns of datS based on the Scale method.
Asymptotic p-value for observing a consistently high value across the columns of datS based on the Scale method.
local false discovery rate (q-value) corresponding to the p-value pValueExtremeRank
local false discovery rate (q-value) corresponding to the p-value pValueLowRank
local false discovery rate (q-value) corresponding to the p-value pValueHighRank
local false discovery rate (q-value) corresponding to the p-value pValueExtremeScale
local false discovery rate (q-value) corresponding to the p-value pValueLowScale
local false discovery rate (q-value) corresponding to the p-value pValueHighScale
Analogous columns are calculated by weighting each input set using the square root of the number of
samples, the number of samples, and user weights (if given). The corresponding column names carry the suffixes
RootDoFWeights, DoFWeights, userWeights.
The following columns contain results returned by standardScreeningBinaryTrait or
standardScreeningNumericTrait (depending on whether the input trait is binary or continuous).
For binary traits, the following information is returned for each set:
Pearson correlation with a binary numeric version of the input variable. The numeric variable equals 1 for level 1 and 2 for level 2. The levels are given by levels(factor(y)).
Student t-test statistic
two-sided Student t-test p-value.
(if input qValues==TRUE) q-value (local false discovery rate) based on the Student t-test p-value (Storey et al 2004).
a (signed) ratio of mean values. If the mean in the first group (corresponding to level 1) is larger than that of the second group, it equals meanFirstGroup/meanSecondGroup. But if the mean of the second group is larger than that of the first group it equals -meanSecondGroup/meanFirstGroup (notice the minus sign).
means of columns in input datExpr across samples in the second group.
standard errors of columns in input datExpr across samples in the first group. Recall that SE(x)=sqrt(var(x)/n) where n is the number of non-missing values of x.
standard errors of columns in input datExpr across samples in the second group.
the area under the ROC, also known as the concordance
index or C.index. This is a measure of discriminatory power. The measure lies between 0 and 1, where 0.5
indicates no discriminatory power and 0 indicates that the "opposite" predictor has perfect discriminatory
power. To compute it we use the function rcorr.cens with outx=TRUE (from Frank Harrell's
package Hmisc).
number of samples with finite measurements for each gene.
If input kruskalTest is TRUE, the following columns further summarize results of the
Kruskal-Wallis test:
Kruskal-Wallis test statistic.
(Warning: experimental) Kruskal-Wallis test statistic including a sign that indicates whether the average rank is higher in second group (positive) or first group (negative).
Kruskal-Wallis test p-value.
q-values corresponding to the Kruskal-Wallis test p-value (if input qValues==TRUE).
Z statistics obtained from pvalueStudent.Set1, pvalueStudent.Set2, ...
or from pvaluekruskal.Set1, pvaluekruskal.Set2, ..., depending on input metaKruskal.
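The fold-change and standard-error definitions above can be sketched in base R for a single variable. This is only an illustration: the helper names signedFoldChange and standardError, the toy data, and the p-value-to-Z conversion at the end are assumptions made for this example and are not taken from the package code.

```r
# x: measurements of one variable; y: binary trait with exactly two levels.
# Group order follows levels(factor(y)), as described above.
signedFoldChange <- function(x, y) {
  g <- factor(y)
  stopifnot(nlevels(g) == 2)
  m1 <- mean(x[g == levels(g)[1]], na.rm = TRUE)
  m2 <- mean(x[g == levels(g)[2]], na.rm = TRUE)
  # Positive ratio if the first group has the larger mean, negative otherwise.
  # Equal means fall into the negative branch here (an arbitrary choice).
  if (m1 > m2) m1 / m2 else -m2 / m1
}

# SE(x) = sqrt(var(x) / n), with n the number of non-missing values
standardError <- function(x) {
  n <- sum(!is.na(x))
  sqrt(var(x, na.rm = TRUE) / n)
}

# Hypothetical toy data: levels(factor(y)) are "case", "ctrl"
x <- c(4, 6, 5, 10, 12, 11)
y <- c("ctrl", "ctrl", "ctrl", "case", "case", "case")
g <- factor(y)

fc  <- signedFoldChange(x, y)
se1 <- standardError(x[g == levels(g)[1]])

# One common way to turn a two-sided t-test p-value into a signed Z statistic
# (assumption: the package may use a different conversion)
tt <- t.test(x[g == levels(g)[1]], x[g == levels(g)[2]], var.equal = FALSE)
zFromP <- unname(sign(tt$statistic) * qnorm(tt$p.value / 2, lower.tail = FALSE))
```

The sign of zFromP follows the sign of the t statistic, so a variable that is higher in the first group receives a positive Z in this sketch.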
For numeric traits, the following columns are returned:
correlations of all genes with the trait
Fisher Z statistics corresponding to the correlations
Student p-values of the correlations
(if input qValues==TRUE) q-values of the correlations calculated from the p-values
area under the ROC
number of samples present for the calculation of each association.
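For a numeric trait, the per-set correlation, Fisher Z statistic and Student p-value can be illustrated in base R as follows. This is a sketch using textbook formulas on simulated data; in particular the sqrt(n - 3) scaling of the Fisher transform is the textbook choice and an assumption here, and the package may scale its Z statistic differently.

```r
set.seed(42)
n <- 30
trait <- rnorm(n)                  # hypothetical continuous trait
expr  <- 0.5 * trait + rnorm(n)    # hypothetical expression of one gene

# Pearson correlation using pairwise complete observations
r <- cor(expr, trait, use = "pairwise.complete.obs")

# Textbook Fisher transform: approximately N(0, 1) under the null hypothesis
zFisher <- atanh(r) * sqrt(n - 3)

# Student t statistic and two-sided p-value for the correlation
tStat    <- r * sqrt((n - 2) / (1 - r^2))
pStudent <- 2 * pt(abs(tStat), df = n - 2, lower.tail = FALSE)
```

The p-value computed this way agrees with the one reported by cor.test(expr, trait) for the Pearson method.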
multiExpr: Expression data (or other data) in multi-set format (see checkSets). A vector of lists; in
each list there must be a component named data whose content
is a matrix, data frame, or array of dimension 2.
multiTrait: Trait or outcome data in multi-set format. Only one trait is allowed; consequently, the data
component of each component list can be either a vector or a data frame (matrix, array of dimension 2).
binary: Logical: is the trait binary (TRUE) or continuous (FALSE)? If not given, the decision will
be made based on the content of multiTrait.
metaAnalysisWeights: Optional specification of set weights for meta-analysis. If given, must be a vector of non-negative
weights, one entry for each set contained in multiExpr.
corFnc: Correlation function to be used for screening. Should be either the default cor or its
robust alternative, bicor.
corOptions: A named list giving extra arguments to be passed to the correlation function.
getQvalues: Logical: should q-values (FDRs) be calculated?
getAreaUnderROC: Logical: should area under the ROC be calculated? Caution: enabling the calculation will slow the function down considerably for large data sets.
useRankPvalue: Logical: should the rankPvalue function be used to obtain alternative
meta-analysis statistics?
rankPvalueOptions: Additional options for function rankPvalue. These include
na.last (default "keep"), ties.method (default "average"),
calculateQvalue (default copied from input getQvalues),
and pValueMethod (default "all").
See the help file for rankPvalue for full details.
setNames: Optional specification of set names (labels). These are used to label the corresponding components of the
output. If not given, they will be taken from the names attribute of multiExpr. If
names(multiExpr) is NULL, generic names of the form Set_1, Set_2, ... will be used.
kruskalTest: Logical: should the Kruskal test be performed in addition to the t-test? Only applies to binary traits.
var.equal: Logical: should the t-test assume equal variance in both groups? If TRUE, the function will warn
the user that the returned test statistics will be different from the results of the standard
t.test function.
metaKruskal: Logical: should the meta-analysis be based on the results of the Kruskal test (TRUE) or the Student t-test
(FALSE)?
na.action: Specification of what should happen to missing values in t.test.
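As an illustration of the multi-set input format described above, the following sketch builds simulated inputs for two data sets and, if the WGCNA package happens to be installed, passes them to metaAnalysis. The gene and set names, the sample sizes, and the convention that rows are samples and columns are genes are assumptions of this example.

```r
set.seed(1)
nGenes   <- 50
nSamples <- c(20, 30)              # two hypothetical data sets

# Multi-set format: one list per set, each with a component named `data`
multiExpr <- lapply(nSamples, function(n) {
  m <- matrix(rnorm(n * nGenes), nrow = n, ncol = nGenes)
  colnames(m) <- paste0("Gene", seq_len(nGenes))
  list(data = m)
})

# A single continuous trait per set, stored as a vector in `data`
multiTrait <- lapply(nSamples, function(n) list(data = rnorm(n)))

names(multiExpr) <- names(multiTrait) <- c("SetA", "SetB")

# The call is only attempted when WGCNA is available
if (requireNamespace("WGCNA", quietly = TRUE)) {
  res <- WGCNA::metaAnalysis(multiExpr, multiTrait,
                             binary = FALSE,
                             getQvalues = FALSE,
                             useRankPvalue = FALSE)
  print(dim(res))
}
```

Switching off getQvalues and useRankPvalue keeps the returned data frame small for a first look; both can be enabled once the basic call works.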
Peter Langfelder
Stouffer's method combines Z statistics by taking the mean of the input Z statistics and multiplying
it by sqrt(n), where n is the number of input data sets. We refer to this method as
Stouffer.equalWeights. In general, a better (i.e., more powerful) method of combining Z statistics is
to weigh them by the number of degrees of freedom of each data set (which approximately equals its number of samples). We refer to this
method as weightedStouffer. Finally, the user can also specify custom weights, for example if a data
set needs to be downweighted due to technical concerns; however, custom weights should be
chosen carefully to avoid possible selection biases.
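The weighting schemes described above can be sketched with the general weighted Stouffer formula, Z = sum(w * z) / sqrt(sum(w^2)), which reduces to mean(z) * sqrt(n) for equal weights. The helper name stoufferZ, the example numbers, the handling of missing statistics, and the two-sided p-value conversion are assumptions of this illustration, not the package's own code.

```r
# Combine per-set Z statistics for one variable.
# z: Z statistics, one per data set
# w: non-negative weights, one per data set (equal weights by default)
stoufferZ <- function(z, w = rep(1, length(z))) {
  stopifnot(length(z) == length(w), all(w >= 0))
  keep <- !is.na(z)                # assumption: skip sets with a missing statistic
  sum(w[keep] * z[keep]) / sqrt(sum(w[keep]^2))
}

z        <- c(2.1, 1.4, 0.3)       # hypothetical Z statistics from three sets
nSamples <- c(40, 100, 25)         # hypothetical sample counts

zEqual   <- stoufferZ(z)                   # equal weights
zRootDoF <- stoufferZ(z, sqrt(nSamples))   # weights = sqrt(number of samples)
zDoF     <- stoufferZ(z, nSamples)         # weights = number of samples
zUser    <- stoufferZ(z, c(1, 1, 0.5))     # user-specified weights

# Two-sided p-value for a combined Z statistic
pEqual <- 2 * pnorm(abs(zEqual), lower.tail = FALSE)
```

Because the weights enter only through their ratios, multiplying all weights by a constant leaves the combined Z unchanged.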
For Stouffer's method, see
Stouffer, S.A., Suchman, E.A., DeVinney, L.C., Star, S.A. & Williams, R.M. Jr. 1949. The American Soldier, Vol. 1: Adjustment during Army Life. Princeton University Press, Princeton.
A discussion of weighted Stouffer's method can be found in
Whitlock, M.C. 2005. Combining probability from independent tests: the weighted Z-method is superior to Fisher's approach. Journal of Evolutionary Biology 18(5): 1368.
standardScreeningBinaryTrait, standardScreeningNumericTrait for screening
functions for individual data sets