WGCNA (version 1.25-1)

metaAnalysis: Meta-analysis of binary and continuous variables

Description

This is a meta-analysis complement to functions standardScreeningBinaryTrait and standardScreeningNumericTrait. Given expression (or other) data from multiple independent data sets, and the corresponding clinical traits or outcomes, the function calculates multiple screening statistics in each data set, then calculates meta-analysis Z scores, p-values, and optionally q-values (False Discovery Rates). Three different ways of calculating the meta-analysis Z scores are provided: the Stouffer method, weighted Stouffer method, and using user-specified weights.

Usage

metaAnalysis(multiExpr, multiTrait, 
             binary = NULL, 
             metaAnalysisWeights = NULL, 
             corFnc = cor, corOptions = list(use = "p"), 
             getQvalues = FALSE, 
             getAreaUnderROC = FALSE,
             useRankPvalue = TRUE,
             rankPvalueOptions = list(),
             setNames = NULL, 
             kruskalTest = FALSE, var.equal = FALSE, 
             metaKruskal = kruskalTest, na.action = "na.exclude")

Arguments

multiExpr
Expression data (or other data) in multi-set format (see checkSets). A vector of lists; in each list there must be a component named data whose content is a matrix or dataframe or array of di
multiTrait
Trait or outcome data in multi-set format. Only one trait is allowed; consequently, the data component of each component list can be either a vector or a data frame (matrix, array of dimension 2).
binary
Logical: is the trait binary (TRUE) or continuous (FALSE)? If not given, the decision will be made based on the content of multiTrait.
metaAnalysisWeights
Optional specification of set weights for meta-analysis. If given, must be a vector of non-negative weights, one entry for each set contained in multiExpr.
corFnc
Correlation function to be used for screening. Should be either the default cor or its robust alternative, bicor.
corOptions
A named list giving extra arguments to be passed to the correlation function.
getQvalues
Logical: should q-values (FDRs) be calculated?
getAreaUnderROC
Logical: should area under the ROC be calculated? Caution, enabling the calculation will slow the function down considerably for large data sets.
useRankPvalue
Logical: should the rankPvalue function be used to obtain alternative meta-analysis statistics?
rankPvalueOptions
Additional options for function rankPvalue. These include na.last (default "keep"), ties.method (default "average"), calculateQvalue (def
setNames
Optional specification of set names (labels). These are used to label the corresponding components of the output. If not given, will be taken from the names attribute of multiExpr. If names(multiExpr) is NULL
kruskalTest
Logical: should the Kruskal test be performed in addition to t-test? Only applies to binary traits.
var.equal
Logical: should the t-test assume equal variance in both groups? If TRUE, the function will warn the user that the returned test statistics will be different from the results of the standard t.test
metaKruskal
Logical: should the meta-analysis be based on the results of Kruskal test (TRUE) or Student t-test (FALSE)?
na.action
Specification of what should happen to missing values in t.test.
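
The multi-set input format described above can be sketched with simulated data. This is only an illustration under stated assumptions: the set sizes, gene names, and trait column name are made up, and the call to metaAnalysis is skipped when the WGCNA package is not installed.

```r
# Simulated (hypothetical) data: 3 independent sets sharing the same 50 genes.
set.seed(1)
nGenes <- 50
nSamples <- c(20, 30, 25)

# Each set is a list with a `data` component: samples in rows, genes in columns.
multiExpr <- lapply(nSamples, function(n) {
  x <- matrix(rnorm(n * nGenes), nrow = n, ncol = nGenes)
  colnames(x) <- paste0("Gene", seq_len(nGenes))
  list(data = x)
})

# One continuous trait per set, also wrapped in a `data` component.
multiTrait <- lapply(nSamples, function(n) {
  list(data = data.frame(trait = rnorm(n)))
})

# Set names are used to label the per-set output columns.
names(multiExpr) <- names(multiTrait) <- paste0("Set_", seq_along(nSamples))

# Run the meta-analysis only if WGCNA is available.
if (requireNamespace("WGCNA", quietly = TRUE)) {
  suppressPackageStartupMessages(library(WGCNA))
  res <- metaAnalysis(multiExpr, multiTrait,
                      binary = FALSE,
                      getQvalues = FALSE,
                      useRankPvalue = FALSE)
  print(head(colnames(res), 10))
}
```

Setting binary explicitly avoids relying on automatic detection of the trait type, and turning off useRankPvalue and getQvalues keeps the output to the basic meta-analysis and per-set screening columns.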

Value

Data frame with the following components:

  • ID: Identifier of the input genes (or other variables).
  • Z.equalWeights: Meta-analysis Z statistics obtained using Stouffer's method with equal weights.
  • p.equalWeights: p-values corresponding to Z.equalWeights.
  • q.equalWeights: q-values corresponding to p.equalWeights; only present if getQvalues is TRUE.
  • Z.RootDoFWeights: Meta-analysis Z statistics obtained using Stouffer's method with weights given by the square root of the number of (non-missing) samples in each data set.
  • p.RootDoFWeights: p-values corresponding to Z.RootDoFWeights.
  • q.RootDoFWeights: q-values corresponding to p.RootDoFWeights; only present if getQvalues is TRUE.
  • Z.DoFWeights: Meta-analysis Z statistics obtained using Stouffer's method with weights given by the number of (non-missing) samples in each data set.
  • p.DoFWeights: p-values corresponding to Z.DoFWeights.
  • q.DoFWeights: q-values corresponding to p.DoFWeights; only present if getQvalues is TRUE.
  • Z.userWeights: Meta-analysis Z statistics obtained using Stouffer's method with user-defined weights. Only present if input metaAnalysisWeights is given.
  • p.userWeights: p-values corresponding to Z.userWeights.
  • q.userWeights: q-values corresponding to p.userWeights; only present if getQvalues is TRUE.

The next set of columns is present only if input useRankPvalue is TRUE and contains the output of the function rankPvalue with the same column weights as the above meta-analysis. Depending on the input options calculateQvalue and pValueMethod in rankPvalueOptions, some columns may be missing. The following columns are calculated using equal weights for each data set.

  • pValueExtremeRank.equalWeights: The minimum of pValueLowRank and pValueHighRank, i.e. min(pValueLowRank, pValueHighRank).
  • pValueLowRank.equalWeights: Asymptotic p-value for observing a consistently low value across the columns of datS based on the rank method.
  • pValueHighRank.equalWeights: Asymptotic p-value for observing a consistently high value across the columns of datS based on the rank method.
  • pValueExtremeScale.equalWeights: The minimum of pValueLowScale and pValueHighScale, i.e. min(pValueLowScale, pValueHighScale).
  • pValueLowScale.equalWeights: Asymptotic p-value for observing a consistently low value across the columns of datS based on the scale method.
  • pValueHighScale.equalWeights: Asymptotic p-value for observing a consistently high value across the columns of datS based on the scale method.
  • qValueExtremeRank.equalWeights: Local false discovery rate (q-value) corresponding to the p-value pValueExtremeRank.
  • qValueLowRank.equalWeights: Local false discovery rate (q-value) corresponding to the p-value pValueLowRank.
  • qValueHighRank.equalWeights: Local false discovery rate (q-value) corresponding to the p-value pValueHighRank.
  • qValueExtremeScale.equalWeights: Local false discovery rate (q-value) corresponding to the p-value pValueExtremeScale.
  • qValueLowScale.equalWeights: Local false discovery rate (q-value) corresponding to the p-value pValueLowScale.
  • qValueHighScale.equalWeights: Local false discovery rate (q-value) corresponding to the p-value pValueHighScale.
  • ...: Analogous columns calculated by weighting each input set using the square root of the number of samples, the number of samples, and user weights (if given). The corresponding column names carry the suffixes RootDoFWeights, DoFWeights, and userWeights.

The following columns contain results returned by standardScreeningBinaryTrait or standardScreeningNumericTrait (depending on whether the input trait is binary or continuous).

For binary traits, the following information is returned for each set:

  • corPearson.Set_1, corPearson.Set_2, ...: Pearson correlation with a binary numeric version of the input variable. The numeric variable equals 1 for level 1 and 2 for level 2. The levels are given by levels(factor(y)).
  • t.Student.Set_1, t.Student.Set_2, ...: Student t-test statistic.
  • pvalueStudent.Set_1, pvalueStudent.Set_2, ...: Two-sided Student t-test p-value.
  • qvalueStudent.Set_1, qvalueStudent.Set_2, ...: (If input getQvalues is TRUE) q-value (local false discovery rate) based on the Student t-test p-value (Storey et al 2004).
  • foldChange.Set_1, foldChange.Set_2, ...: A (signed) ratio of mean values. If the mean in the first group (corresponding to level 1) is larger than that of the second group, it equals meanFirstGroup/meanSecondGroup. If the mean of the second group is larger than that of the first group, it equals -meanSecondGroup/meanFirstGroup (notice the minus sign).
  • meanFirstGroup.Set_1, meanFirstGroup.Set_2, ...: Means of columns in input datExpr across samples in the first group.
  • meanSecondGroup.Set_1, meanSecondGroup.Set_2, ...: Means of columns in input datExpr across samples in the second group.
  • SE.FirstGroup.Set_1, SE.FirstGroup.Set_2, ...: Standard errors of columns in input datExpr across samples in the first group. Recall that SE(x)=sqrt(var(x)/n) where n is the number of non-missing values of x.
  • SE.SecondGroup.Set_1, SE.SecondGroup.Set_2, ...: Standard errors of columns in input datExpr across samples in the second group.
  • areaUnderROC.Set_1, areaUnderROC.Set_2, ...: The area under the ROC, also known as the concordance index or C.index. This is a measure of discriminatory power. The measure lies between 0 and 1, where 0.5 indicates no discriminatory power and 0 indicates that the "opposite" predictor has perfect discriminatory power. To compute it we use the function rcorr.cens with outx=TRUE (from Frank Harrell's package Hmisc).
  • nPresentSamples.Set_1, nPresentSamples.Set_2, ...: Number of samples with finite measurements for each gene.

If input kruskalTest is TRUE, the following columns further summarize results of the Kruskal-Wallis test:

  • stat.Kruskal.Set_1, stat.Kruskal.Set_2, ...: Kruskal-Wallis test statistic.
  • stat.Kruskal.signed.Set_1, stat.Kruskal.signed.Set_2, ...: (Warning: experimental) Kruskal-Wallis test statistic including a sign that indicates whether the average rank is higher in the second group (positive) or the first group (negative).
  • pvaluekruskal.Set_1, pvaluekruskal.Set_2, ...: Kruskal-Wallis test p-value.
  • qkruskal.Set_1, qkruskal.Set_2, ...: q-values corresponding to the Kruskal-Wallis test p-value (if input getQvalues is TRUE).
  • Z.Set1, Z.Set2, ...: Z statistics obtained from pvalueStudent.Set1, pvalueStudent.Set2, ... or from pvaluekruskal.Set1, pvaluekruskal.Set2, ..., depending on input metaKruskal.

For numeric traits, the following columns are returned:

  • cor.Set_1, cor.Set_2, ...: Correlations of all genes with the trait.
  • Z.Set1, Z.Set2, ...: Fisher Z statistics corresponding to the correlations.
  • pvalueStudent.Set_1, pvalueStudent.Set_2, ...: Student p-values of the correlations.
  • qvalueStudent.Set_1, qvalueStudent.Set_2, ...: (If input getQvalues is TRUE) q-values of the correlations calculated from the p-values.
  • AreaUnderROC.Set_1, AreaUnderROC.Set_2, ...: Area under the ROC.
  • nPresentSamples.Set_1, nPresentSamples.Set_2, ...: Number of samples present for the calculation of each association.
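
The signed fold change and the standard error described for binary traits can be illustrated in a few lines of base R. This is a minimal sketch, not the package's code: the helper names signedFoldChange and stdErr are hypothetical, and the treatment of exactly equal group means (returned here as -1) is an assumption, since the description does not cover that case.

```r
# Hypothetical helper: signed ratio of group means as described above.
# Ties (equal means) fall into the second branch and give -1; this is an assumption.
signedFoldChange <- function(meanFirst, meanSecond) {
  ifelse(meanFirst > meanSecond,
         meanFirst / meanSecond,
         -meanSecond / meanFirst)
}

# Hypothetical helper: SE(x) = sqrt(var(x) / n), with n the number of non-missing values.
stdErr <- function(x) {
  sqrt(var(x, na.rm = TRUE) / sum(!is.na(x)))
}

signedFoldChange(6, 3)   # first group mean is larger: 6 / 3
signedFoldChange(3, 6)   # second group mean is larger: -(6 / 3)
stdErr(c(1, 2, 3, NA))   # variance 1 over 3 non-missing values
```

The sign convention means that a value of 2 and a value of -2 describe changes of the same size in opposite directions.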

Details

Stouffer's method combines Z statistics by taking the mean of the input Z statistics and multiplying it by sqrt(n), where n is the number of input data sets. We refer to this method as Stouffer.equalWeights. In general, a better (i.e., more powerful) method of combining Z statistics is to weight each one by the number of degrees of freedom of its data set (which approximately equals the number of samples in that set). We refer to this method as weightedStouffer. Finally, the user can also specify custom weights, for example if a data set needs to be downweighted due to technical concerns; however, custom weights should be chosen carefully to avoid possible selection biases.
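
The combination rules above can be written compactly in base R. The sketch below uses the standard weighted Stouffer formula, sum(w * Z) / sqrt(sum(w^2)), which reduces to mean(Z) * sqrt(n) for equal weights. The function name stoufferZ and the example numbers are hypothetical, missing values are not handled, and the two-sided p-value is an assumption about how the Z statistics are converted.

```r
# Hypothetical helper: weighted Stouffer combination.
# Z has one row per gene and one column per data set.
stoufferZ <- function(Z, weights = rep(1, ncol(Z))) {
  stopifnot(length(weights) == ncol(Z), all(weights >= 0))
  as.vector(Z %*% weights) / sqrt(sum(weights^2))
}

# Made-up per-set Z statistics for three genes in three data sets.
Z <- cbind(Set_1 = c(2.1, -0.3,  1.0),
           Set_2 = c(1.8,  0.4, -1.2),
           Set_3 = c(2.5, -0.1,  0.2))
nSamples <- c(20, 30, 25)

Z.equal   <- stoufferZ(Z)                   # equal weights
Z.rootDoF <- stoufferZ(Z, sqrt(nSamples))   # square root of sample counts
Z.DoF     <- stoufferZ(Z, nSamples)         # sample counts

# Two-sided p-values from the combined Z statistics (assumed convention).
p.equal <- 2 * pnorm(abs(Z.equal), lower.tail = FALSE)
```

Dividing by sqrt(sum(weights^2)) keeps the combined statistic on the standard normal scale under the null hypothesis, whatever non-negative weights are chosen.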

References

For Stouffer's method, see

Stouffer, S.A., Suchman, E.A., DeVinney, L.C., Star, S.A. & Williams, R.M. Jr. 1949. The American Soldier, Vol. 1: Adjustment during Army Life. Princeton University Press, Princeton.

A discussion of weighted Stouffer's method can be found in

Whitlock, M.C. 2005. Combining probability from independent tests: the weighted Z-method is superior to Fisher's approach. Journal of Evolutionary Biology 18(5): 1368.

See Also

standardScreeningBinaryTrait, standardScreeningNumericTrait for screening functions for individual data sets