rma.mv: Meta-Analysis via Multivariate/Multilevel Linear (Mixed-Effects) Models

Description

Function to fit meta-analytic multivariate/multilevel fixed- and random/mixed-effects models with or without moderators via linear (mixed-effects) models. See below and the documentation of the metafor-package for more details on these models.

Usage

rma.mv(yi, V, mods, random, struct="CS",
       intercept=TRUE, data, slab, subset,
       method="REML", tdist=FALSE, level=95, digits=4, btt,
       R, sigma2, tau2, rho, verbose=FALSE, control)

Arguments

yi

vector of length $k$ with the observed effect sizes or outcomes. See Details.

V

vector of length $k$ with the corresponding sampling variances or a $k \times k$ variance-covariance matrix of the sampling errors. See Details.

mods

optional argument to include one or more moderators in the model. A single moderator can be given as a vector of length $k$ specifying the values of the moderator. Multiple moderators are specified by giving a matrix with $k$ rows and $p'$ columns. Altern

random

either a single one-sided formula or list of one-sided formulas to specify the random-effects structure of the model. See Details.

struct

character string to specify the variance structure of an ~ inner | outer formula in the random argument. Either "CS" for compound symmetry, "HCS" for heteroscedastic compound symmetry, or "UN"

intercept

logical indicating whether an intercept term should be added to the model (default is TRUE).

data

optional data frame containing the data supplied to the function.

slab

optional vector with unique labels for the $k$ studies.

subset

optional vector indicating the subset of studies that should be used for the analysis. This can be a logical vector of length $k$ or a numeric vector indicating the indices of the observations to include.

method

character string specifying whether a fixed- or a random/mixed-effects model should be fitted. A fixed-effects model (with or without moderators) is fitted when using method="FE". Random/mixed-effects models are fitted by setting method

tdist

logical specifying whether test statistics and confidence intervals should be based on the normal (when FALSE, the default) or the t-distribution (when TRUE). See Details.

level

numerical value between 0 and 100 specifying the confidence interval level (default is 95).

digits

integer specifying the number of decimal places to which the printed results should be rounded (default is 4).

btt

optional vector of indices specifying which coefficients to include in the omnibus test of moderators. See Details.

R

optional named list of known correlation matrices corresponding to (some of) the components specified via the random argument. See Details.

sigma2

optional numerical vector (of the same length as the number of random intercept components specified via the random argument) to fix the corresponding $\sigma^2$ value(s). A specific $\sigma^2$ value can be fixed by setting the corresponding

tau2

optional numerical value (for struct="CS") or vector (for struct="HCS" or struct="UN") to fix the amount of (residual) heterogeneity in the levels of the inner factor corresponding to an ~ inner | outer

rho

optional numerical value (for struct="CS" or struct="HCS") or vector (for struct="UN") to fix the correlation between levels of the inner factor corresponding to an ~ inner | outer formula specified in t

verbose

logical indicating whether output should be generated on the progress of the model fitting (default is FALSE). Can also be an integer. Values > 1 generate more verbose output. See Note.

control

optional list of control values for the estimation algorithms. If unspecified, default values are defined inside the function. See Note.

Value

An object of class c("rma.mv","rma"). The object is a list containing the following components:
b: estimated coefficients of the model.
se: standard errors of the coefficients.
zval: test statistics of the coefficients.
pval: p-values for the test statistics.
ci.lb: lower bound of the confidence intervals for the coefficients.
ci.ub: upper bound of the confidence intervals for the coefficients.
vb: variance-covariance matrix of the estimated coefficients.
sigma2: estimated $\sigma^2$ value(s).
tau2: estimated $\tau^2$ value(s).
rho: estimated $\rho$ value(s).
k: number of studies included in the model.
p: number of coefficients in the model (including the intercept).
m: number of coefficients included in the omnibus test of coefficients.
QE: test statistic for the test of (residual) heterogeneity.
QEp: p-value for the test of (residual) heterogeneity.
QM: test statistic for the omnibus test of coefficients.
QMp: p-value for the omnibus test of coefficients.
int.only: logical that indicates whether the model is an intercept-only model.
yi, V, X: the vector of outcomes, the corresponding variance-covariance matrix of the sampling errors, and the model matrix of the model.
fit.stats: a list with the log-likelihood, deviance, AIC, BIC, and AICc values.
...: some additional elements/values.

The results of the fitted model are neatly formatted and printed with the print.rma.mv function. If fit statistics should also be given, use summary.rma (or use the fitstats.rma function to extract them). For random/mixed-effects models, the profile.rma.mv function can be used to obtain a plot of the (restricted) log-likelihood as a function of a specific variance component or correlation parameter of the model.

Details

Specifying the Data

The function can be used in conjunction with any of the usual effect size or outcome measures used in meta-analyses (e.g., log odds ratios, log relative risks, risk differences, mean differences, standardized mean differences, raw correlation coefficients, correlation coefficients transformed with Fisher's r-to-z transformation, and so on). Simply specify the observed outcomes via the yi argument and the corresponding sampling variances via the V argument. In case the sampling errors are correlated, then one can specify the entire variance-covariance matrix of the sampling errors via the V argument. The escalc function can be used to compute a wide variety of effect size and outcome measures (and the corresponding sampling variances) based on summary statistics. Equations for computing the covariance between sampling errors for a variety of different effect size or outcome measures can be found in Gleser and Olkin (2009). For raw and Fisher's r-to-z transformed correlations, one can find suitable equations, for example, in Steiger (1980).

Specifying Fixed Effects

When method="FE", a fixed-effects model is fitted to the data (note: arguments random, struct, sigma2, tau2, rho, and R are not relevant then and are ignored). The model is then simply given by $\mathbf{y} \sim N(\mathbf{1} \beta_0, \mathbf{V})$, where $\mathbf{y}$ is the (column) vector with the observed effect sizes or outcomes, $\mathbf{1}$ is a column vector of 1's, $\beta_0$ is the (average) true effect size or outcome, and $\mathbf{V}$ is the variance-covariance matrix of the sampling errors (if a vector of sampling variances is provided via the V argument, then $\mathbf{V}$ is assumed to be diagonal). One or more moderators can be included in the model via the mods argument. A single moderator can be given as a (row or column) vector of length $k$ specifying the values of the moderator.
Multiple moderators are specified by giving an appropriate model matrix (i.e., $\mathbf{X}$) with $k$ rows and $p'$ columns (e.g., using mods = cbind(mod1, mod2, mod3), where mod1, mod2, and mod3 correspond to the names of the variables for the three moderator variables). The intercept is added to the model matrix by default unless intercept=FALSE. Alternatively, one can use the standard formula syntax to specify the model. In this case, the mods argument should be set equal to a one-sided formula of the form mods = ~ model (e.g., mods = ~ mod1 + mod2 + mod3). Interactions, polynomial terms, and factors can be easily added to the model in this manner. When specifying a model formula via the mods argument, the intercept argument is ignored. Instead, the inclusion/exclusion of the intercept term is controlled by the specified formula (e.g., mods = ~ mod1 + mod2 + mod3 - 1 would lead to the removal of the intercept term). With moderators included, the model is then given by $\mathbf{y} \sim N(\mathbf{X} \mathbf{\beta}, \mathbf{V})$, where $\mathbf{X}$ denotes the model matrix containing the moderator values (and possibly the intercept) and $\mathbf{\beta}$ is a column vector containing the corresponding model coefficients. Fixed-effects models with or without moderators are fitted via generalized/weighted least squares estimation.

Specifying Random Effects

When method="ML" or method="REML", one can fit random/mixed-effects models to the data by specifying the random effects structure via the random argument. The random argument is either a single one-sided formula or a list of one-sided formulas. A formula specified via this argument can be of the form ~ 1 | id. Such a formula adds random effects corresponding to the grouping variable/factor id to the model.
Effects or outcomes with the same value/level of the id variable/factor receive the same random effect, while effects or outcomes with different values/levels of the id variable/factor are assumed to be independent. The variance component corresponding to such a formula is denoted by $\sigma^2$. An arbitrary number of such formulas can be specified as a list of formulas (with variance components $\sigma^2_1$, $\sigma^2_2$, and so on). Such random effects components are useful to model clustering (i.e., correlation) induced by a multilevel structure in the data (e.g., effects derived from the same paper, lab, research group, or species may be more similar to each other than effects derived from different papers, labs, research groups, or species). In addition or alternatively to specifying one or multiple ~ 1 | id terms, the random argument can also contain one (and only one!) formula of the form ~ inner | outer. Effects or outcomes with different values/levels of the outer grouping variable/factor are assumed to be independent, while effects or outcomes with the same value/level of the outer grouping variable/factor receive correlated random effects corresponding to the levels of the inner grouping variable/factor. The struct argument is used to specify the variance structure corresponding to the inner variable/factor. With struct="CS", a compound symmetric structure is assumed (i.e., a single variance component $\tau^2$ corresponding to all values/levels of the inner variable/factor and a single correlation coefficient $\rho$ for the correlation between different values/levels). With struct="HCS", a heteroscedastic compound symmetric structure is assumed (with variance components $\tau^2_1$, $\tau^2_2$, and so on, corresponding to the values/levels of the inner variable/factor and a single correlation coefficient $\rho$ for the correlation between different values/levels). 
Finally, with struct="UN", an unstructured variance-covariance matrix is assumed (with variance components $\tau^2_1$, $\tau^2_2$, and so on, corresponding to the values/levels of the inner variable/factor and correlation coefficients $\rho_{12}$, $\rho_{13}$, $\rho_{23}$, and so on, for the various combinations of the values/levels of the inner variable/factor). For example, for an inner grouping variable/factor with three levels, the three structures correspond to:

[Figure structs.png: the CS, HCS, and UN variance-covariance structures for three levels]

With the outer factor corresponding to a study id variable and the inner factor corresponding to a variable indicating the treatment type or study arm, such a random effects component could be used to estimate how strongly different true effects or outcomes within the same study are correlated and/or whether the amount of heterogeneity differs across different treatment types/arms. The meta-analytic bivariate model (van Houwelingen, Arends, & Stijnen, 2002) can also be fitted in this manner (see the examples below).

When the random argument contains a formula of the form ~ 1 | id, one can use the (optional) argument R to specify a corresponding known correlation matrix of the random effects (i.e., R = list(id = Cor), where Cor is the correlation matrix). In that case, effects or outcomes with the same value/level of the id variable/factor receive the same random effect, while effects or outcomes with different values/levels of the id variable/factor receive random effects that are correlated as specified in the corresponding correlation matrix given via the R argument. The column/row names of the correlation matrix given via the R argument must therefore contain all of the values/levels of the id variable/factor. When the random argument contains multiple formulas of the form ~ 1 | id, one can specify known correlation matrices for none, some, or all of those terms (e.g., random = list(~ 1 | id1, ~ 1 | id2), R = list(id1 = Cor1) or random = list(~ 1 | id1, ~ 1 | id2), R = list(id1 = Cor1, id2 = Cor2), where Cor1 and Cor2 are the correlation matrices corresponding to the grouping variables/factors id1 and id2, respectively).

Random effects with a known (or at least approximately known) correlation structure are useful in a variety of contexts. For example, such components can be used to account for the correlations induced by a shared phylogenetic history among organisms (e.g., plants, fungi, animals).
In that case, ~ 1 | id is used to specify the organisms and argument R is used to specify the phylogenetic correlation matrix of the organisms studied in the meta-analysis. The corresponding variance component then indicates how much variance/heterogeneity is attributable to the specified phylogeny. As another example, in a genetic meta-analysis studying disease association for several single nucleotide polymorphisms (SNPs), linkage disequilibrium (LD) among the SNPs can induce an approximately known degree of correlation among the effects. In that case, ~ 1 | id is used to specify the SNP and R the corresponding LD correlation map.

Fixing Variance Components and/or Correlations

Arguments sigma2, tau2, and rho can be used to fix particular variance components and/or correlations at a given value. This is useful for sensitivity analyses (e.g., for plotting the regular/restricted log-likelihood as a function of a particular variance component or correlation) or for imposing a desired variance-covariance structure on the data. For example, if random = list(~ 1 | id1, ~ 1 | id2), then sigma2 must be of length 2 (corresponding to $\sigma^2_1$ and $\sigma^2_2$) and a fixed value can be assigned to either or both variance components. Setting a particular component to NA means that the component will be estimated by the function. Argument tau2 is only relevant when the random argument contains an ~ inner | outer formula. In that case, if the tau2 argument is used, it must be either of length 1 (for struct="CS") or of the same length as the number of levels of the inner factor (for struct="HCS" or struct="UN"). A numeric value in the tau2 argument then fixes the corresponding variance component to that value, while NA means that the component will be estimated. Similarly, if argument rho is used, it must be either of length 1 (for struct="CS" or struct="HCS") or of length $l(l-1)/2$ (for struct="UN"), where $l$ denotes the number of levels of the inner factor.
Again, a numeric value fixes the corresponding correlation, while NA means that the correlation will be estimated. For example, with struct="CS" and rho=0, the variance-covariance matrix of the inner factor will be diagonal with $\tau^2$ along the diagonal.

Omnibus Test of Parameters

For models including moderators, an omnibus test of all the model coefficients is conducted that excludes the intercept (the first coefficient) if it is included in the model. If no intercept is included in the model, then the omnibus test includes all of the coefficients in the model including the first. Alternatively, one can manually specify the indices of the coefficients to test via the btt argument. For example, use btt=c(3,4) to only include the third and fourth coefficient from the model in the test (if an intercept is included in the model, then it corresponds to the first coefficient in the model).

Categorical Moderators

Categorical moderator variables can be included in the model via the mods argument in the same way that appropriately (dummy) coded categorical independent variables can be included in linear models. One can either do the dummy coding manually or use a model formula together with the factor function to let R handle the coding automatically.

Tests and Confidence Intervals

By default, the test statistics of the individual coefficients in the model (and the corresponding confidence intervals) are based on the normal distribution, while the omnibus test is based on a chi-square distribution with $m$ degrees of freedom ($m$ being the number of coefficients tested). As an alternative, one can set tdist=TRUE, which slightly mimics the Knapp and Hartung (2003) method by using a t-distribution with $k-p$ degrees of freedom for tests of individual coefficients and confidence intervals and an F-distribution with $m$ and $k-p$ degrees of freedom ($p$ being the total number of model coefficients including the intercept if it is present) for the omnibus test statistic.
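
The omnibus test and the effect of btt and tdist can be sketched in base R. This is only an illustration of a Wald-type test under stated assumptions, not the code used inside rma.mv: the numbers in b, vb, and k are made up, the helper name omnibus is hypothetical, and the F statistic is assumed to be the chi-square statistic divided by $m$.

```r
# Hypothetical estimates for a model with an intercept and three moderators
# (illustrative numbers, not output from rma.mv)
b  <- c(0.50, 0.20, -0.10, 0.30)         # coefficients (first one is the intercept)
vb <- diag(c(0.04, 0.01, 0.02, 0.03))    # their variance-covariance matrix
k  <- 20                                 # number of observed outcomes

# Wald-type omnibus test of the coefficients selected via btt
omnibus <- function(b, vb, btt, k, tdist = FALSE) {
  m  <- length(btt)                      # number of coefficients tested
  p  <- length(b)                        # total number of coefficients
  QM <- drop(t(b[btt]) %*% solve(vb[btt, btt, drop = FALSE]) %*% b[btt])
  if (tdist) {
    # assumption: F statistic taken as QM / m with m and k - p df
    QM  <- QM / m
    QMp <- pf(QM, df1 = m, df2 = k - p, lower.tail = FALSE)
  } else {
    QMp <- pchisq(QM, df = m, lower.tail = FALSE)
  }
  list(QM = QM, QMp = QMp, m = m)
}

omnibus(b, vb, btt = 2:4, k = k)                     # all coefficients except the intercept
omnibus(b, vb, btt = c(3, 4), k = k)                 # third and fourth coefficient only
omnibus(b, vb, btt = c(3, 4), k = k, tdist = TRUE)   # F-test variant
```

With btt = 2:4 the sketch tests everything except the intercept, which matches the default described above for a model that includes an intercept.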

Test for (Residual) Heterogeneity

A test for (residual) heterogeneity is automatically carried out by the function. Without moderators in the model, this test is the generalized/weighted least squares extension of Cochran's $Q$-test, which tests whether the variability in the observed effect sizes or outcomes is larger than one would expect based on sampling variability (and the given covariances among the sampling errors) alone. A significant test suggests that the true effects or outcomes are heterogeneous. When moderators are included in the model, this is the $Q_E$-test for residual heterogeneity, which tests whether the variability in the observed effect sizes or outcomes that is not accounted for by the moderators included in the model is larger than one would expect based on sampling variability (and the given covariances among the sampling errors) alone.
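
The fixed-effects model $\mathbf{y} \sim N(\mathbf{X} \mathbf{\beta}, \mathbf{V})$ and the accompanying heterogeneity test can be written out in a few lines of base R. The following is a minimal sketch of textbook generalized least squares, assuming made-up data and a hypothetical helper name (gls_fit); it is meant to show what the quantities b, vb, QE, and QEp represent, not how rma.mv computes them internally.

```r
# Hypothetical data: three outcomes, the first two with correlated sampling errors
y <- c(0.10, 0.30, 0.50)
V <- matrix(c(0.04, 0.01, 0.00,
              0.01, 0.05, 0.00,
              0.00, 0.00, 0.02), nrow = 3, byrow = TRUE)
X <- cbind(intrcpt = rep(1, length(y)))    # intercept-only model matrix

gls_fit <- function(y, V, X) {
  W  <- solve(V)                           # weight matrix (inverse of V)
  vb <- solve(t(X) %*% W %*% X)            # var-cov matrix of the coefficients
  b  <- vb %*% t(X) %*% W %*% y            # GLS estimates of the coefficients
  r  <- y - X %*% b                        # residuals
  QE <- drop(t(r) %*% W %*% r)             # Q statistic for (residual) heterogeneity
  df <- length(y) - ncol(X)                # k - p degrees of freedom
  list(b = drop(b), vb = vb, QE = QE,
       QEp = pchisq(QE, df = df, lower.tail = FALSE))
}

fit <- gls_fit(y, V, X)
fit$b      # estimated (average) outcome
fit$QE     # heterogeneity statistic
fit$QEp    # its p-value
```

When V is diagonal, the estimate reduces to the familiar inverse-variance weighted mean and QE to Cochran's $Q$.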

References

Gleser, L. J., & Olkin, I. (2009). Stochastically dependent effect sizes. In H. Cooper, L. V. Hedges, & J. C. Valentine (Eds.), The handbook of research synthesis and meta-analysis (2nd ed., pp. 357–376). New York: Russell Sage Foundation.

van Houwelingen, H. C., Arends, L. R., & Stijnen, T. (2002). Advanced methods in meta-analysis: Multivariate approach and meta-regression. Statistics in Medicine, 21, 589–624.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87, 245–251.

Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36(3), 1–48. http://www.jstatsoft.org/v36/i03/.

Examples

### load BCG vaccine data
data(dat.bcg)

### calculate log odds ratios and corresponding sampling variances
dat <- escalc(measure="OR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)

### random-effects model using rma.uni()
rma(yi, vi, data=dat)

### change data into long format
dat.long <- to.long(measure="OR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)

### set levels of group variable ("exp" = experimental/vaccinated; "con" = control/non-vaccinated)
levels(dat.long$group) <- c("exp", "con")

### set "con" to reference level
dat.long$group <- relevel(dat.long$group, ref="con")

### calculate log odds and corresponding sampling variances
dat.long <- escalc(measure="PLO", xi=out1, mi=out2, data=dat.long)

### bivariate random-effects model using rma.mv()
res <- rma.mv(yi, vi, mods = ~ group, random = ~ group | study, struct="UN", data=dat.long)
res

### see help(dat.berkey1998) for another example of a multivariate model
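
The three struct options described under Details can also be illustrated without fitting a model. The sketch below builds the implied variance-covariance matrix of the random effects for an inner factor with three levels. The helper name make_G and all numeric values are hypothetical, and the ordering of the correlations for struct="UN" ($\rho_{12}$, $\rho_{13}$, $\rho_{23}$) is an assumption taken from the order in which they are listed in the text.

```r
# Build the variance-covariance matrix of the random effects for the levels of
# the inner factor under a given structure (illustration only, base R)
make_G <- function(struct, tau2, rho, l = 3) {
  if (struct == "CS") tau2 <- rep(tau2, l)     # one common variance
  R <- diag(l)                                 # start from an identity matrix
  if (struct == "UN") {
    R[lower.tri(R)] <- rho                     # rho_12, rho_13, rho_23, ...
    R[upper.tri(R)] <- t(R)[upper.tri(R)]      # mirror to make R symmetric
  } else {
    R[row(R) != col(R)] <- rho                 # one common correlation
  }
  outer(sqrt(tau2), sqrt(tau2)) * R            # covariance = rho * tau_i * tau_j
}

make_G("CS",  tau2 = 0.2, rho = 0.5)
make_G("HCS", tau2 = c(0.1, 0.2, 0.3), rho = 0.5)
make_G("UN",  tau2 = c(0.1, 0.2, 0.3), rho = c(0.6, 0.4, 0.2))

# with struct="CS" and rho = 0 the matrix is diagonal with tau2 along the diagonal
make_G("CS",  tau2 = 0.2, rho = 0)
```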
