mixed.mdmr: Fit Mixed-MDMR models

Description

mixed.mdmr allows users to conduct multivariate distance matrix regression (MDMR) on a (hierarchically) clustered sample without the inflated Type-I error rates that result from violating the independence assumption. This is done by invoking a mixed-effects modeling framework, in which clustering/grouping variables are specified as random effects and the covariate effects of interest are fixed effects. The input to mixed.mdmr largely mirrors that of the lmer function from the package lme4 insofar as the specification of random and fixed effects is concerned (see Arguments for details). Note that this function only controls for the random effects in order to test the fixed effects; it does not facilitate point estimation or inference on the random effects.

Usage

mixed.mdmr(fmla, data, D = NULL, G = NULL, use.ssd = 1,
  start.acc = 1e-20, ncores = 1)

Value

An object with six elements and a summary function. Calling summary(res) on the returned object res produces a data frame comprising:

Statistic: Value of the corresponding MDMR test statistic
Numer DF: Numerator degrees of freedom for the corresponding effect
p-value: The p-value for each effect.

In addition to the information in the three columns comprising summary(res), the res object also contains:

p.prec: A data.frame reporting the precision of each p-value. If analytic p-values were computed, these are the maximum error bounds of the p-values reported by the davies function in CompQuadForm. If permutation p-values were computed, these are the standard errors of the permutation p-values.

Note that the printed output of summary(res) will truncate p-values to the smallest trustworthy values, but the object returned by summary(res) will contain the p-values as computed. The reason for this truncation differs for analytic and permutation p-values. For an analytic p-value, if the error bound of the Davies algorithm is larger than the p-value, the only conclusion that can be drawn with certainty is that the p-value is smaller than (or equal to) the error bound.
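
The truncation rule for analytic p-values can be illustrated with a small base R sketch. It is not the package's printing code: format_p is a hypothetical helper, and the p-values and error bounds below are made-up numbers chosen only to show the comparison between a p-value and its error bound.

```r
# Made-up values for illustration only; these are not output from mixed.mdmr.
p_val  <- c(x1 = 3.2e-12, x2 = 0.0413)   # p-values as computed
p_prec <- c(x1 = 1e-10,   x2 = 1e-10)    # assumed Davies error bounds

# Hypothetical helper: when the error bound exceeds the p-value, report
# only that the p-value lies below the bound; otherwise report the p-value.
format_p <- function(p, bound) {
  ifelse(p < bound,
         paste("<", signif(bound, 3)),
         as.character(signif(p, 3)))
}

shown <- format_p(p_val, p_prec)
print(shown)
```

Here x1 is displayed as an upper bound because its computed value is smaller than the error bound, while x2 is displayed as computed. The untruncated values remain available in p_val, which parallels the distinction between the printed summary and the returned object.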

Arguments

fmla: A one-sided linear formula object describing both the fixed-effects and random-effects part of the model, beginning with an ~ operator, which is followed by the terms to include in the model, separated by + operators. Random-effects terms are distinguished by vertical bars (|) separating expressions for design matrices from grouping factors. Two vertical bars (||) can be used to specify multiple uncorrelated random effects for the same grouping variable.
data: A mandatory data frame containing the variables named in fmla.
D: Distance matrix computed on the outcome data. Can be either a matrix or an R dist object. Either D or G must be passed to mixed.mdmr().
G: Gower's centered similarity matrix computed from D. Either D or G must be passed to mixed.mdmr().
use.ssd: The proportion of the total sum of squared distances (SSD) that will be targeted in the modeling process. In the case of non-Euclidean distances, specifying use.ssd to be slightly smaller than 1.00 (e.g., 0.99) can substantially lower the computational burden of mixed.mdmr while maintaining well-controlled Type-I error rates and only sacrificing a trivial amount of power. In the case of Euclidean distances the computational burden of mixed.mdmr is small, so use.ssd should be set to 1.00.
start.acc: Starting accuracy of the Davies (1980) algorithm implemented in the davies function in the CompQuadForm package (Duchesne & De Micheaux, 2010) that mixed.mdmr() uses to compute MDMR p-values.
ncores: Integer; if ncores > 1, the parallel package is used to speed computation. Note: Windows users must set ncores = 1 because the parallel package relies on forking. See mc.cores in the mclapply function in the parallel package for more details.
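
For readers who want to see what the G argument contains, the following base R sketch computes Gower's centered matrix from a distance matrix using the standard double-centering formula G = C A C, where A = -D^2 / 2 elementwise and C = I - 11'/n. The helper gower_center is a hypothetical name written for this illustration and is not claimed to be the routine mixed.mdmr uses internally; the data are simulated.

```r
# Hypothetical helper for illustration: Gower-center a distance matrix.
gower_center <- function(D) {
  D <- as.matrix(D)                     # accepts a matrix or a dist object
  n <- nrow(D)
  A <- -0.5 * D^2                       # elementwise squared distances, scaled
  C <- diag(n) - matrix(1 / n, n, n)    # centering matrix I - 11'/n
  C %*% A %*% C
}

set.seed(1)
Y <- matrix(rnorm(20 * 3), nrow = 20)   # simulated outcome data
D <- dist(Y)                            # Euclidean distances
G <- gower_center(D)

# With Euclidean distances, G equals the inner-product matrix of the
# column-centered outcome data.
Yc <- scale(Y, center = TRUE, scale = FALSE)
same <- isTRUE(all.equal(G, Yc %*% t(Yc), check.attributes = FALSE))
print(same)
```

Computing G once and passing it in place of D can be convenient when several models are fit to the same outcome data, since the centering step does not depend on the predictors.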

Author

Daniel B. McArtor (dmcartor@gmail.com) [aut, cre]

References

Davies, R. B. (1980). The Distribution of a Linear Combination of chi-square Random Variables. Journal of the Royal Statistical Society. Series C (Applied Statistics), 29(3), 323-333.

Duchesne, P., & De Micheaux, P. L. (2010). Computing the distribution of quadratic forms: Further comparisons between the Liu-Tang-Zhang approximation and exact methods. Computational Statistics and Data Analysis, 54(4), 858-862.

McArtor, D. B. (2017). Extending a distance-based approach to multivariate multiple regression (Doctoral Dissertation).

Examples

# Load the MDMR package and the clustered example data
library(MDMR)
data("clustmdmrdata")

# Get distance matrix
D <- dist(Y.clust)

# Regular MDMR without the grouping variable
mdmr.res <- mdmr(X = X.clust[,1:2], D = D, perm.p = FALSE)

# Results look significant
summary(mdmr.res)

# Account for grouping variable
mixed.res <- mixed.mdmr(~ x1 + x2 + (x1 + x2 | grp),
                        data = X.clust, D = D)

# Significance was due to the grouping variable
summary(mixed.res)