mdmr
(multivariate distance matrix regression) is used to regress a
distance matrix onto a set of predictors. It returns the test statistic,
pseudo R-square statistic, and analytic p-values for all predictors
jointly and for each predictor individually, conditioned on the rest.
mdmr(X, D = NULL, G = NULL, lambda = NULL, return.lambda = F,
start.acc = 1e-20, ncores = 1, perm.p = (nrow(as.matrix(X)) < 200),
nperm = 500, seed = NULL)
An object with six elements and a summary function. Calling
summary(mdmr.res)
produces a data frame comprised of:
Value of the corresponding MDMR test statistic
Numerator degrees of freedom for the corresponding effect
Size of the corresponding effect on the distance matrix
The p-value for each effect.
In addition to the information in the three columns comprising
summary(res)
, the res
object also contains:
A data.frame reporting the precision of each p-value. If
analytic p-values were computed, these are the maximum error bound of the
p-values reported by the davies
function in CompQuadForm
. If
permutation p-values were computed, it is the standard error of each
permutation p-value.
A vector of the eigenvalues of G
(if
return.lambda = T
).
Number of permutations used. Will read NA
if analytic
p-values were computed
Note that the printed output of summary(res)
will truncate p-values
to the smallest trustworthy values, but the object returned by
summary(res)
will contain the p-values as computed. The reason for
this truncation differs for analytic and permutation p-values. For an
analytic p-value, if the error bound of the Davies algorithm is larger than
the p-value, the only conclusion that can be drawn with certainty is that
the p-value is smaller than (or equal to) the error bound. For a permutation
test, the estimated p-value will be zero if no permuted test statistics are
greater than the observed statistic, but the zero p-value is only a product
of the finite number of permutations conduted. The only conclusion that can
be drawn is that the p-value is smaller than 1/nperm
.
A \(n x p\) matrix or data frame of predictors. Unordered factors
will be tested with contrast-codes by default, and ordered factors will be
tested with polynomial contrasts. For finer control of how categorical
predictors are handled, or if higher-order effects are desired, the output
from a call to model.matrix()
can be supplied to this argument as
well.
Distance matrix computed on the outcome data. Can be either a
matrix or an R dist
object. Either D
or G
must be passed to mdmr()
.
Gower's centered similarity matrix computed from D
.
Either D
or G
must be passed to mdmr
.
Optional argument: Eigenvalues of G
.
Eigendecomposition of large G
matrices can be somewhat time
consuming, and the theoretical p-values require the eigenvalues of
G
. If MDMR is to be conducted multiple times on one distance
matrix, it is advised to conduct the eigendecomposition once and pass the
eigenvalues to mdmr()
directly each time.
Logical; indicates whether or not the eigenvalues of
G
should be returned, if calculated. Default is FALSE
.
Starting accuracy of the Davies (1980) algorithm
implemented in the davies
function in the CompQuadForm
package (Duchesne & De Micheaux, 2010) that mdmr()
uses to compute
MDMR p-values.
Integer; if ncores
> 1, the parallel
package is used to speed computation. Note: Windows users must set
ncores = 1
because the parallel
pacakge relies on forking. See
mc.cores
in the mclapply
function in the
parallel
pacakge for more details.
Logical: should permutation-based p-values be computed instead
of analytic p-values? Default behavior is TRUE
if n < 200
and
FALSE
otherwise because the anlytic p-values depend on asymptotics.
for n > 200
and "permutation" otherwise.
Number of permutations to use if permutation-based p-values are to be computed.
Random seed to use to generate the permutation null distribution. Defaults to a random seed.
Daniel B. McArtor (dmcartor@gmail.com) [aut, cre]
This function is the fastest approach to conducting MDMR. It uses the fastest known computational strategy to compute the MDMR test statistic (see Appendix A of McArtor et al., 2017), and it uses fast, analytic p-values.
The slowest part of conducting MDMR is now the necessary eigendecomposition
of the G
matrix, whose computation time is a function of
\(n^3\). If MDMR is to be conducted multiple times on the same
distance matrix, it is recommended to compute eigenvalues of G
in
advance and pass them to the function rather than computing them every
time mdmr
is called, as is the case if the argument lambda
is left NULL
.
The distance matrix D
can be passed to mdmr
as either a
distance object or a symmetric matrix.
Davies, R. B. (1980). The Distribution of a Linear Combination of chi-square Random Variables. Journal of the Royal Statistical Society. Series C (Applied Statistics), 29(3), 323-333.
Duchesne, P., & De Micheaux, P. L. (2010). Computing the distribution of quadratic forms: Further comparisons between the Liu-Tang-Zhang approximation and exact methods. Computational Statistics and Data Analysis, 54(4), 858-862.
McArtor, D. B., Lubke, G. H., & Bergeman, C. S. (2017). Extending multivariate distance matrix regression with an effect size measure and the distribution of the test statistic. Psychometrika, 82, 1052-1077.
# --- The following two approaches yield equivalent results --- #
# Approach 1
data(mdmrdata)
D <- dist(Y.mdmr, method = "euclidean")
res1 <- mdmr(X = X.mdmr, D = D)
summary(res1)
# Approach 2
data(mdmrdata)
D <- dist(Y.mdmr, method = "euclidean")
G <- gower(D)
res2 <- mdmr(X = X.mdmr, G = G)
summary(res2)
Run the code above in your browser using DataLab