Test whether missingness is contingent upon the observed variables, according to the methodology developed by Jamshidian and Jalal (2010) (see Details).
mcar(
x,
imputed = mice(x, method = "norm"),
min_n = 6,
method = "auto",
replications = 10000,
use_chisq = 30,
alpha = 0.05
)
An object of class mcar_object.
An object for which a method exists; usually a data.frame.
Either an object of class mids, as returned by mice(), or a list of data.frames.
Atomic numeric, must be greater than 1. When there are missing data patterns with fewer than min_n cases, all cases with that pattern will be removed from x and imputed.
Atomic character. If it is known (or assumed) that data are either multivariate normally distributed or not, then use either method = "hawkins" or method = "nonparametric", respectively. The default argument method = "auto" follows the procedure outlined in the Details section, and in Figure 7 of Jamshidian and Jalal (2010).
Number of replications used to simulate the Neyman distribution when performing Hawkins' test. As this method is based on random sampling, use a high number of replications (and optionally, set.seed()) to minimize Monte Carlo error and ensure reproducibility.
Atomic integer, indicating the minimum number of cases within a group k that triggers the use of the asymptotic chi-square distribution instead of the empirical distribution in the Neyman uniformity test, which is performed as part of Hawkins' test.
Atomic numeric, indicating the significance level of tests.
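The effect of min_n described above can be sketched in base R. This is only an illustration under stated assumptions: the helper name drop_small_patterns and the toy data are hypothetical, and the code is not the package's internal implementation.

```r
# Hypothetical helper: drop cases whose missingness pattern occurs
# fewer than min_n times. Not the package's internal implementation.
drop_small_patterns <- function(x, min_n = 6) {
  # Encode each row's pattern as a string, e.g. "010" = second column missing
  pattern <- apply(is.na(x), 1, function(r) paste(as.integer(r), collapse = ""))
  # Number of cases that share each pattern
  counts <- table(pattern)
  # Keep only rows whose pattern is observed at least min_n times
  keep <- as.vector(counts[pattern] >= min_n)
  x[keep, , drop = FALSE]
}

# Toy data: 10 complete cases, 8 cases missing b, 2 cases missing c
toy <- data.frame(a = rnorm(20), b = rnorm(20), c = rnorm(20))
toy$b[11:18] <- NA
toy$c[19:20] <- NA

kept <- drop_small_patterns(toy, min_n = 6)
nrow(kept)  # the two cases with the rare pattern have been removed
```

Patterns that are dropped this way no longer form a group in the test, which is why the patterns used by the test can differ from those in the raw data.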
Caspar J. Van Lissa
Three types of missingness have been distinguished in the literature (Rubin, 1976): missing completely at random (MCAR), which means that missingness is unrelated to both the observed and the unobserved data; missing at random (MAR), which means that missingness is contingent on the observed data; and missing not at random (MNAR), which means that missingness is related to unobserved data.
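To make the three mechanisms concrete, the following base R sketch simulates each of them for a single variable. The variable names, the 30% missingness rate, and the logistic missingness models are assumptions chosen purely for illustration.

```r
set.seed(1)
n <- 1000
x <- rnorm(n)      # always observed
y <- x + rnorm(n)  # variable that will receive missing values

# MCAR: every value of y has the same 30% chance of being missing
y_mcar <- ifelse(runif(n) < 0.3, NA, y)

# MAR: the chance that y is missing depends only on the observed x
y_mar <- ifelse(runif(n) < plogis(2 * x), NA, y)

# MNAR: the chance that y is missing depends on the value of y itself
y_mnar <- ifelse(runif(n) < plogis(2 * y), NA, y)

# Mean of x for cases with observed (FALSE) and missing (TRUE) y
mean_x_mcar <- tapply(x, is.na(y_mcar), mean)
mean_x_mar  <- tapply(x, is.na(y_mar), mean)
mean_x_mcar
mean_x_mar
```

Under MCAR, the observed variable x should look similar for cases with and without missing y; under MAR, cases with missing y tend to have larger x in this simulation.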
Jamshidian and Jalal's non-parametric MCAR test assumes that the missing data are either MCAR or MAR, and tests whether the missingness is independent of the observed values. If so, the covariance matrices of the imputed data will be equal across groups with different patterns of missingness. This test consists of the following procedure:
Data are imputed.
The imputed data are split into k groups according to the k missing data patterns in the original data (see md.pattern()).
Perform Hawkins' test for equality of covariances across the k groups.
If the test is not significant, conclude that there is no evidence against multivariate normality of the data, nor against MCAR.
If the test is significant, and multivariate normality of the data can be assumed, then it can be concluded that missingness is MAR.
If multivariate normality cannot be assumed, then perform the Anderson-Darling non-parametric test for equality of covariances across the k groups.
If the Anderson-Darling test is not significant, this is evidence against multivariate normality, but no evidence against MCAR.
If the Anderson-Darling test is significant, it can be concluded that missingness is MAR.
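The decision steps above can be summarized in a small base R sketch. The function interpret_mcar and its arguments are hypothetical and only mirror the logic as described here; they are not part of the package, and the p-values in the usage lines are made up.

```r
# Hypothetical helper mirroring the decision steps described above.
# p_hawkins and p_nonparametric are the p-values of Hawkins' test and
# the Anderson-Darling non-parametric test, respectively.
interpret_mcar <- function(p_hawkins, p_nonparametric, alpha = 0.05,
                           assume_normal = FALSE) {
  if (p_hawkins >= alpha) {
    # Hawkins' test not significant
    return("No evidence against multivariate normality or against MCAR")
  }
  if (assume_normal) {
    # Hawkins' test significant and normality can be assumed
    return("Missingness is MAR")
  }
  if (p_nonparametric >= alpha) {
    # Non-parametric test not significant
    return("Evidence against multivariate normality, but not against MCAR")
  }
  # Both tests significant
  "Missingness is MAR"
}

interpret_mcar(p_hawkins = 0.40, p_nonparametric = 0.70)
interpret_mcar(p_hawkins = 0.01, p_nonparametric = 0.30)
interpret_mcar(p_hawkins = 0.01, p_nonparametric = 0.02)
```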
Note that, despite its name in common parlance, an MCAR test can only indicate whether missingness is MCAR or MAR. The procedure cannot distinguish MCAR from MNAR, so a non-significant result does not rule out MNAR.
This is a re-implementation of the function TestMCARNormality, which was originally published in the R package MissMech, which has been removed from CRAN. This new implementation is faster, as its backend is written in C++. It also enhances the functionality of the original:
Multiply imputed data can now be used; the median p-value and test statistic across replications are then reported, as suggested by Eekhout, Wiel, and Heymans (2017).
The printing method for an mcar_object gives a warning when at least one p-value of either test was significant. In this case, it is recommended to inspect the range of p-values, and consider potential violations of MCAR.
A plotting method for an mcar_object is provided.
A plotting method for the $md.pattern element of an mcar_object is provided.
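As a rough illustration of the median pooling and the warning logic mentioned in the list above, consider the following base R sketch. The p-values and test statistics are made up, and the code does not reproduce the package's internals.

```r
# Made-up p-values and test statistics from five imputed data sets
p_values   <- c(0.21, 0.08, 0.35, 0.04, 0.17)
statistics <- c(5.8, 8.3, 4.4, 9.9, 6.5)
alpha <- 0.05

# Summarize across imputed data sets by taking the median
pooled_p    <- median(p_values)
pooled_stat <- median(statistics)

# The median can hide individual significant results, so also
# inspect the range and flag any p-value below alpha
p_range <- range(p_values)
any_significant <- any(p_values < alpha)

pooled_p
pooled_stat
p_range
any_significant
```

Here the pooled p-value is above alpha although one imputed data set yields a significant result, which is the situation the printed warning is meant to draw attention to.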
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63(3), 581-592. doi:10.2307/2335739
Eekhout, I., Wiel, M. A., & Heymans, M. W. (2017). Methods for significance testing of categorical covariates in logistic regression models after multiple imputation: Power and applicability analysis. BMC Medical Research Methodology, 17(1), 129.
Jamshidian, M., & Jalal, S. (2010). Tests of homoscedasticity, normality, and missing completely at random for incomplete multivariate data. Psychometrika, 75(4), 649–674. doi:10.1007/s11336-010-9175-3
# The nhanes data, mice() and md.pattern() are provided by the mice package
library(mice)
res <- mcar(nhanes)
# Examine test results
res
# Plot p-values across imputed data sets
plot(res)
# Plot md patterns used for the test
plot(res, type = "md.pattern")
# Note difference with the raw md.patterns:
md.pattern(nhanes)