mcar: Jamshidian and Jalal's Non-Parametric MCAR Test

Description

Test whether missingness is contingent upon the observed variables, according to the methodology developed by Jamshidian and Jalal (2010) (see Details).

Usage

mcar(
  x,
  imputed = mice(x, method = "norm"),
  min_n = 6,
  method = "auto",
  replications = 10000,
  use_chisq = 30,
  alpha = 0.05
)

Value

An object of class mcar_object.

Arguments

x: An object for which a method exists; usually a data.frame.
imputed: Either an object of class mids, as returned by mice(), or a list of data.frames.
min_n: Atomic numeric, must be greater than 1. When there are missing data patterns with fewer than min_n cases, all cases with that pattern will be removed from x and imputed.
method: Atomic character. If it is known (or assumed) that data are either multivariate normally distributed or not, then use either method = "hawkins" or method = "nonparametric", respectively. The default argument method = "auto" follows the procedure outlined in the Details section, and in Figure 7 of Jamshidian and Jalal (2010).
replications: Number of replications used to simulate the Neyman distribution when performing Hawkins' test. As this method is based on random sampling, use a high number of replications (and optionally, set.seed()) to minimize Monte Carlo error and ensure reproducibility.
use_chisq: Atomic integer, indicating the minimum number of cases within a group k that triggers the use of asymptotic Chi-square distribution instead of the emprical distribution in the Neyman uniformity test, which is performed as part of Hawkins' test.
alpha: Atomic numeric, indicating the significance level of tests.

Author

Caspar J. Van Lissa

Details

Three types of missingness have been distinguished in the literature (Rubin, 1976): Missing completely at random (MCAR), which means that missingness is random; missing at random (MAR), which means that missingness is contingent on the observed; and missing not at random (MNAR), which means that missingness is related to unobserved data.

Jamshidian and Jalal's non-parametric MCAR test assumes that the missing data are either MCAR or MAR, and tests whether the missingness is independent of the observed values. If so, the covariance matrices of the imputed data will be equal accross groups with different patterns of missingness. This test consists of the following procedure:

Data are imputed.
The imputed data are split into k groups according to the k missing data patterns in the original data (see md.pattern()).
Perform Hawkins' test for equality of covariances across the k groups.
If the test is not significant, conclude that there is no evidence against multivariate normality of the data, nor against MCAR.
If the test is significant, and multivariate normality of the data can be assumed, then it can be concluded that missingness is MAR.
If multivariate normality cannot be assumed, then perform the Anderson-Darling non-parametric test for equality of covariances across the k groups.
If the Anderson-Darling test is not significant, this is evidence against multivariate normality - but no evidence against MCAR.
If the Anderson-Darling test is significant, this is evidence it can be concluded that missingness is MAR.

Note that, despite its name in common parlance, an MCAR test can only indicate whether missingness is MCAR or MAR. The procedure cannot distinguish MCAR from MNAR, so a non-significant result does not rule out MNAR.

This is a re-implementation of the function TestMCARNormality, which was originally published in the R-packgage MissMech, which has been removed from CRAN. This new implementation is faster, as its backend is written in C++. It also enhances the functionality of the original:

Multiply imputed data can now be used; the median p-value and test statistic across replications is then reported, as suggested by Eekhout, Wiel, and Heymans (2017).
The printing method for an mcar_object gives a warning when at least one p-value of either test was significant. In this case, it is recommended to inspect the range of p-values, and consider potential violations of MCAR.
A plotting method for an mcar_object is provided.
A plotting method for the $md.pattern element of an mcar_object is provided.

References

Rubin, D. B. (1976). Inference and Missing Data. Biometrika, Vol. 63, No. 3, pp. 581-592. tools:::Rd_expr_doi("10.2307/2335739")

Eekhout, I., M. A. Wiel, & M. W. Heymans (2017). Methods for Significance Testing of Categorical Covariates in Logistic Regression Models After Multiple Imputation: Power and Applicability Analysis. BMC Medical Research Methodology 17 (1): 129.

Jamshidian, M., & Jalal, S. (2010). Tests of homoscedasticity, normality, and missing completely at random for incomplete multivariate data. Psychometrika, 75(4), 649–674. tools:::Rd_expr_doi("10.1007/s11336-010-9175-3")

Examples

Run this code

res <- mcar(nhanes)
# Examine test results
res
# Plot p-values across imputed data sets
plot(res)
# Plot md patterns used for the test
plot(res, type = "md.pattern")
# Note difference with the raw md.patterns:
md.pattern(nhanes)

Run the code above in your browser using DataLab