mice

Multivariate Imputation by Chained Equations

The mice package implements a method to deal with missing data. The package creates multiple imputations (replacement values) for multivariate missing data. The method is based on Fully Conditional Specification, where each incomplete variable is imputed by a separate model. The MICE algorithm can impute mixes of continuous, binary, unordered categorical and ordered categorical data. In addition, MICE can impute continuous two-level data, and maintain consistency between imputations by means of passive imputation. Many diagnostic plots are implemented to inspect the quality of the imputations.
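As a rough illustration of the one-model-per-variable idea, the sketch below asks mice which imputation method it would choose by default for each column of the bundled nhanes2 data. It assumes the package is already installed (see Installation). make.method() and nhanes2 both ship with mice; the method names in the comments are the package defaults as I understand them.

```r
library(mice, warn.conflicts = FALSE)

# nhanes2 mixes a complete factor (age), two numeric columns (bmi, chl)
# and a two-level factor (hyp)
meth <- make.method(nhanes2)
print(meth)

# complete columns get an empty method, i.e. they are not imputed
meth[["age"]]   # ""
# numeric columns default to predictive mean matching
meth[["bmi"]]   # "pmm"
# two-level factors default to logistic regression
meth[["hyp"]]   # "logreg"
```

The returned vector can be edited and passed to mice() through its method argument to override the default for a single variable.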

Installation

The mice package can be installed from CRAN as follows:

install.packages("mice")

The latest version can be installed from GitHub as follows:

install.packages("devtools")
devtools::install_github(repo = "amices/mice")

Minimal example

library(mice, warn.conflicts = FALSE)

# show the missing data pattern
md.pattern(nhanes)

#>    age hyp bmi chl   
#> 13   1   1   1   1  0
#> 3    1   1   1   0  1
#> 1    1   1   0   1  1
#> 1    1   0   0   1  2
#> 7    1   0   0   0  3
#>      0   8   9  10 27

The table and the graph summarize where the missing data occur in the nhanes dataset.
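The margins of the pattern table can be cross-checked with a few lines of base R plus the ncc() and nic() helpers from mice. This is only a sketch for reading the table; the numbers in the comments are taken from the output shown above.

```r
library(mice, warn.conflicts = FALSE)

# number of missing cells per variable (bottom row of the pattern table)
n_missing <- colSums(is.na(nhanes))
print(n_missing)

# total number of missing cells (bottom-right corner of the table): 27
sum(n_missing)

# rows without any missing value (first row of the table): 13
ncc(nhanes)

# rows with at least one missing value (3 + 1 + 1 + 7): 12
nic(nhanes)
```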

# multiply impute the missing values
imp <- mice(nhanes, maxit = 2, m = 2, seed = 1)
#> 
#>  iter imp variable
#>   1   1  bmi  hyp  chl
#>   1   2  bmi  hyp  chl
#>   2   1  bmi  hyp  chl
#>   2   2  bmi  hyp  chl

# inspect quality of imputations
stripplot(imp, chl, pch = 19, xlab = "Imputation number")

In general, we would like the imputations to be plausible, i.e., values that could have been observed if they had not been missing.
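Besides the plot, plausibility can be checked numerically. The sketch below repeats the imputation call from the example (with printFlag = FALSE to silence the iteration log) and compares imputed chl values with the observed ones. It assumes the default method for numeric columns, predictive mean matching, which copies imputed values from observed donors.

```r
library(mice, warn.conflicts = FALSE)

imp <- mice(nhanes, maxit = 2, m = 2, seed = 1, printFlag = FALSE)

# imputed chl values: one row per missing case, one column per imputation
imputed_chl <- unlist(imp$imp$chl)

# compare the range of observed and imputed values
range(nhanes$chl, na.rm = TRUE)
range(imputed_chl)

# under predictive mean matching every imputed value is copied from an
# observed donor, so this should be TRUE
all(imputed_chl %in% nhanes$chl)

# first completed dataset: observed values plus the first set of imputations
completed <- complete(imp, 1)
anyNA(completed)   # FALSE when all incomplete variables were imputed
```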

# fit complete-data model
fit <- with(imp, lm(chl ~ age + bmi))

# pool and summarize the results
summary(pool(fit))
#>          term estimate std.error statistic    df p.value
#> 1 (Intercept)     9.08     73.09     0.124  4.50  0.9065
#> 2         age    35.23     17.46     2.017  1.36  0.2377
#> 3         bmi     4.69      1.94     2.417 15.25  0.0286

The complete-data model is fit to each imputed dataset, and the results are combined to arrive at estimates that properly account for the missing data.
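To make the combining step concrete, the sketch below reproduces the pooled estimates and standard errors by hand using Rubin's rules: the pooled estimate is the mean of the per-imputation estimates, and the total variance is the within-imputation variance plus (1 + 1/m) times the between-imputation variance. The variable names (est, u, qbar, ubar, b, total) are chosen for this illustration and are not part of the package.

```r
library(mice, warn.conflicts = FALSE)

m <- 2
imp <- mice(nhanes, maxit = 2, m = m, seed = 1, printFlag = FALSE)
fit <- with(imp, lm(chl ~ age + bmi))

# per-imputation coefficients and their sampling variances (one column each)
est <- sapply(fit$analyses, coef)
u   <- sapply(fit$analyses, function(f) diag(vcov(f)))

qbar  <- rowMeans(est)            # pooled estimate
ubar  <- rowMeans(u)              # within-imputation variance
b     <- apply(est, 1, var)       # between-imputation variance
total <- ubar + (1 + 1 / m) * b   # total variance

# compare with the values reported by pool()
pooled <- summary(pool(fit))
all.equal(unname(qbar), pooled$estimate)
all.equal(unname(sqrt(total)), pooled$std.error)
```

In practice pool() is the function to use, since it also computes the degrees of freedom and p-values shown in the summary.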

mice 3.0

Version 3.0 represents a major update that implements the following features:

  1. blocks: The main algorithm iterates over blocks. A block is simply a collection of variables. In the common MICE algorithm each block is equivalent to one variable, which remains the default. The blocks argument allows mixing univariate imputation methods with multivariate imputation methods. The blocks feature bridges two seemingly disparate approaches, joint modeling and fully conditional specification, into one framework;

  2. where: The where argument is a logical matrix of the same size as data that specifies which cells should be imputed. This opens up some new analytic possibilities;

  3. Multivariate tests: There are new functions D1(), D2(), D3() and anova() that perform multivariate parameter tests on the repeated analyses of multiply-imputed data;

  4. formulas: The old form argument has been redesigned and is now renamed to formulas. This provides an alternative way to specify imputation models that exploits the full power of R’s native formulas;

  5. Better integration with the tidyverse framework, especially for packages dplyr, tibble and broom;

  6. Improved numerical algorithms for low-level imputation functions. Better handling of duplicate variables;

  7. Last but not least: A brand new edition AND online version of Flexible Imputation of Missing Data. Second Edition.
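A few of the features listed above can be tried on the nhanes data. The following is a minimal sketch, not a recommended analysis: the imputation formulas and the two nested models are chosen only for illustration, and m = 5 and maxit = 2 are kept small so the code runs quickly.

```r
library(mice, warn.conflicts = FALSE)

# formulas: one imputation model per incomplete variable, written as R formulas
form <- name.formulas(list(
  bmi ~ age + hyp + chl,
  hyp ~ age + bmi + chl,
  chl ~ age + bmi + hyp
))
imp <- mice(nhanes, formulas = form, m = 5, maxit = 2, seed = 1,
            printFlag = FALSE)

# where: a logical matrix with the same dimensions as the data;
# the "missing" keyword marks exactly the missing cells
where <- make.where(nhanes, keyword = "missing")
identical(dim(where), dim(nhanes))
all(where == is.na(nhanes))

# multivariate test: does adding bmi improve on a model with age only?
fit1 <- with(imp, lm(chl ~ age + bmi))
fit0 <- with(imp, lm(chl ~ age))
D1(fit1, fit0)
```

A modified where matrix can be passed to mice() through its where argument. D2() and D3() take the same pair of fitted objects as D1().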

See MICE: Multivariate Imputation by Chained Equations for more resources.

I’ll be happy to take feedback and discuss suggestions. Please submit these through GitHub’s issues facility.

Resources

Books

  1. Van Buuren, S. (2018). Flexible Imputation of Missing Data. Second Edition. Chapman & Hall/CRC. Boca Raton, FL.

Course materials

  1. Handling Missing Data in R with mice
  2. Statistical Methods for combined data sets

Vignettes

  1. Ad hoc methods and the MICE algorithm
  2. Convergence and pooling
  3. Inspecting how the observed data and missingness are related
  4. Passive imputation and post-processing
  5. Imputing multilevel data
  6. Sensitivity analysis with mice
  7. Generate missing values with ampute
  8. futuremice: Wrapper for parallel MICE imputation through futures

Code from publications

  1. Flexible Imputation of Missing Data. Second edition.

Acknowledgement

The cute mice sticker was designed by Jaden M. Walters. Thanks Jaden!

Code of Conduct

Please note that the mice project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Version: 3.16.0
License: GPL (>= 2)
Last Published: June 5th, 2023
Monthly Downloads: 74,074

Functions in mice (3.16.0)

D1: Compare two nested models using D1-statistic
D3: Compare two nested models using D3-statistic
ampute.default.type: Default type in ampute()
ampute: Generate missing data for simulation purposes
ampute.continuous: Multivariate amputation based on continuous probability functions
D2: Compare two nested models using D2-statistic
ampute.default.odds: Default odds in ampute()
ampute.default.patterns: Default patterns in ampute
ampute.default.weights: Default weights in ampute
ampute.default.freq: Default freq in ampute
brandsma: Brandsma school data used by Snijders and Bosker (2012)
anova.mira: Compare several nested models
appendbreak: Appends specified break to the data
ampute.discrete: Multivariate amputation based on discrete probability functions
ampute.mcar: Multivariate amputation under an MCAR mechanism
bwplot.mads: Box-and-whisker plot of amputed and non-amputed data
convergence: Computes convergence diagnostics for a mids object
densityplot.mids: Density plot of observed and imputed data
complete.mids: Extracts the completed data from a mids object
as.mira: Create a mira object from repeated analyses
construct.blocks: Construct blocks from formulas and predictorMatrix
as.mids: Converts an imputed dataset (long format) into a mids object
fdgs: Fifth Dutch growth study 2009
fico: Fraction of incomplete cases among cases with observed
flux: Influx and outflux of multivariate missing data patterns
estimice: Computes least squares parameters
employee: Employee selection data
fluxplot: Fluxplot of the missing data pattern
futuremice: Wrapper function that runs MICE in parallel
is.mira: Check for mira object
getfit: Extract list of fitted models
filter.mids: Subset rows of a mids object
fix.coef: Fix coefficients and update model
is.mitml.result: Check for mitml.result object
ifdo: Conditional imputation helper
is.mads: Check for mads object
make.blocks: Creates a blocks argument
mads-class: Multivariate amputed data set (mads)
md.pairs: Missing data pattern by variable pairs
extend.formulas: Extends formulas with predictor matrix settings
extend.formula: Extends a formula with predictors
make.where: Creates a where argument
mammalsleep: Mammal sleep data
ic: Select incomplete cases
as.mitml.result: Converts into a mitml.result object
mice.impute.2lonly.norm: Imputation at level 2 by Bayesian linear regression
mice.impute.2lonly.mean: Imputation of most likely value within the class
bwplot.mids: Box-and-whisker plot of observed and imputed data
getqbar: Extract estimate from mipo object
make.blots: Creates a blots argument
make.predictorMatrix: Creates a predictorMatrix argument
make.formulas: Creates a formulas argument
boys: Growth of Dutch boys
glance.mipo: Glance method to extract information from a `mipo` object
cbind: Combine R objects by rows and columns
md.pattern: Missing data pattern
ici: Incomplete case indicator
matchindex: Find index of matched donor units
lm.mids: Linear regression for mids object
mdc: Graphical parameter for missing data plots
mice.impute.pmm: Imputation by predictive mean matching
mice.impute.polr: Imputation of ordered data by polytomous regression
mice.impute.logreg: Imputation by logistic regression
mice.impute.logreg.boot: Imputation by logistic regression using the bootstrap
leiden85: Leiden 85+ study
mice: Multivariate Imputation by Chained Equations
cc: Select complete cases
make.visitSequence: Creates a visitSequence argument
cci: Complete case indicator
extractBS: Extract broken stick estimates from a lmer object
mice.impute.jomoImpute: Multivariate multilevel imputation using jomo
mice.impute.2l.bin: Imputation by a two-level logistic model using glmer
mice.impute.2l.lmer: Imputation by a two-level normal model using lmer
mice.impute.lasso.logreg: Imputation by direct use of lasso logistic regression
mice.impute.ri: Imputation by the random indicator method for nonignorable data
mice.impute.rf: Imputation by random forests
nelsonaalen: Cumulative hazard rate or Nelson-Aalen estimator
nhanes: NHANES example - all variables numerical
.pmm.match: Finds an imputed value from matches in the predictive metric (deprecated)
plot.mids: Plot the trace lines of the MICE algorithm
mice.impute.polyreg: Imputation of unordered data by polytomous regression
mice.impute.2l.pan: Imputation by a two-level normal model using pan
mice.impute.quadratic: Imputation of quadratic terms
mice.impute.mean: Imputation by the mean
mice.impute.midastouch: Imputation by predictive mean matching with distance aided donor selection
mice.impute.2l.norm: Imputation by a two-level normal model
mnar_demo_data: MNAR demo data
mice.impute.lasso.norm: Imputation by direct use of lasso linear regression
mice.impute.panImpute: Impute multilevel missing data using pan
mice.impute.lasso.select.logreg: Imputation by indirect use of lasso logistic regression
mcar: Jamshidian and Jalal's Non-Parametric MCAR Test
name.blocks: Name imputation blocks
pops: Project on preterm and small for gestational age infants (POPS)
pool.scalar: Multiple imputation pooling: univariate version
pool.r.squared: Pools R^2 of m models fitted to multiply-imputed data
windspeed: Subset of Irish wind speed data
popmis: Hox pupil popularity data with missing popularity scores
summary.mira: Summary of a mira object
print.mids: Print a mids object
potthoffroy: Potthoff-Roy data
supports.transparent: Supports semi-transparent foreground colors?
mice.impute.norm.nob: Imputation by linear regression without parameter uncertainty
with.mids: Evaluate an expression in multiple imputed datasets
pool: Combine estimates by pooling rules
mice.impute.norm.predict: Imputation by linear regression through prediction
quickpred: Quick selection of predictors from the data
print.mads: Print a mads object
xyplot.mads: Scatterplot of amputed and non-amputed data against weighted sum scores
xyplot.mids: Scatterplot of observed and imputed data
pool.compare: Compare two nested models fitted to imputed data
mice.theme: Set the theme for the plotting Trellis functions
mice.impute.norm.boot: Imputation by linear regression, bootstrap method
mids2mplus: Export mids object to Mplus
mice.impute.norm: Imputation by Bayesian linear regression
mice.impute.passive: Passive imputation
name.formulas: Name formula list elements
ncc: Number of complete cases
mids-class: Multiply imputed data set (mids)
parlmice: Wrapper function that runs MICE in parallel
ibind: Enlarge number of imputations by combining mids objects
glm.mids: Generalized linear model for mids object
fdd: SE Fireworks disaster data
tidy.mipo: Tidy method to extract results from a `mipo` object
tbc: Terneuzen birth cohort
pattern: Datasets with various missing data patterns
is.mipo: Check for mipo object
make.post: Creates a post argument
make.method: Creates a method argument
is.mids: Check for mids object
mice.impute.2lonly.pmm: Imputation at level 2 by predictive mean matching
mids2spss: Export mids object to SPSS
mipo: Multiple imputation pooled object
mice.impute.lda: Imputation by linear discriminant analysis
mice.impute.cart: Imputation by classification and regression trees
mice.impute.mpmm: Imputation by multivariate predictive mean matching
mice.mids: Multivariate Imputation by Chained Equations (Iteration Step)
mice.impute.sample: Imputation by simple random sampling
nhanes2: NHANES example - mixed numerical and discrete variables
mice.impute.mnar.logreg: Imputation under MNAR mechanism by NARFCS
mice.impute.lasso.select.norm: Imputation by indirect use of lasso linear regression
mira-class: Multiply imputed repeated analyses (mira)
norm.draw: Draws values of beta and sigma by Bayesian linear regression
nimp: Number of imputations per block
nic: Number of incomplete cases
stripplot.mids: Stripplot of observed and imputed data
squeeze: Squeeze the imputed values to be within specified boundaries
version: Echoes the package version number
walking: Walking disability data
toenail2: Toenail data
reexports: Objects exported from other packages
selfreport: Self-reported and measured BMI
toenail: Toenail data