mice

Multivariate Imputation by Chained Equations

The mice package implements a method to deal with missing data. The package creates multiple imputations (replacement values) for multivariate missing data. The method is based on Fully Conditional Specification, where each incomplete variable is imputed by a separate model. The MICE algorithm can impute mixes of continuous, binary, unordered categorical and ordered categorical data. In addition, MICE can impute continuous two-level data, and maintain consistency between imputations by means of passive imputation. Many diagnostic plots are implemented to inspect the quality of the imputations.

Installation

The mice package can be installed from CRAN as follows:

install.packages("mice")

The latest version can be installed from GitHub as follows:

install.packages("devtools")
devtools::install_github(repo = "amices/mice")

Minimal example

library(mice, warn.conflicts = FALSE)

# show the missing data pattern
md.pattern(nhanes)
#>    age hyp bmi chl   
#> 13   1   1   1   1  0
#> 3    1   1   1   0  1
#> 1    1   1   0   1  1
#> 1    1   0   0   1  2
#> 7    1   0   0   0  3
#>      0   8   9  10 27

The table and the graph summarize where the missing data occur in the nhanes dataset.
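The counts in the pattern table can be cross-checked with a few one-liners. This is a minimal sketch, assuming the mice package is installed; the object names n_missing, n_complete and n_incomplete are illustrative only.

```r
library(mice, warn.conflicts = FALSE)

# Missing values per variable; these match the bottom row of md.pattern()
n_missing <- colSums(is.na(nhanes))
n_missing

# Rows without any missing value; this matches the count of the first pattern
n_complete <- ncc(nhanes)

# Rows with at least one missing value
n_incomplete <- nic(nhanes)

c(complete = n_complete, incomplete = n_incomplete, total = nrow(nhanes))
```

Here ncc() and nic() are the mice helpers for the number of complete and incomplete cases; base R's complete.cases() would give the same counts.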

# multiply impute the missing values
imp <- mice(nhanes, maxit = 2, m = 2, seed = 1)
#> 
#>  iter imp variable
#>   1   1  bmi  hyp  chl
#>   1   2  bmi  hyp  chl
#>   2   1  bmi  hyp  chl
#>   2   2  bmi  hyp  chl
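The call above relies on the default imputation method for each incomplete variable. As a minimal sketch of how those defaults can be looked up before imputing, make.method() returns the method vector that mice() would start from. The object names meth_numeric and meth_mixed are illustrative only.

```r
library(mice, warn.conflicts = FALSE)

# nhanes: all variables numeric, so incomplete variables default to "pmm".
# Complete variables (here: age) get the empty method "" and are skipped.
meth_numeric <- make.method(nhanes)
meth_numeric

# nhanes2: age and hyp are factors, so the defaults depend on variable type
# (pmm for numeric data, logreg for a binary factor).
meth_mixed <- make.method(nhanes2)
meth_mixed

# A method vector can be edited and passed on through the method argument,
# e.g. Bayesian linear regression instead of pmm for bmi.
meth_custom <- meth_numeric
meth_custom["bmi"] <- "norm"
imp_custom <- mice(nhanes, method = meth_custom, m = 2, maxit = 2,
                   seed = 1, printFlag = FALSE)
imp_custom$method
```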

# inspect quality of imputations
stripplot(imp, chl, pch = 19, xlab = "Imputation number")

In general, we would like the imputations to be plausible, i.e., values that could have been observed if they had not been missing.
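Besides the strip plot, plausibility can be checked numerically. The sketch below assumes the default method (predictive mean matching) was used for chl, in which case every imputed value is copied from an observed donor. The names imputed_chl, observed_chl and completed1 are illustrative only.

```r
library(mice, warn.conflicts = FALSE)

imp <- mice(nhanes, maxit = 2, m = 2, seed = 1, printFlag = FALSE)

# Imputed values for chl: one row per missing cell, one column per imputation
imputed_chl <- imp$imp$chl
observed_chl <- nhanes$chl[!is.na(nhanes$chl)]

# Compare the range of imputed values with the range of observed values
range(observed_chl)
range(unlist(imputed_chl))

# Under pmm, imputed values should all be values that were actually observed
all(unlist(imputed_chl) %in% observed_chl)

# The first completed dataset should contain no missing values
completed1 <- complete(imp, 1)
anyNA(completed1)
```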

# fit complete-data model
fit <- with(imp, lm(chl ~ age + bmi))

# pool and summarize the results
summary(pool(fit))
#>          term estimate std.error statistic    df p.value
#> 1 (Intercept)     9.08     73.09     0.124  4.50  0.9065
#> 2         age    35.23     17.46     2.017  1.36  0.2377
#> 3         bmi     4.69      1.94     2.417 15.25  0.0286

The complete-data model is fit to each imputed dataset, and the results are combined to arrive at estimates that properly account for the missing data.
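To make the combination step concrete, the sketch below recomputes the pooled estimates and their total variance by hand using Rubin's rules (pooled estimate = mean of the per-imputation estimates; total variance = within-imputation variance + (1 + 1/m) times the between-imputation variance) and compares them with pool(). It is an illustration under those assumptions, not a replacement for pool(), which also computes degrees of freedom and related diagnostics. The names est, u, qbar, ubar, b and total_var are illustrative only.

```r
library(mice, warn.conflicts = FALSE)

imp <- mice(nhanes, maxit = 2, m = 2, seed = 1, printFlag = FALSE)
fit <- with(imp, lm(chl ~ age + bmi))
m <- length(fit$analyses)

# Per-imputation coefficient estimates and squared standard errors
est <- sapply(fit$analyses, coef)                        # terms x m
u <- sapply(fit$analyses, function(f) diag(vcov(f)))     # terms x m

qbar <- rowMeans(est)               # pooled estimate
ubar <- rowMeans(u)                 # within-imputation variance
b <- apply(est, 1, var)             # between-imputation variance
total_var <- ubar + (1 + 1 / m) * b # total variance

# Compare with the pooled object, matching rows by term name
pooled <- pool(fit)$pooled
idx <- match(names(qbar), pooled$term)
all.equal(unname(qbar), pooled$estimate[idx])
all.equal(unname(total_var), pooled$t[idx])
```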

mice 3.0

Version 3.0 represents a major update that implements the following features:

  1. blocks: The main algorithm iterates over blocks. A block is simply a collection of variables. In the common MICE algorithm each block was equivalent to one variable, which, of course, remains the default. The blocks argument allows mixing univariate imputation methods with multivariate imputation methods. The blocks feature bridges two seemingly disparate approaches, joint modeling and fully conditional specification, into one framework;

  2. where: The where argument is a logical matrix of the same size as data that specifies which cells should be imputed. This opens up some new analytic possibilities;

  3. Multivariate tests: There are new functions D1(), D2(), D3() and anova() that perform multivariate parameter tests on the repeated analyses from multiply-imputed data;

  4. formulas: The old form argument has been redesigned and is now renamed to formulas. This provides an alternative way to specify imputation models that exploits the full power of R’s native formulas;

  5. Better integration with the tidyverse framework, especially for packages dplyr, tibble and broom;

  6. Improved numerical algorithms for low-level imputation functions. Better handling of duplicate variables;

  7. Last but not least: A brand new edition AND online version of Flexible Imputation of Missing Data. Second Edition.
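Two of the features listed above, the where matrix and the multivariate tests, can be sketched briefly. The example assumes the nhanes data shipped with the package; the object names where_default, where_all, fit1 and fit0 are illustrative only, and the nested models are chosen purely for demonstration.

```r
library(mice, warn.conflicts = FALSE)

# where: TRUE marks a cell to be imputed. The "missing" keyword reproduces
# the default behaviour of imputing only the missing cells.
where_default <- make.where(nhanes, "missing")
all(where_default == is.na(nhanes))

# The "all" keyword marks every cell, which can be used to create
# fully synthetic data.
where_all <- make.where(nhanes, "all")
dim(where_all)

# Multivariate test: compare two nested models fitted to each imputed
# dataset using the D1 statistic.
imp <- mice(nhanes, m = 5, seed = 1, printFlag = FALSE)
fit1 <- with(imp, lm(chl ~ age + bmi))  # larger model
fit0 <- with(imp, lm(chl ~ age))        # nested, smaller model
D1(fit1, fit0)
```

A custom where matrix is passed to mice() through its where argument; D2() and D3() take the same pair of fitted objects as D1().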

See MICE: Multivariate Imputation by Chained Equations for more resources.

I’ll be happy to take feedback and discuss suggestions. Please submit these through GitHub’s issues facility.

Resources

Books

  1. Van Buuren, S. (2018). Flexible Imputation of Missing Data. Second Edition. Chapman & Hall/CRC. Boca Raton, FL.

Course materials

  1. Handling Missing Data in R with mice
  2. Statistical Methods for combined data sets

Vignettes

  1. Ad hoc methods and the MICE algorithm
  2. Convergence and pooling
  3. Inspecting how the observed data and missingness are related
  4. Passive imputation and post-processing
  5. Imputing multilevel data
  6. Sensitivity analysis with mice
  7. Generate missing values with ampute
  8. futuremice: Wrapper for parallel MICE imputation through futures

Code from publications

  1. Flexible Imputation of Missing Data. Second edition.

Acknowledgement

The cute mice sticker was designed by Jaden M. Walters. Thanks Jaden!

Code of Conduct

Please note that the mice project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Monthly Downloads

58,762

Version

3.17.0

License

GPL (>= 2)

Maintainer

Stef van Buuren

Last Published

November 27th, 2024

Functions in mice (3.17.0)

convergence: Computes convergence diagnostics for a mids object
bwplot.mids: Box-and-whisker plot of observed and imputed data
cbind: Combine R objects by rows and columns
cci: Complete case indicator
cc: Select complete cases
fico: Fraction of incomplete cases among cases with observed
boys: Growth of Dutch boys
complete.mids: Extracts the completed data from a mids object
construct.blocks: Construct blocks from formulas and predictorMatrix
extractBS: Extract broken stick estimates from a lmer object
fix.coef: Fix coefficients and update model
fdd: SE Fireworks disaster data
filter.mids: Subset rows of a mids object
densityplot.mids: Density plot of observed and imputed data
flux: Influx and outflux of multivariate missing data patterns
fluxplot: Fluxplot of the missing data pattern
fdgs: Fifth Dutch growth study 2009
extend.formulas: Extends formulas with predictor matrix settings
extend.formula: Extends a formula with predictors
getqbar: Extract estimate from mipo object
ifdo: Conditional imputation helper
glm.mids: Generalized linear model for mids object
ibind: Enlarge number of imputations by combining mids objects
futuremice: Wrapper function that runs MICE in parallel
mads: Multivariate amputed data set (mads)
getfit: Extract list of fitted models
make.formulas: Creates a formulas argument
make.blots: Creates a blots argument
is.mids: Check for mids object
is.mads: Check for mads object
is.mipo: Check for mipo object
make.blocks: Creates a blocks argument
is.mira: Check for mira object
is.mitml.result: Check for mitml.result object
mammalsleep: Mammal sleep data
make.predictorMatrix: Creates a predictorMatrix argument
make.visitSequence: Creates a visitSequence argument
make.where: Creates a where argument
leiden85: Leiden 85+ study
glance.mipo: Glance method to extract information from a `mipo` object
lm.mids: Linear regression for mids object
make.method: Creates a method argument
estimice: Computes least squares parameters
ici: Incomplete case indicator
ic: Select incomplete cases
employee: Employee selection data
make.post: Creates a post argument
md.pattern: Missing data pattern
matchindex: Find index of matched donor units
mcar: Jamshidian and Jalal's Non-Parametric MCAR Test
md.pairs: Missing data pattern by variable pairs
mice.impute.2l.pan: Imputation by a two-level normal model using pan
mice.impute.2lonly.norm: Imputation at level 2 by Bayesian linear regression
mice.impute.2lonly.mean: Imputation of most likely value within the class
mice.impute.2l.norm: Imputation by a two-level normal model
mice.impute.2lonly.pmm: Imputation at level 2 by predictive mean matching
mice.impute.cart: Imputation by classification and regression trees
mice.impute.2l.bin: Imputation by a two-level logistic model using glmer
mice.impute.2l.lmer: Imputation by a two-level normal model using lmer
mdc: Graphical parameter for missing data plots
mice: Multivariate Imputation by Chained Equations
mice.impute.mean: Imputation by the mean
mice.impute.jomoImpute: Multivariate multilevel imputation using jomo
mice.impute.lasso.logreg: Imputation by direct use of lasso logistic regression
mice.impute.logreg.boot: Imputation by logistic regression using the bootstrap
mice.impute.logreg: Imputation by logistic regression
mice.impute.lasso.norm: Imputation by direct use of lasso linear regression
mice.impute.lasso.select.logreg: Imputation by indirect use of lasso logistic regression
mice.impute.lasso.select.norm: Imputation by indirect use of lasso linear regression
mice.impute.midastouch: Imputation by predictive mean matching with distance aided donor selection
mice.impute.lda: Imputation by linear discriminant analysis
mice.impute.norm: Imputation by Bayesian linear regression
mice.impute.passive: Passive imputation
mice.impute.mpmm: Imputation by multivariate predictive mean matching
mice.impute.pmm: Imputation by predictive mean matching
mice.impute.norm.predict: Imputation by linear regression through prediction
mice.impute.panImpute: Impute multilevel missing data using pan
mice.impute.norm.nob: Imputation by linear regression without parameter uncertainty
mice.impute.norm.boot: Imputation by linear regression, bootstrap method
mice.impute.mnar.logreg: Imputation under MNAR mechanism by NARFCS
mice.impute.polr: Imputation of ordered data by polytomous regression
mice.theme: Set the theme for the plotting Trellis functions
mice.impute.polyreg: Imputation of unordered data by polytomous regression
mids2mplus: Export mids object to Mplus
mids: Multiply imputed data set (mids)
mids2spss: Export mids object to SPSS
mice.impute.ri: Imputation by the random indicator method for nonignorable data
mice.impute.rf: Imputation by random forests
name.formulas: Name formula list elements
ncc: Number of complete cases
mice.impute.sample: Imputation by simple random sampling
mice.mids: Multivariate Imputation by Chained Equations (Iteration Step)
nelsonaalen: Cumulative hazard rate or Nelson-Aalen estimator
name.blocks: Name imputation blocks
mice.impute.quadratic: Imputation of quadratic terms
nhanes: NHANES example - all variables numerical
mnar_demo_data: MNAR demo data
norm.draw: Draws values of beta and sigma by Bayesian linear regression
nimp: Number of imputations per block
.pmm.match: Finds an imputed value from matches in the predictive metric (deprecated)
pool: Combine estimates by pooling rules
pool.scalar: Multiple imputation pooling: univariate version
pool.table: Combines estimates from a tidy table
mira: Create an object of class "mira"
nic: Number of incomplete cases
mipo: Multiple imputation pooled object
nhanes2: NHANES example - mixed numerical and discrete variables
popmis: Hox pupil popularity data with missing popularity scores
reexports: Objects exported from other packages
pops: Project on preterm and small for gestational age infants (POPS)
pool.r.squared: Pools R^2 of m models fitted to multiply-imputed data
quickpred: Quick selection of predictors from the data
selfreport: Self-reported and measured BMI
pool.compare: Compare two nested models fitted to imputed data
squeeze: Squeeze the imputed values to be within specified boundaries
stripplot.mids: Stripplot of observed and imputed data
summary.mira: Summary of a mira object
print.mira: Print a mira object
tidy.mipo: Tidy method to extract results from a `mipo` object
tbc: Terneuzen birth cohort
supports.transparent: Supports semi-transparent foreground colors?
toenail2: Toenail data
potthoffroy: Potthoff-Roy data
with.mids: Evaluate an expression in multiple imputed datasets
toenail: Toenail data
version: Echoes the package version number
xyplot.mads: Scatterplot of amputed and non-amputed data against weighted sum scores
xyplot.mids: Scatterplot of observed and imputed data
pattern: Datasets with various missing data patterns
walking: Walking disability data
parlmice: Wrapper function that runs MICE in parallel
windspeed: Subset of Irish wind speed data
ampute.default.type: Default type in ampute()
ampute: Generate missing data for simulation purposes
D2: Compare two nested models using D2-statistic
D3: Compare two nested models using D3-statistic
D1: Compare two nested models using D1-statistic
anova.mira: Compare several nested models
as.mira: Create a mira object from repeated analyses
appendbreak: Appends specified break to the data
as.mitml.result: Converts into a mitml.result object
bwplot.mads: Box-and-whisker plot of amputed and non-amputed data
ampute.continuous: Multivariate amputation based on continuous probability functions
ampute.mcar: Multivariate amputation under a MCAR mechanism
ampute.discrete: Multivariate amputation based on discrete probability functions
ampute.default.freq: Default freq in ampute()
ampute.default.weights: Default weights in ampute()
as.mids: Converts an imputed dataset (long format) into a mids object
ampute.default.odds: Default odds in ampute()
brandsma: Brandsma school data used by Snijders and Bosker (2012)
ampute.default.patterns: Default patterns in ampute()