mice

Multivariate Imputation by Chained Equations

The mice package implements a method to deal with missing data. The package creates multiple imputations (replacement values) for multivariate missing data. The method is based on Fully Conditional Specification, where each incomplete variable is imputed by a separate model. The MICE algorithm can impute mixes of continuous, binary, unordered categorical and ordered categorical data. In addition, MICE can impute continuous two-level data, and maintain consistency between imputations by means of passive imputation. Many diagnostic plots are implemented to inspect the quality of the imputations.

Installation

The mice package can be installed from CRAN as follows:

install.packages("mice")

The latest version can be installed from GitHub as follows:

install.packages("devtools")
devtools::install_github(repo = "amices/mice")

Minimal example

library(mice, warn.conflicts = FALSE)

# show the missing data pattern
md.pattern(nhanes)

#>    age hyp bmi chl   
#> 13   1   1   1   1  0
#> 3    1   1   1   0  1
#> 1    1   1   0   1  1
#> 1    1   0   0   1  2
#> 7    1   0   0   0  3
#>      0   8   9  10 27

The table and the graph summarize where the missing data occur in the nhanes dataset.
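The margins of the pattern table can be cross-checked with base R. The following is a minimal sketch that assumes only the nhanes data shipped with mice; the names n_missing and n_complete are illustrative and not part of the package.

```r
library(mice, warn.conflicts = FALSE)

# number of missing values per variable (the bottom row of md.pattern)
n_missing <- colSums(is.na(nhanes))
n_missing

# number of fully observed rows (the count in front of the first pattern)
n_complete <- sum(complete.cases(nhanes))
n_complete
```

The per-variable counts should agree with the last row of the table, and their sum with the total in its bottom-right corner.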

# multiply impute the missing values
imp <- mice(nhanes, maxit = 2, m = 2, seed = 1)
#> 
#>  iter imp variable
#>   1   1  bmi  hyp  chl
#>   1   2  bmi  hyp  chl
#>   2   1  bmi  hyp  chl
#>   2   2  bmi  hyp  chl

# inspect quality of imputations
stripplot(imp, chl, pch = 19, xlab = "Imputation number")

In general, we would like the imputations to be plausible, i.e., values that could have been observed if they had not been missing.
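One rough numerical check of plausibility, complementary to the strip plot, is to compare imputed values with observed ones. The sketch below assumes the default imputation method is in effect, which for numeric columns is predictive mean matching (pmm); under pmm every imputed value is borrowed from an observed donor. The names imputed_chl and observed_chl are illustrative.

```r
library(mice, warn.conflicts = FALSE)

imp <- mice(nhanes, maxit = 2, m = 2, seed = 1, printFlag = FALSE)

# imputed values of chl: one row per missing case, one column per imputation
imputed_chl <- unlist(imp$imp$chl)

# observed values of chl
observed_chl <- nhanes$chl[!is.na(nhanes$chl)]

# under pmm, each imputed value should also occur among the observed values
all(imputed_chl %in% observed_chl)

# the imputed range should therefore lie within the observed range
range(imputed_chl)
range(observed_chl)
```

With other imputation methods (for example norm) imputed values need not coincide with observed ones, so this check is specific to pmm.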

# fit complete-data model
fit <- with(imp, lm(chl ~ age + bmi))

# pool and summarize the results
summary(pool(fit))
#>          term estimate std.error statistic    df p.value
#> 1 (Intercept)     9.08     73.09     0.124  4.50  0.9065
#> 2         age    35.23     17.46     2.017  1.36  0.2377
#> 3         bmi     4.69      1.94     2.417 15.25  0.0286

The complete-data model is fit to each imputed dataset, and the results are combined to arrive at estimates that properly account for the missing data.
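To make the combining step concrete, the sketch below recomputes the pooled estimates and standard errors by hand using Rubin's rules and compares them with the output of pool(). It is an illustration under the same settings as the example above, not a replacement for pool(); the variable names (est, u, qbar, ubar, b, total_var) are illustrative.

```r
library(mice, warn.conflicts = FALSE)

imp <- mice(nhanes, maxit = 2, m = 2, seed = 1, printFlag = FALSE)
fit <- with(imp, lm(chl ~ age + bmi))
pooled <- pool(fit)

m <- length(fit$analyses)

# per-imputation coefficients and squared standard errors (terms x m matrices)
est <- sapply(fit$analyses, coef)
u <- sapply(fit$analyses, function(f) diag(vcov(f)))

qbar <- rowMeans(est)                 # pooled estimate
ubar <- rowMeans(u)                   # within-imputation variance
b <- apply(est, 1, var)               # between-imputation variance
total_var <- ubar + (1 + 1 / m) * b   # total variance

# both comparisons should return TRUE
all.equal(unname(qbar), pooled$pooled$estimate)
all.equal(unname(sqrt(total_var)), summary(pooled)$std.error)
```

The extra term (1 + 1/m) * b is what makes the pooled standard errors larger than those of any single completed-data analysis: it carries the uncertainty due to the missing data.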

mice 3.0

Version 3.0 represents a major update that implements the following features:

  1. blocks: The main algorithm iterates over blocks. A block is simply a collection of variables. In the common MICE algorithm each block is equivalent to one variable, which remains the default. The blocks argument allows mixing univariate imputation methods with multivariate imputation methods. The blocks feature bridges two seemingly disparate approaches, joint modeling and fully conditional specification, into one framework;

  2. where: The where argument is a logical matrix of the same size as data that specifies which cells should be imputed. This opens up some new analytic possibilities;

  3. Multivariate tests: There are new functions D1(), D2(), D3() and anova() that perform multivariate parameter tests on the repeated analyses of multiply-imputed data;

  4. formulas: The old form argument has been redesigned and is now renamed to formulas. This provides an alternative way to specify imputation models that exploits the full power of R’s native formulas;

  5. Better integration with the tidyverse framework, especially for packages dplyr, tibble and broom;

  6. Improved numerical algorithms for low-level imputation functions. Better handling of duplicate variables;

  7. Last but not least: A brand new edition AND online version of Flexible Imputation of Missing Data. Second Edition.
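As an illustration of two of the arguments listed above, the sketch below uses where to request imputations for every cell (including observed ones) and formulas to spell out one imputation model per incomplete variable. It assumes the nhanes data and the default imputation method; the object names where_all, imp_all, form and imp_form, and the particular formulas chosen, are illustrative only.

```r
library(mice, warn.conflicts = FALSE)

# where: a logical matrix with the same dimensions as the data;
# the keyword "all" marks every cell for imputation
where_all <- make.where(nhanes, keyword = "all")
imp_all <- mice(nhanes, where = where_all, m = 2, maxit = 2,
                seed = 1, printFlag = FALSE)

# one imputed value per row, not only for the rows where bmi is missing
nrow(imp_all$imp$bmi)

# formulas: one imputation model per incomplete variable
form <- list(bmi ~ age + hyp + chl,
             hyp ~ age + bmi + chl,
             chl ~ age + bmi + hyp)
form <- name.formulas(form)
imp_form <- mice(nhanes, formulas = form, m = 2, maxit = 2,
                 seed = 1, printFlag = FALSE)

# the first completed dataset should contain no missing values
anyNA(complete(imp_form, 1))
```

Leaving where at its default imputes only the missing cells, which is the usual setting for analysis; marking additional cells is mainly of interest for purposes such as sensitivity checks or generating synthetic values.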

See MICE: Multivariate Imputation by Chained Equations for more resources.

I’ll be happy to take feedback and discuss suggestions. Please submit these through GitHub’s issues facility.

Resources

Books

  1. Van Buuren, S. (2018). Flexible Imputation of Missing Data. Second Edition. Chapman & Hall/CRC, Boca Raton, FL.

Course materials

  1. Handling Missing Data in R with mice
  2. Statistical Methods for combined data sets

Vignettes

  1. Ad hoc methods and the MICE algorithm
  2. Convergence and pooling
  3. Inspecting how the observed data and missingness are related
  4. Passive imputation and post-processing
  5. Imputing multilevel data
  6. Sensitivity analysis with mice
  7. Generate missing values with ampute
  8. futuremice: Wrapper for parallel MICE imputation through futures

Code from publications

  1. Flexible Imputation of Missing Data. Second edition.

Acknowledgement

The cute mice sticker was designed by Jaden M. Walters. Thanks Jaden!

Code of Conduct

Please note that the mice project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Monthly Downloads: 44,040
Version: 3.15.0
License: GPL (>= 2)
Last Published: November 19th, 2022

Functions in mice (3.15.0)

D1 - Compare two nested models using D1-statistic
ampute.default.freq - Default freq in ampute
ampute - Generate missing data for simulation purposes
D3 - Compare two nested models using D3-statistic
D2 - Compare two nested models using D2-statistic
ampute.default.patterns - Default patterns in ampute
ampute.default.type - Default type in ampute()
ampute.default.odds - Default odds in ampute()
ampute.continuous - Multivariate amputation based on continuous probability functions
ampute.default.weights - Default weights in ampute
as.mira - Create a mira object from repeated analyses
boys - Growth of Dutch boys
bwplot.mads - Box-and-whisker plot of amputed and non-amputed data
appendbreak - Appends specified break to the data
anova.mira - Compare several nested models
as.mitml.result - Converts into a mitml.result object
brandsma - Brandsma school data used by Snijders and Bosker (2012)
ampute.discrete - Multivariate amputation based on discrete probability functions
as.mids - Converts an imputed dataset (long format) into a mids object
ampute.mcar - Multivariate amputation under a MCAR mechanism
cc - Select complete cases
employee - Employee selection data
complete.mids - Extracts the completed data from a mids object
cbind.mids - Combine mids objects by columns
cci - Complete case indicator
densityplot.mids - Density plot of observed and imputed data
construct.blocks - Construct blocks from formulas and predictorMatrix
convergence - Computes convergence diagnostics for a mids object
cbind - Combine R objects by rows and columns
bwplot.mids - Box-and-whisker plot of observed and imputed data
extend.formulas - Extends formulas with predictor matrix settings
estimice - Computes least squares parameters
fix.coef - Fix coefficients and update model
extractBS - Extract broken stick estimates from a lmer object
extend.formula - Extends a formula with predictors
fdd - SE Fireworks disaster data
flux - Influx and outflux of multivariate missing data patterns
fico - Fraction of incomplete cases among cases with observed
fdgs - Fifth Dutch growth study 2009
getqbar - Extract estimate from mipo object
filter.mids - Subset rows of a mids object
futuremice - Wrapper function that runs MICE in parallel
ici - Incomplete case indicator
fluxplot - Fluxplot of the missing data pattern
getfit - Extract list of fitted models
ifdo - Conditional imputation helper
ic - Select incomplete cases
glm.mids - Generalized linear model for mids object
ibind - Enlarge number of imputations by combining mids objects
glance.mipo - Glance method to extract information from a `mipo` object
is.mids - Check for mids object
is.mipo - Check for mipo object
lm.mids - Linear regression for mids object
is.mads - Check for mads object
mads-class - Multivariate amputed data set (mads)
is.mitml.result - Check for mitml.result object
leiden85 - Leiden 85+ study
make.blots - Creates a blots argument
make.blocks - Creates a blocks argument
is.mira - Check for mira object
matchindex - Find index of matched donor units
mcar - Jamshidian and Jalal's Non-Parametric MCAR Test
make.formulas - Creates a formulas argument
make.predictorMatrix - Creates a predictorMatrix argument
make.post - Creates a post argument
make.visitSequence - Creates a visitSequence argument
mammalsleep - Mammal sleep data
md.pairs - Missing data pattern by variable pairs
make.where - Creates a where argument
make.method - Creates a method argument
mice.impute.2lonly.norm - Imputation at level 2 by Bayesian linear regression
mdc - Graphical parameter for missing data plots
md.pattern - Missing data pattern
mice.impute.2l.pan - Imputation by a two-level normal model using pan
mice.impute.2l.lmer - Imputation by a two-level normal model using lmer
mice.impute.2lonly.mean - Imputation of most likely value within the class
mice.impute.2lonly.pmm - Imputation at level 2 by predictive mean matching
mice.impute.2l.norm - Imputation by a two-level normal model
mice - mice: Multivariate Imputation by Chained Equations
mice.impute.2l.bin - Imputation by a two-level logistic model using glmer
mice.impute.logreg.boot - Imputation by logistic regression using the bootstrap
mice.impute.lasso.select.logreg - Imputation by indirect use of lasso logistic regression
mice.impute.logreg - Imputation by logistic regression
mice.impute.cart - Imputation by classification and regression trees
mice.impute.lasso.logreg - Imputation by direct use of lasso logistic regression
mice.impute.mean - Imputation by the mean
mice.impute.lasso.select.norm - Imputation by indirect use of lasso linear regression
mice.impute.jomoImpute - Multivariate multilevel imputation using jomo
mice.impute.lasso.norm - Imputation by direct use of lasso linear regression
mice.impute.lda - Imputation by linear discriminant analysis
mice.impute.mpmm - Imputation by multivariate predictive mean matching
mice.impute.mnar.logreg - Imputation under MNAR mechanism by NARFCS
mice.impute.midastouch - Imputation by predictive mean matching with distance aided donor selection
mice.impute.norm - Imputation by Bayesian linear regression
mice.impute.norm.nob - Imputation by linear regression without parameter uncertainty
mice.impute.pmm - Imputation by predictive mean matching
mice.impute.panImpute - Impute multilevel missing data using pan
mice.impute.passive - Passive imputation
mice.impute.norm.boot - Imputation by linear regression, bootstrap method
mice.impute.norm.predict - Imputation by linear regression through prediction
mice.impute.sample - Imputation by simple random sampling
mids2mplus - Export mids object to Mplus
mice.impute.quadratic - Imputation of quadratic terms
mice.theme - Set the theme for the plotting Trellis functions
mice.mids - Multivariate Imputation by Chained Equations (Iteration Step)
mice.impute.polyreg - Imputation of unordered data by polytomous regression
mice.impute.ri - Imputation by the random indicator method for nonignorable data
mice.impute.polr - Imputation of ordered data by polytomous regression
mids-class - Multiply imputed data set (mids)
mice.impute.rf - Imputation by random forests
name.blocks - Name imputation blocks
name.formulas - Name formula list elements
mids2spss - Export mids object to SPSS
ncc - Number of complete cases
nhanes - NHANES example - all variables numerical
nelsonaalen - Cumulative hazard rate or Nelson-Aalen estimator
mnar_demo_data - MNAR demo data
mira-class - Multiply imputed repeated analyses (mira)
mipo - mipo: Multiple imputation pooled object
nhanes2 - NHANES example - mixed numerical and discrete variables
norm.draw - Draws values of beta and sigma by Bayesian linear regression
pool - Combine estimates by pooling rules
nic - Number of incomplete cases
.pmm.match - Finds an imputed value from matches in the predictive metric (deprecated)
nimp - Number of imputations per block
pattern - Datasets with various missing data patterns
plot.mids - Plot the trace lines of the MICE algorithm
print.mids - Print a mids object
quickpred - Quick selection of predictors from the data
popmis - Hox pupil popularity data with missing popularity scores
selfreport - Self-reported and measured BMI
potthoffroy - Potthoff-Roy data
pool.scalar - Multiple imputation pooling: univariate version
print.mads - Print a mads object
parlmice - Wrapper function that runs MICE in parallel
reexports - Objects exported from other packages
pool.r.squared - Pools R^2 of m models fitted to multiply-imputed data
pops - Project on preterm and small for gestational age infants (POPS)
rbind.mids - Combine mids objects by rows
pool.compare - Compare two nested models fitted to imputed data
squeeze - Squeeze the imputed values to be within specified boundaries.
toenail - Toenail data
toenail2 - Toenail data
version - Echoes the package version number
walking - Walking disability data
stripplot.mids - Stripplot of observed and imputed data
tbc - Terneuzen birth cohort
summary.mira - Summary of a mira object
tidy.mipo - Tidy method to extract results from a `mipo` object
supports.transparent - Supports semi-transparent foreground colors?
windspeed - Subset of Irish wind speed data
with.mids - Evaluate an expression in multiple imputed datasets
xyplot.mads - Scatterplot of amputed and non-amputed data against weighted sum scores
xyplot.mids - Scatterplot of observed and imputed data