mice (version 3.15.0)

futuremice: Wrapper function that runs MICE in parallel

Description

This is a wrapper function for mice that uses multiple cores to run mice in parallel, which can speed up the imputation procedure. By default, futuremice distributes the number of imputations m about equally over the cores.

Usage

futuremice(
  data,
  m = 5,
  parallelseed = NA,
  n.core = NULL,
  seed = NA,
  use.logical = TRUE,
  future.plan = "multisession",
  ...
)

Value

A mids object as defined by mids-class

Arguments

data

A data frame or matrix containing the incomplete data. Similar to the first argument of mice.

m

The number of desired imputed datasets. By default m = 5, as with mice.

parallelseed

A scalar to be used to obtain reproducible results over the futures. The default parallelseed = NA will result in a seed value that is randomly drawn between -999999999 and 999999999.

n.core

A scalar indicating the number of cores that should be used.

seed

A scalar to be used as the seed value for the mice algorithm within each parallel stream. Note that the imputations will then be identical across all streams; hence, this argument should be used only if n.core = 1 and the aim is to obtain the same output as under mice.

use.logical

A logical indicating whether logical (TRUE) or physical (FALSE) CPUs on the machine should be used.

future.plan

A character indicating how futures are resolved. The default multisession resolves futures asynchronously (in parallel) in separate R sessions running in the background. See plan for more information on future plans.

...

Named arguments that are passed down to function mice.

Author

Thom Benjamin Volker, Gerko Vink

Details

This function relies on package furrr, which is available for R versions 3.2.0 and later. We have chosen the furrr function future_map so that futuremice can be used on Mac, Linux and Windows systems.

This wrapper function combines the output of future_map with function ibind from the mice package. A mids object is returned and can be used for further analyses.
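The split-and-combine idea can be sketched by hand. The following is an illustration only: it runs the partial mice calls sequentially with lapply rather than in parallel, and the splitting rule and the names n_core, n_imp_core and imps are assumptions made for this example, not the internals of futuremice.

```r
library(mice)

m <- 5
n_core <- 2  # hypothetical number of workers

# Divide the m imputations about equally over the workers
# (illustrative rule; futuremice may split differently)
n_imp_core <- tabulate(rep_len(seq_len(n_core), m))

# One mice call per worker; sequential here, for illustration only
imps <- lapply(n_imp_core, function(k) {
  mice(nhanes, m = k, printFlag = FALSE)
})

# Stitch the partial mids objects together with ibind
imp <- Reduce(ibind, imps)
imp$m
```

The combined object holds all m imputations and can be passed to with and pool like any other mids object.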

A seed value can be specified in the global environment (for example with set.seed), which will yield reproducible results. A seed value can also be specified within the futuremice call through the argument parallelseed. If parallelseed is not specified, a seed value is drawn randomly by default and is accessible through $parallelseed in the output object. Hence, results can always be reproduced, whether the seed is specified in the global environment or by setting the same seed within the function (potentially by extracting the seed from the futuremice output object).
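As a minimal sketch of the reproducibility described here, two calls with the same parallelseed, the same m and the same n.core are expected to give the same imputations. The example assumes that at least two cores are available and that packages furrr and future are installed; the seed value 123 is arbitrary.

```r
library(mice)

# Two runs with identical settings and the same parallelseed
imp_a <- futuremice(nhanes, m = 4, n.core = 2, parallelseed = 123)
imp_b <- futuremice(nhanes, m = 4, n.core = 2, parallelseed = 123)

# Compare the first completed dataset of each run
same <- identical(complete(imp_a, 1), complete(imp_b, 1))
same

# The seed used is stored in the output object
imp_a$parallelseed
```

Changing m or n.core changes how imputations are assigned to the parallel streams, so the comparison is only meaningful when both are held fixed.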

References

Volker, T.B. and Vink, G. (2022). futuremice: The future starts today. https://www.gerkovink.com/miceVignettes/futuremice/Vignette_futuremice.html

Van Buuren, S. (2018). Flexible Imputation of Missing Data. Second Edition. Chapman & Hall/CRC. Boca Raton, FL.

See Also

future, furrr, future_map, plan, mice, mids-class

Examples

library(mice)

# Not run by default: remove the if (FALSE) wrapper to execute
if (FALSE) {
  # 150 imputations in dataset nhanes, performed by 3 cores
  imp1 <- futuremice(data = nhanes, m = 150, n.core = 3)

  # Making use of arguments in mice
  imp2 <- futuremice(data = nhanes, m = 100, method = "norm.nob")
  imp2$method

  # Fit a model on each imputed dataset and pool the results
  fit <- with(imp2, lm(bmi ~ hyp))
  pool(fit)
}