
mice (version 3.4.0)

parlmice: Wrapper function that runs MICE in parallel

Description

This is a wrapper function for mice that uses multiple cores to execute mice in parallel. Spreading the imputations over several cores can shorten the run time of the imputation procedure.

Usage

parlmice(data, m = 5, seed = NA, cluster.seed = NA, n.core = NULL,
  n.imp.core = NULL, cl.type = "PSOCK", ...)

Arguments

data

A data frame or matrix containing the incomplete data. Similar to the first argument of mice.

m

The number of desired imputed datasets. By default $m=5$, as with mice.

seed

A scalar to be used as the seed value for the mice algorithm within each parallel stream. Please note that the imputations will then be the same in all streams; hence, this argument should be used only if n.core = 1 and the same output as under mice is desired.

cluster.seed

A scalar to be used as the seed value. It is recommended to put the seed value here and not outside this function, as otherwise the parallel processes will be performed with separate, random seeds.

n.core

A scalar indicating the number of cores that should be used.

n.imp.core

A scalar indicating the number of imputations per core.

cl.type

The cluster type. Default value is "PSOCK". POSIX machines (Linux, Mac) generally benefit from much faster cluster computation if cl.type is set to "FORK".

...

Named arguments that are passed down to function mice or makeCluster.
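As a rough illustration of how n.core and n.imp.core relate to the total number of imputations, the sketch below assumes that each core produces n.imp.core imputations, so that the total is their product (this matches the 150 = 3 x 50 case in the Examples). The helper imps_per_core is hypothetical and not part of mice; it only shows one way to pick n.imp.core for a target number of imputations.

```r
# Hypothetical helper (not part of mice): smallest per-core count that
# yields at least `m` imputations in total across `n.core` cores.
imps_per_core <- function(m, n.core) {
  stopifnot(m >= 1, n.core >= 1)
  ceiling(m / n.core)
}

n.core <- 3
n.imp.core <- imps_per_core(m = 150, n.core = n.core)

# Assumption: total imputations = cores x imputations per core.
total <- n.core * n.imp.core
c(n.imp.core = n.imp.core, total = total)
```

When m is not a multiple of n.core, rounding up in this way gives slightly more imputations than requested.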

Value

A mids object as defined by mids-class.

Details

This function relies on package parallel, which is a base package for R versions 2.14.0 and later. We have chosen to use parallel function parLapply to allow the use of parlmice on Mac, Linux and Windows systems. For the same reason, we use the Parallel Socket Cluster (PSOCK) type by default.
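The choice of cluster type can be sketched with package parallel alone. The snippet below is a minimal illustration, not the code used inside parlmice: it picks "PSOCK" on Windows and "FORK" elsewhere (the latter being the alternative named in the cl.type argument), and runs a toy task through parLapply. The variable name cl_type is an assumption made for this example.

```r
library(parallel)

# Assumption: prefer FORK where forking is available, otherwise PSOCK.
cl_type <- if (.Platform$OS.type == "windows") "PSOCK" else "FORK"

cl <- makeCluster(2, type = cl_type)
squares <- parLapply(cl, 1:4, function(i) i^2)  # toy task, one value per element
stopCluster(cl)

unlist(squares)
```

The same value could then be supplied to parlmice through cl.type.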

On systems other than Windows, it can be hugely beneficial to change the cluster type to FORK, as it generally results in improved memory handling. When memory issues arise on a Windows system, we advise storing the multiply imputed datasets, cleaning the memory with rm and gc, and making another run using the same settings.
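The store, clean and rerun advice could look roughly like the sketch below. To keep it self-contained, a plain list stands in for the mids object that parlmice would return; the object name imp_batch1 and the use of saveRDS with a temporary file are assumptions made for this illustration.

```r
# Stand-in for a large mids object returned by parlmice() (hypothetical).
imp_batch1 <- replicate(5, rnorm(1e4), simplify = FALSE)

# 1. Store the result on disk.
path <- tempfile(fileext = ".rds")
saveRDS(imp_batch1, path)

# 2. Remove it from the workspace and trigger garbage collection.
rm(imp_batch1)
invisible(gc())

# 3. ... make another parlmice() run with the same settings here ...

# 4. Reload the stored result when it is needed again.
imp_batch1 <- readRDS(path)
length(imp_batch1)
```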

This wrapper function combines the output of parLapply with function ibind in mice. A mids object is returned and can be used for further analyses.
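The pattern described here, running mice on each worker and binding the results with ibind, can be sketched as follows. This is a simplified illustration under stated assumptions (two PSOCK workers, three imputations each, an arbitrary seed of 2024 set with clusterSetRNGStream), not the actual implementation of parlmice.

```r
library(mice)
library(parallel)

cl <- makeCluster(2, type = "PSOCK")
invisible(clusterEvalQ(cl, library(mice)))  # load mice on every worker
clusterSetRNGStream(cl, 2024)               # arbitrary seed, an assumption

# Each worker imputes the nhanes data 3 times and returns a mids object.
parts <- parLapply(cl, 1:2, function(i, data) {
  mice(data, m = 3, printFlag = FALSE)
}, data = nhanes)
stopCluster(cl)

# Bind the per-worker mids objects into a single mids object.
imp <- Reduce(ibind, parts)
imp$m
```

The combined object holds the imputations of both workers and can be passed to with and pool like any other mids object.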

Note that if a seed value is desired for a parallel run, it should be passed to this function, using cluster.seed as recommended in the argument description above. Seed values set outside the wrapper function (in an R script or passed to mice) will not yield reproducible results. We refer to the manual of parallel for an explanation of this matter.
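The difference between seeding every stream with the same value and seeding the cluster as a whole can be illustrated with package parallel alone. In the sketch below, the function draw and its arguments worker_seed and cluster_seed are hypothetical names for this example; random normal draws stand in for imputations, and clusterSetRNGStream is used as one way to seed a cluster, which is an assumption about the mechanism rather than a statement from this page.

```r
library(parallel)

# Draw three normal deviates on each of two workers.
# worker_seed: the same seed is set inside every stream.
# cluster_seed: one seed is set for the cluster as a whole.
draw <- function(worker_seed = NA, cluster_seed = NA) {
  cl <- makeCluster(2, type = "PSOCK")
  on.exit(stopCluster(cl))
  if (!is.na(cluster_seed)) clusterSetRNGStream(cl, cluster_seed)
  parLapply(cl, 1:2, function(i, s) {
    if (!is.na(s)) set.seed(s)
    rnorm(3)
  }, s = worker_seed)
}

# Same seed in every stream: both workers return identical draws.
same_seed <- draw(worker_seed = 1)
identical(same_seed[[1]], same_seed[[2]])  # TRUE

# Cluster-level seed: runs are reproducible, streams differ from each other.
run1 <- draw(cluster_seed = 123)
run2 <- draw(cluster_seed = 123)
identical(run1, run2)                      # TRUE
identical(run1[[1]], run1[[2]])            # FALSE
```

Calling set.seed in the main R session before draw() would not affect the workers, since each PSOCK worker is a separate R process with its own random number state.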

References

Schouten, R. and Vink, G. (2017). parlmice: faster, paraleller, micer. https://gerkovink.github.io/parlMICE/Vignette_parlMICE.html

Van Buuren, S. (2018). Flexible Imputation of Missing Data. Second Edition. Chapman & Hall/CRC. Boca Raton, FL.

See Also

parallel, parLapply, makeCluster, mice, mids-class

Examples

# NOT RUN {
library(mice)

# 150 imputations in dataset nhanes, performed by 3 cores
imp1 <- parlmice(data = nhanes, n.core = 3, n.imp.core = 50)

# Making use of arguments in mice.
imp2 <- parlmice(data = nhanes, method = "norm.nob", m = 100)
imp2$method

# Fit a linear model on each imputed dataset and pool the estimates
fit <- with(imp2, lm(bmi ~ hyp))
pool(fit)
# }
