Imputes univariate missing data using fast predictive mean matching
mice.impute.fastpmm(y, ry, x, donors = 5, type = 1, ridge = 1e-05,
version = "", ...)
y: Numeric vector with incomplete data.
ry: Response pattern of y (TRUE = observed, FALSE = missing).
x: Design matrix with length(y) rows and p columns containing complete covariates.
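As an illustration of how these three inputs relate, the following base R sketch builds a toy data set. The variable names and the simulated values are assumptions made for this example only; they do not come from the package.

```r
set.seed(1)                                 # reproducible toy data
n <- 20
x <- cbind(x1 = rnorm(n), x2 = rnorm(n))    # complete covariates: n rows, p = 2 columns
y <- 1 + 2 * x[, "x1"] + rnorm(n)           # target variable
y[c(3, 7, 12)] <- NA                        # make three values missing
ry <- !is.na(y)                             # TRUE = observed, FALSE = missing

# The design matrix must match y in length and contain no missing values.
stopifnot(nrow(x) == length(y), !anyNA(x))

n_mis <- sum(!ry)                           # number of imputations to be returned
```

Here sum(!ry) counts the missing entries of y, which is the length of the vector the imputation function is documented to return.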
donors: The size of the donor pool among which a draw is made. The default is donors = 5. Setting donors = 1 always selects the closest match. Values between 3 and 10 generally give the best results. Note: The default was changed from 3 to 5 in version 2.19, based on simulation work by Tim Morris.
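The effect of the donor pool size can be sketched in base R as follows. The helper pick_donor() is hypothetical and written only for this illustration; it is not the matcher used by the package.

```r
# Hypothetical helper: choose one donor for a single target prediction.
pick_donor <- function(target, yhat_obs, yobs, donors = 5) {
  d <- abs(yhat_obs - target)                 # distance to every observed case
  k <- min(donors, length(d))                 # pool cannot exceed the observed cases
  # Sort by distance; the random second key breaks ties at random.
  pool <- order(d, runif(length(d)))[seq_len(k)]
  # Index via sample.int() to avoid the sample() pitfall for length-one vectors.
  yobs[pool[sample.int(k, 1)]]
}

yhat_obs <- c(1, 2, 3, 10)     # predicted values of the observed cases
yobs     <- c(10, 20, 30, 100) # their observed values

# donors = 1 always returns the value of the closest match.
closest <- pick_donor(2.2, yhat_obs, yobs, donors = 1)

# donors = 3 draws at random among the three closest matches.
set.seed(42)
draws <- replicate(200, pick_donor(2.2, yhat_obs, yobs, donors = 3))
```

With donors = 1 the result is deterministic apart from ties, whereas a larger pool adds between-imputation variability at the cost of using less similar donors.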
type: Type of matching distance. The default, type = 1, calculates the distance between the predicted values of yobs and the drawn values of ymis. Other choices are type = 0 (distance between predicted values) and type = 2 (distance between drawn values). The current version supports only type = 1.
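To make the three matching types concrete, the sketch below shows which coefficient vector would feed each side of the distance. The function linear_predictors() and its argument names are assumptions for illustration; as stated above, the function documented here supports only type = 1.

```r
# Hypothetical helper: linear predictors used for matching under each type.
# beta_hat is the least-squares estimate, beta_star a posterior draw.
linear_predictors <- function(type, xobs, xmis, beta_hat, beta_star) {
  switch(as.character(type),
    "0" = list(obs = as.vector(xobs %*% beta_hat),    # predicted vs predicted
               mis = as.vector(xmis %*% beta_hat)),
    "1" = list(obs = as.vector(xobs %*% beta_hat),    # predicted vs drawn
               mis = as.vector(xmis %*% beta_star)),
    "2" = list(obs = as.vector(xobs %*% beta_star),   # drawn vs drawn
               mis = as.vector(xmis %*% beta_star)),
    stop("type must be 0, 1, or 2")
  )
}

xobs <- cbind(1, c(0, 1))   # two observed cases (intercept + one covariate)
xmis <- cbind(1, 2)         # one missing case
beta_hat  <- c(1, 1)
beta_star <- c(0, 2)

p0 <- linear_predictors(0, xobs, xmis, beta_hat, beta_star)
p1 <- linear_predictors(1, xobs, xmis, beta_hat, beta_star)
p2 <- linear_predictors(2, xobs, xmis, beta_hat, beta_star)
```

The matching distance is then the absolute difference between an element of mis and each element of obs.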
ridge: The ridge penalty applied in .norm.draw() to prevent problems with multicollinearity. The default is ridge = 1e-05, which means that 0.001 percent of the diagonal is added to the cross-product. Larger ridges may result in more biased estimates. For highly noisy data (e.g. many junk variables), set ridge = 1e-06 or even lower to reduce bias. For highly collinear data, set ridge = 1e-04 or higher.
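The arithmetic behind the ridge penalty can be sketched in base R. This is a minimal illustration that assumes the penalty is the fraction ridge of each diagonal element of the cross-product matrix; the internals of .norm.draw() may differ in detail.

```r
ridge <- 1e-05

set.seed(2)
a <- rnorm(50)
# Design matrix with two nearly collinear covariates (toy data).
xobs <- cbind(1, a, a + rnorm(50, sd = 1e-3))

xtx <- crossprod(xobs)                           # cross-product X'X
pen <- ridge * diag(xtx)                         # penalty: fraction of each diagonal element
xtx_pen <- xtx + diag(pen, nrow = length(pen))   # penalised cross-product
v <- solve(xtx_pen)                              # inverse used for the coefficient estimate

rel_increase <- diag(xtx_pen) / diag(xtx) - 1    # relative increase of the diagonal
percent <- 100 * ridge                           # the same fraction expressed in percent
```

The relative increase of every diagonal element equals ridge, so the default of 1e-05 corresponds to 0.001 percent. A larger ridge stabilises the inverse for collinear covariates but shrinks the coefficients more strongly.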
version: A character variable indicating the version. Currently unused.
...: Other named arguments.
Numeric vector of length sum(!ry) with imputations.
Imputation of y by predictive mean matching, based on Rubin (1987, p. 168, formulas a and b). The procedure is as follows:
1. Estimate beta and sigma by linear regression.
2. Draw beta* and sigma* from the proper posterior.
3. Compute predicted values for yobs using beta and for ymis using beta*.
4. For each ymis, find the donors observations with the closest predicted values, randomly sample one of these, and take its observed value in y as the imputation.
Ties are broken by making a random draw among ties.
Note: The matching is done on predicted y, NOT on observed y.
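The four steps can be put together in a short base R sketch. This is an illustration under stated assumptions, not the package's implementation: the function name pmm_sketch() is hypothetical, an intercept column is assumed to be prepended to x, and the posterior draw follows the standard Bayesian linear regression recipe with a ridge-penalised cross-product.

```r
# Hypothetical sketch of predictive mean matching (type = 1 matching).
pmm_sketch <- function(y, ry, x, donors = 5, ridge = 1e-05) {
  x <- cbind(1, as.matrix(x))                  # assumption: add an intercept column
  xobs <- x[ry, , drop = FALSE]
  xmis <- x[!ry, , drop = FALSE]
  yobs <- y[ry]

  # Step 1: estimate beta and sigma by (ridge-stabilised) linear regression.
  xtx <- crossprod(xobs)
  pen <- ridge * diag(xtx)
  v <- solve(xtx + diag(pen, nrow = length(pen)))
  beta_hat <- v %*% crossprod(xobs, yobs)
  res <- yobs - xobs %*% beta_hat

  # Step 2: draw sigma* and beta* from the posterior.
  df <- max(sum(ry) - ncol(x), 1)
  sigma_star <- sqrt(sum(res^2) / rchisq(1, df))
  beta_star <- beta_hat +
    (t(chol((v + t(v)) / 2)) %*% rnorm(ncol(x))) * sigma_star

  # Step 3: predicted values, yobs with beta and ymis with beta*.
  yhat_obs <- as.vector(xobs %*% beta_hat)
  yhat_mis <- as.vector(xmis %*% beta_star)

  # Step 4: for each missing case, sample one of the closest donors.
  impute_one <- function(target) {
    d <- abs(yhat_obs - target)
    k <- min(donors, length(d))
    pool <- order(d, runif(length(d)))[seq_len(k)]  # random key breaks ties
    yobs[pool[sample.int(k, 1)]]                    # observed value of the donor
  }
  vapply(yhat_mis, impute_one, numeric(1))
}

# Toy data (simulated for this example only).
set.seed(123)
n <- 30
x <- cbind(x1 = rnorm(n), x2 = rnorm(n))
y <- 1 + 2 * x[, "x1"] - x[, "x2"] + rnorm(n)
y[c(2, 9, 17, 25)] <- NA
ry <- !is.na(y)

imp <- pmm_sketch(y, ry, x, donors = 5)
```

Because every imputation is the observed value of a donor, the imputed values always lie within the set of observed values of y, while the matching itself uses only the predicted values.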
Little, R.J.A. (1988). Missing data adjustments in large surveys (with discussion). Journal of Business & Economic Statistics, 6, 287-301.
Rubin, D.B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.
Van Buuren, S., Brand, J.P.L., Groothuis-Oudshoorn, C.G.M., Rubin, D.B. (2006). Fully conditional specification in multivariate imputation. Journal of Statistical Computation and Simulation, 76(12), 1049-1064.
Van Buuren, S., Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1-67. http://www.jstatsoft.org/v45/i03/