mice.impute.norm.predict: Imputation by linear regression through prediction

Description

Imputes the "best value" according to the linear regression model, also known as regression imputation.

Usage

mice.impute.norm.predict(y, ry, x, wy = NULL, ridge = 1e-05, ...)

Arguments

y

Vector to be imputed

ry

Logical vector of length length(y) indicating the subset y[ry] of elements in y to which the imputation model is fitted. The ry generally distinguishes the observed (TRUE) and missing values (FALSE) in y.

x

Numeric design matrix with length(y) rows with predictors for y. Matrix x must not contain missing values.

wy

Logical vector of length length(y). A TRUE value indicates locations in y for which imputations are created.

ridge

The ridge penalty used in .norm.draw() to prevent problems with multicollinearity. The default is ridge = 1e-05, which means that 0.001 percent of the diagonal is added to the cross-product. Larger ridges may result in more biased estimates. For highly noisy data (e.g. many junk variables), set ridge = 1e-06 or even lower to reduce bias. For highly collinear data, set ridge = 1e-04 or higher.
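As a rough illustration of why a small ridge helps with collinearity, the base R sketch below builds a design matrix with an exactly duplicated predictor. It assumes the penalty is ridge * diag(xtx) added to the diagonal of the cross-product, as described in the Details section. The toy data and variable names are made up for the example, and this is not the package's internal code.

```r
# Hypothetical toy data: x2 duplicates x1, so the cross-product is singular.
x1 <- as.numeric(1:10)
x2 <- x1
X <- cbind(1, x1, x2)
y <- 2 + 3 * x1

xtx <- crossprod(X)

# Without a penalty the normal equations cannot be solved.
unpenalised <- try(solve(xtx, crossprod(X, y)), silent = TRUE)
failed <- inherits(unpenalised, "try-error")

# With ridge = 1e-05 each diagonal element is inflated by a factor 1 + 1e-05.
ridge <- 1e-05
pen <- ridge * diag(xtx)
beta <- solve(xtx + diag(pen), crossprod(X, y))

# Fitted values stay close to y, even though the separate coefficients of
# x1 and x2 are not identified by the data.
max_abs_error <- max(abs(X %*% beta - y))
```

The penalty trades a small amount of bias for a solvable system, which is why larger ridges are suggested only when collinearity is severe.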

...

Other named arguments.

Value

Vector with imputed data, same type as y, and of length sum(wy)
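A minimal sketch of a direct call, using made-up toy data. The call is guarded so that the snippet also runs when the mice package is not installed. With wy left at its default of NULL, imputations are assumed to be created for the entries where ry is FALSE.

```r
# Hypothetical toy data: y depends linearly on one predictor,
# and the last three values of y are missing.
x <- matrix(as.numeric(1:10), ncol = 1, dimnames = list(NULL, "x1"))
y <- 2 + 3 * x[, "x1"]
y[8:10] <- NA
ry <- !is.na(y)   # TRUE = observed, FALSE = missing

imp <- NULL
if (requireNamespace("mice", quietly = TRUE)) {
  # wy is left at its default (NULL).
  imp <- mice::mice.impute.norm.predict(y, ry, x)
  # Check the documented length of the result.
  print(length(imp) == sum(!ry))
}
```

In ordinary use the function is not called directly but selected through the method argument of mice(), e.g. method = "norm.predict".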

Warning

THIS METHOD SHOULD NOT BE USED FOR DATA ANALYSIS. This method is seductive because it imputes the most likely value according to the model. However, it ignores the uncertainty of the missing values and artificially amplifies the relations between the columns of the data. Application of richer models having more parameters does not help to evade these issues. Stochastic regression methods, like mice.impute.pmm or mice.impute.norm, are generally preferred.

At best, prediction can give reasonable estimates of the mean, especially if normality assumptions are plausible. See Little and Rubin (2002, pp. 62-64) or Van Buuren (2012, pp. 11-13, 45-46) for a discussion of this method.
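The small simulation below is one way to see the problem described in this warning. It uses base R only, with lm() and predict() standing in for the imputation step, so it is an illustration under assumed toy data rather than a call to the package. Because the imputed values lie exactly on the fitted line, the completed data show a smaller spread in y and a stronger correlation with x than the observed cases do.

```r
set.seed(1)
n <- 500
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)

# Assumed missingness: about half of y is deleted completely at random.
ry <- runif(n) > 0.5   # TRUE = observed

# Deterministic regression imputation: fit on observed cases, predict the rest.
fit <- lm(y ~ x, subset = ry)
y_imp <- y
y_imp[!ry] <- predict(fit, newdata = data.frame(x = x[!ry]))

# Compare observed cases with the completed data.
cor_observed  <- cor(x[ry], y[ry])
cor_completed <- cor(x, y_imp)
sd_observed   <- sd(y[ry])
sd_completed  <- sd(y_imp)

print(c(cor_observed = cor_observed, cor_completed = cor_completed))
print(c(sd_observed = sd_observed, sd_completed = sd_completed))
```

The mean of y_imp can still be a reasonable estimate here, which matches the remark that prediction is at best useful for estimating means.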

Details

Calculates regression weights from the observed data and returns the predicted values as imputations. The ridge parameter adds a penalty term ridge*diag(xtx) to the cross-product matrix xtx. This method is known as regression imputation.
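The computation described above can be sketched in base R as follows. The function name norm_predict_sketch is hypothetical. The sketch assumes that an intercept column is added to x and that wy defaults to !ry. It is meant to show the idea, not to reproduce the package's source.

```r
# Hypothetical re-implementation for illustration only.
norm_predict_sketch <- function(y, ry, x, wy = NULL, ridge = 1e-05) {
  if (is.null(wy)) wy <- !ry     # assumption: impute where y is not observed
  x <- cbind(1, as.matrix(x))    # assumption: an intercept column is added
  xobs <- x[ry, , drop = FALSE]
  yobs <- y[ry]

  xtx <- crossprod(xobs)         # cross-product matrix of the observed rows
  pen <- ridge * diag(xtx)       # ridge penalty, proportional to the diagonal
  beta <- solve(xtx + diag(pen, nrow = length(pen)), crossprod(xobs, yobs))

  # Predicted values for the rows flagged in wy are returned as imputations.
  as.vector(x[wy, , drop = FALSE] %*% beta)
}

# Toy data (made up): 40 observed and 10 missing values of y.
set.seed(123)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50, sd = 0.5)
ry <- rep(c(TRUE, FALSE), times = c(40, 10))
y[!ry] <- NA

imputed <- norm_predict_sketch(y, ry, x)
```

With ridge = 0 the sketch reduces to ordinary least squares, so its output should agree with predict() applied to an lm() fit on the observed cases.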

References

Little, R.J.A. and Rubin, D.B. (2002). Statistical Analysis with Missing Data. New York: John Wiley and Sons.

Van Buuren, S. (2012). Flexible Imputation of Missing Data. Boca Raton, FL: Chapman & Hall/CRC.