BBPMM(Data, M=10, nIter=10, outfile=NULL, ignore=NULL,
vartype=NULL, stepmod="stepAIC", maxit.multi=3, maxit.glm=25,
maxPerc = 0.98, verbose=TRUE, setSeed, chainDiagnostics=TRUE, ...)
dmi
). Default=10.
BBPMM.
.Random.Seed before imputation starts.
.Random.Seed after function is done.
BBPMM is based on a chained equations approach
that uses a Bayesian Bootstrap approach and Predictive Mean
Matching (PMM) variants for metric-scale, binary, and multi-categorical
variables to generate multiple imputations. To emulate a
monotone missing-data pattern as closely as possible, variables are sorted
by rate of missingness in ascending order. If no complete variables
exist, the least incomplete variable is imputed via hot-deck. The
starting solution then builds the imputation model for a particular
y variable from its observed values and the corresponding observed or
already imputed values of the x variables (i.e., all variables with fewer
missing values than y).
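The ordering step described above can be illustrated with a few lines of base R. This is only a sketch of the idea, not the code used inside BBPMM; the data frame `toy` and its column names are made up for the illustration.

```r
# Toy data frame; names and missingness rates are illustrative only
set.seed(1)
toy <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10), d = rnorm(10))
toy$b[1:5] <- NA   # 50% missing
toy$c[1:2] <- NA   # 20% missing
toy$d[1:7] <- NA   # 70% missing

# Rate of missingness per variable
mis.rate <- colMeans(is.na(toy))

# Ascending order: complete variables first, most incomplete variable last
imp.order <- names(sort(mis.rate))
imp.order
```

Under this ordering, each incomplete variable would be modelled on the variables that precede it, which mimics a monotone missing-data pattern.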
Due to the PMM element in the algorithm,
auto-correlation between subsequent iterations is virtually zero. Therefore, a
burn-in period is not required, and there is no need to set
nIter to high values (> 20) either.
If M=1, no Bayesian Bootstrap step is carried
out for the chained equations. Note that in this case the algorithm is still unlikely to
converge to a stable solution, because of the Predictive Mean Matching
step.
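To make the two building blocks more concrete, the following sketch imputes a single metric variable once. It is a simplified stand-in, not the BBPMM implementation: the helper `bb_pmm_once` is hypothetical, the Bayesian Bootstrap is realised as Dirichlet(1, ..., 1) case weights in a weighted linear model (one common way to implement Rubin's 1981 proposal, assumed here for brevity), and the PMM step simply takes the single nearest donor.

```r
# Hypothetical helper, NOT part of BaBooN: one Bayesian Bootstrap + PMM draw
# for a metric variable y with completely observed predictors X (a data frame).
bb_pmm_once <- function(y, X) {
  obs <- !is.na(y)

  # Bayesian Bootstrap: Dirichlet(1, ..., 1) weights over the observed cases,
  # drawn as normalised standard exponential variates
  w <- rexp(sum(obs))
  w <- w / sum(w)

  # Weighted linear model fitted on the observed cases only
  dat <- data.frame(y = y, X)
  fit <- lm(y ~ ., data = dat[obs, , drop = FALSE], weights = w)

  # Predictive means for all cases, observed and missing
  pred <- predict(fit, newdata = dat)

  # PMM: each missing case receives the observed y value of the donor
  # whose predictive mean is closest to its own
  y.imp <- y
  for (i in which(!obs)) {
    donor <- which(obs)[which.min(abs(pred[obs] - pred[i]))]
    y.imp[i] <- y[donor]
  }
  y.imp
}

# Usage on simulated data
set.seed(42)
x <- rnorm(30)
y <- 2 * x + rnorm(30, sd = 0.3)
y[c(3, 11, 20)] <- NA
y.imp <- bb_pmm_once(y, data.frame(x = x))
```

Because every imputed value is copied from an observed donor, the imputations stay within the range of the observed data, which is why PMM is attractive for non-normal variables.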
Koller-Meinfelder, F. (2009) Analysis of Incomplete Survey Data -- Multiple Imputation Via Bayesian Bootstrap Predictive Mean Matching, doctoral thesis.
Little, R.J.A. (1988) Missing-Data Adjustments in Large Surveys, Journal of Business and Economic Statistics, Vol. 6, No. 3, pp. 287-296.
Raghunathan, T.E. and Lepkowski, J.M. and Van Hoewyk, J. and Solenberger, P. (2001) A multivariate technique for multiply imputing missing values using a sequence of regression models. Survey Methodology, Vol. 27, pp. 85--95.
Rubin, D.B. (1981) The Bayesian Bootstrap. The Annals of Statistics, Vol. 9, pp. 130--134.
Rubin, D.B. (1987) Multiple Imputation for Non-Response in Surveys. New York: John Wiley & Sons, Inc.
Van Buuren, S. and Brand, J.P.L. and Groothuis-Oudshoorn, C.G.M. and Rubin, D.B. (2006) Fully conditional specification in multivariate imputation. Journal of Statistical Computation and Simulation, Vol. 76, No. 12, pp. 1049--1064.
Van Buuren, S. and Groothuis-Oudshoorn, K. (2011) mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, Vol. 45, No. 3, pp. 1--67. URL http://www.jstatsoft.org/v45/i03/.
Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth Edition. New York: Springer.
BBPMM.row, dmi
### sample data set with non-normal variables
set.seed(1000)
n <- 50
x1 <- round(runif(n,0.5,3.5))
x2 <- as.factor(c(rep(1,10),rep(2,25),rep(3,15)))
x3 <- round(rnorm(n,0,3))
y1 <- round(x1-0.25*(x2==2)+0.5*x3+rnorm(n,0,1))
y1 <- ifelse(y1<1,1,y1)
y1 <- as.factor(ifelse(y1>4,5,y1))
y2 <- x1+rnorm(n,0,0.5)
y3 <- round(x3+rnorm(n,0,2))
data1 <- as.data.frame(cbind(x1,x2,x3,y1,y2,y3))
misrow1 <- sample(n,20)
misrow2 <- sample(n,15)
misrow3 <- sample(n,10)
is.na(data1[misrow1, 4]) <- TRUE
is.na(data1[misrow2, 5]) <- TRUE
is.na(data1[misrow3, 6]) <- TRUE
### imputation
imputed.data <- BBPMM(data1, nIter=5, M=5)