
imputeMissings (version 0.0.1)

impute: Impute missing values with the median/mode or randomForest

Description

When the median/mode method is used, character vectors and factors are imputed with the mode, and numeric and integer vectors are imputed with the median. When the random forest method is used, predictors are first imputed with the median/mode, and each variable is then predicted and imputed with that predicted value. For predictive contexts there is a compute and an impute function. The former is used on a training set to learn the values to impute (or the random forest models used to predict them). The latter is used on both the training and new data to impute the values (or deploy the models) learned by the compute function.
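The median/mode rule described above can be illustrated with a minimal base-R sketch. The helpers `mode_sketch` and `impute_median_mode_sketch` are hypothetical names introduced here for illustration; they are not part of imputeMissings, and the package's own handling of ties or edge cases may differ.

```r
# Hypothetical helper (not from imputeMissings): most frequent non-missing value.
# Ties are resolved by the ordering of table(), which is an assumption of this sketch.
mode_sketch <- function(x) {
  counts <- table(x, useNA = "no")
  names(counts)[which.max(counts)]
}

# Hypothetical helper: fill each column's NAs with its median or mode.
impute_median_mode_sketch <- function(df) {
  for (col in names(df)) {
    x <- df[[col]]
    if (is.numeric(x)) {
      # Numeric and integer vectors: median of the observed values
      fill <- median(x, na.rm = TRUE)
    } else {
      # Character vectors and factors: mode of the observed values
      fill <- mode_sketch(x)
    }
    x[is.na(x)] <- fill
    df[[col]] <- x
  }
  df
}

data <- data.frame(V1 = as.factor(c("yes", "no", "no", NA, "yes", "yes", "yes")),
                   V2 = as.character(c(1, 2, 3, 4, 4, 4, NA)),
                   V3 = c(1:6, NA),
                   V4 = as.numeric(c(1:6, NA)))

out <- impute_median_mode_sketch(data)
print(out)
```

In this sketch the missing entry of V1 becomes "yes", the missing entry of V2 becomes "4", and the missing entries of V3 and V4 become 3.5. Note that filling an integer column with a non-integer median converts it to double.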

Usage

impute(data, object = NULL, method = "median/mode")

Arguments

data
A data frame with dummies or numeric variables. Categorical (i.e., non-dummy / non-indicator) variables work up to a certain number but are not recommended.
object
If NULL, impute will call compute on the current dataset. Otherwise, it accepts the output of a call to compute.
method
Either "median/mode" or "randomForest". Only takes effect if object = NULL.
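The role of the object argument, learning on a training set and then reusing what was learned on new data, can be sketched in base R as follows. The functions `learn_fill_values_sketch` and `apply_fill_values_sketch` are hypothetical stand-ins for the compute and impute steps, and the column names are invented for illustration; this covers only the median/mode case and is not the package's implementation.

```r
# Hypothetical stand-in for the compute step:
# learn one fill value per column from the training data.
learn_fill_values_sketch <- function(train) {
  lapply(train, function(x) {
    if (is.numeric(x)) {
      median(x, na.rm = TRUE)
    } else {
      counts <- table(x)              # table() drops NA by default
      names(counts)[which.max(counts)]
    }
  })
}

# Hypothetical stand-in for the impute step with a supplied object:
# fill NAs using the values learned earlier, not values from `data` itself.
apply_fill_values_sketch <- function(data, values) {
  for (col in names(values)) {
    x <- data[[col]]
    x[is.na(x)] <- values[[col]]
    data[[col]] <- x
  }
  data
}

# Invented example data
train <- data.frame(age = c(20, 30, 40, NA),
                    smoker = c("no", "no", "yes", NA),
                    stringsAsFactors = FALSE)
new_data <- data.frame(age = c(NA, 55),
                       smoker = c(NA, "yes"),
                       stringsAsFactors = FALSE)

values <- learn_fill_values_sketch(train)
imputed_train <- apply_fill_values_sketch(train, values)
imputed_new <- apply_fill_values_sketch(new_data, values)
print(imputed_new)
```

Separating the two steps means new data are filled with statistics from the training set (here a median age of 30 and a modal smoker value of "no"), so training and new observations are treated consistently.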

Value

An imputed data frame.

See Also

compute

Examples

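# Example:
library(imputeMissings)

# Create some data with one missing value in each column
data <- data.frame(V1=as.factor(c('yes','no','no',NA,'yes','yes','yes')),
                  V2=as.character(c(1,2,3,4,4,4,NA)),
                  V3=c(1:6,NA),V4=as.numeric(c(1:6,NA)))

# Learn the random forest models, then the median/mode values
# (the second assignment overwrites the first)
object <- compute(data,method="randomForest")
object <- compute(data,method="median/mode")

# Impute with the default method; compute is called internally
impute(data)
# Impute with the output of an explicit call to compute
impute(data,object=compute(data, method="randomForest"))
# Impute with random forest; compute is called internally
impute(data,method="randomForest")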
