missMethods (version 0.2.0)

impute_in_classes: Impute in classes

Description

Apply an imputation function inside imputation classes

Usage

impute_in_classes(
  ds,
  cols_class,
  FUN,
  breaks = Inf,
  use_quantiles = FALSE,
  min_objs_in_class = 1,
  min_obs_comp = 0,
  min_obs_per_col = 1,
  donor_limit = Inf,
  dl_type = "cols_seq",
  add_imputation_classes = FALSE,
  ...
)

Arguments

ds

A data frame or matrix with missing values.

cols_class

Columns that are used for constructing the imputation classes.

FUN

An imputation function that is applied to impute the missing values.

breaks

Number of intervals / levels a column is broken into (see cut(), which is used internally for cutting numeric columns). If breaks = Inf (the default), every unique value of a column can be in a separate class (if no other restrictions apply).

use_quantiles

Should quantiles be used for cutting numeric vectors? Normally, cut() divides the range of a vector into equally spaced intervals. If use_quantiles = TRUE, the classes will contain roughly equal numbers of objects.
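
The difference between the two cutting strategies can be illustrated with base R alone. This is a minimal sketch, not the package's internal code: the vector x and the choice of two intervals are assumptions made for illustration, and the quantile-based break points are built by hand with quantile().

```r
# Hypothetical, skewed example vector (one large outlier)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

# Equal-width intervals: cut() splits the range of x into 2 intervals
equal_width <- cut(x, breaks = 2)

# Quantile-based intervals: break points are taken from quantile()
q_breaks <- quantile(x, probs = seq(0, 1, length.out = 3))
equal_content <- cut(x, breaks = q_breaks, include.lowest = TRUE)

# Number of values per interval under each strategy
table(equal_width)
table(equal_content)
```

With equal-width intervals, nine values fall into the first interval and only the outlier into the second; with quantile-based break points, each interval holds five values.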

min_objs_in_class

Minimum number of objects (rows) in an imputation class.

min_obs_comp

Minimum number of completely observed objects (rows) in an imputation class.

min_obs_per_col

Minimum number of observed values in every column of an imputation class.

donor_limit

Minimum odds between incomplete and complete values in a column, if dl_type = "cols_seq". If dl_type = "sim_comp", minimum odds between incomplete and complete rows.

dl_type

See donor_limit.

add_imputation_classes

Should imputation classes be added as attributes to the imputed dataset?

...

Arguments passed to FUN.

Value

An object of the same class as ds with imputed missing values.

Details

Imputation classes (sometimes also called adjustment cells) are built using cross-tabulation of all cols_class. The classes are collapsed if they do not satisfy all of the criteria defined by min_objs_in_class, min_obs_comp, min_obs_per_col and donor_limit. Collapsing starts from the last value of cols_class. Internally, a mixture of collapsing and early stopping is used for the construction of the classes.
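
The idea of cross-tabulated imputation classes can be sketched in base R. The following is an illustration under stated assumptions, not the implementation used by impute_in_classes(): the data frame ds and its columns G, H and Y are hypothetical, the classes are formed with interaction(), and no collapsing is performed, so the sketch assumes every class contains at least one observed value of Y.

```r
# Hypothetical data: two class columns (G, H) and one column with missing values
ds <- data.frame(
  G = c("a", "a", "a", "b", "b", "b"),
  H = c("u", "u", "v", "u", "u", "u"),
  Y = c(1, NA, 5, 10, NA, 20)
)

# Cross-tabulate the class columns: one class per observed combination
classes <- interaction(ds$G, ds$H, drop = TRUE)
table(classes)

# Mean imputation inside each class
# (assumes every class has at least one observed Y; no collapsing here)
imputed <- ds
for (cl in levels(classes)) {
  rows <- classes == cl
  y <- imputed$Y[rows]
  y[is.na(y)] <- mean(y, na.rm = TRUE)
  imputed$Y[rows] <- y
}
imputed$Y
```

In this sketch the missing value in class (a, u) is replaced by the single observed value of that class, and the missing value in class (b, u) by the mean of 10 and 20. A class without any observed Y would produce NaN here, which is the kind of situation the collapsing criteria above are meant to prevent.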

References

Andridge, R.R. and Little, R.J.A. (2010), A Review of Hot Deck Imputation for Survey Non-response. International Statistical Review, 78: 40-64. doi:10.1111/j.1751-5823.2010.00103.x

Examples

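library(missMethods)

# Mean imputation in classes built from X; min_obs_per_col = 2 requires
# every class to contain at least two observed values of Y
impute_in_classes(data.frame(X = 1:5, Y = c(NA, 12:15)), "X",
  impute_mean,
  min_obs_per_col = 2
)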