missMethods (version 0.4.0)

impute_in_classes: Impute in classes

Description

Apply an imputation function inside imputation classes

Usage

impute_in_classes(
  ds,
  cols_class,
  FUN,
  breaks = Inf,
  use_quantiles = FALSE,
  min_objs_in_class = 1,
  min_obs_comp = 0,
  min_obs_per_col = 1,
  donor_limit = Inf,
  dl_type = "cols_seq",
  add_imputation_classes = FALSE,
  ...
)

Value

An object of the same class as ds with imputed missing values.

Arguments

ds

A data frame or matrix with missing values.

cols_class

Columns that are used for constructing the imputation classes.

FUN

An imputation function that is applied to impute the missing values.

breaks

Number of intervals / levels a column is broken into (see cut(), which is used internally for cutting numeric columns). If breaks = Inf (the default), every unique value of a column can be in a separate class (if no other restrictions apply).

use_quantiles

Should quantiles be used for cutting numeric vectors? Normally, cut() divides the range of a vector into equally spaced intervals. If use_quantiles = TRUE, the classes will be of roughly equal content.
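The difference between equally spaced and quantile-based intervals can be illustrated with base R alone. The following is a minimal sketch on a made-up skewed vector; it shows how cut() behaves with both kinds of break points and is not meant to reproduce how impute_in_classes() derives its breaks internally.

```r
# Skewed toy vector (hypothetical data, for illustration only)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

# Equally spaced intervals: cut() splits the range of x into 2 equally wide bins
equal_width <- cut(x, breaks = 2)
table(equal_width)

# Quantile-based intervals: break points are the empirical quantiles of x
q_breaks <- quantile(x, probs = seq(0, 1, length.out = 3))
equal_content <- cut(x, breaks = q_breaks, include.lowest = TRUE)
table(equal_content)
```

With equally spaced intervals the single outlier ends up alone in its bin, whereas the quantile-based breaks split the ten values into two bins of five.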

min_objs_in_class

Minimum number of objects (rows) in an imputation class.

min_obs_comp

Minimum number of completely observed objects (rows) in an imputation class.

min_obs_per_col

Minimum number of observed values in every column of an imputation class.

donor_limit

Minimum odds between incomplete and complete values in a column, if dl_type = "cols_seq". If dl_type = "sim_comp", minimum odds between incomplete and complete rows.
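To make the two kinds of odds concrete, the sketch below computes them by hand for a small made-up data frame. It assumes that "odds" means the number of incomplete values (or rows) divided by the number of complete ones; how the package compares these quantities against donor_limit is not shown here.

```r
# Toy dataset (hypothetical, for illustration only)
ds <- data.frame(
  X = c(1, NA, 3, 4, NA, 6),
  Y = c(NA, 2, 3, 4, 5, 6)
)

# Column-wise odds: missing values divided by observed values in each column
odds_per_col <- colSums(is.na(ds)) / colSums(!is.na(ds))
odds_per_col

# Row-wise odds: incomplete rows divided by completely observed rows
n_complete_rows <- sum(complete.cases(ds))
odds_rows <- (nrow(ds) - n_complete_rows) / n_complete_rows
odds_rows
```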

dl_type

See donor_limit.

add_imputation_classes

Should imputation classes be added as attributes to the imputed dataset?

...

Arguments passed to FUN.

Details

Imputation classes (sometimes also called adjustment cells) are built using cross-classification of all cols_class. The classes are collapsed if they do not satisfy all of the criteria defined by min_objs_in_class, min_obs_comp, min_obs_per_col and donor_limit. Collapsing starts from the last value of cols_class. Internally, a mixture of collapsing and early stopping is used for the construction of the classes.
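The idea of building candidate classes from all class columns and then checking them against a criterion can be sketched in base R. The data frame, the column names group and region, and the threshold of one observed value are assumptions made for this illustration; the sketch does not reproduce the package's internal collapsing algorithm.

```r
# Toy dataset (hypothetical, for illustration only)
ds <- data.frame(
  group = c("a", "a", "a", "b", "b", "b"),
  region = c("n", "n", "s", "n", "s", "s"),
  Y = c(1, NA, 3, 4, NA, NA)
)

# One candidate class per observed combination of the class columns
classes <- interaction(ds$group, ds$region, drop = TRUE)

# Number of rows and number of observed Y values per candidate class
n_rows <- table(classes)
n_obs_y <- tapply(!is.na(ds$Y), classes, sum)
n_rows
n_obs_y

# Candidate classes with fewer than one observed Y value; classes like
# these would have to be merged with others before imputing
violating <- names(n_obs_y)[n_obs_y < 1]
violating
```

Here the class b.s contains two rows but no observed value of Y, so no imputation within that class alone is possible.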

References

Andridge, R.R. and Little, R.J.A. (2010), A Review of Hot Deck Imputation for Survey Non-response. International Statistical Review, 78: 40-64. doi:10.1111/j.1751-5823.2010.00103.x

Examples

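# Mean imputation in classes
library(missMethods)

# X defines the classes; min_obs_per_col = 2 requires at least two observed
# values in every column of a class, so classes are collapsed until this holds
impute_in_classes(data.frame(X = 1:5, Y = c(NA, 12:15)), "X",
  impute_mean,
  min_obs_per_col = 2
)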