Apply an imputation function inside imputation classes
impute_in_classes(
ds,
cols_class,
FUN,
breaks = Inf,
use_quantiles = FALSE,
min_objs_in_class = 1,
min_obs_comp = 0,
min_obs_per_col = 1,
donor_limit = Inf,
dl_type = "cols_seq",
add_imputation_classes = FALSE,
...
)
ds: A data frame or matrix with missing values.
cols_class: Columns that are used for constructing the imputation classes.
FUN: An imputation function that is applied to impute the missing values.
breaks: Number of intervals / levels a column is broken into (see cut(), which is used internally for cutting numeric columns). If breaks = Inf (the default), every unique value of a column can be in a separate class (if no other restrictions apply).
use_quantiles: Should quantiles be used for cutting numeric vectors? Normally, cut() divides the range of a vector into equally spaced intervals. If use_quantiles = TRUE, the classes will be of roughly equal content.
min_objs_in_class: Minimum number of objects (rows) in an imputation class.
min_obs_comp: Minimum number of completely observed objects (rows) in an imputation class.
min_obs_per_col: Minimum number of observed values in every column of an imputation class.
donor_limit: Minimum odds between incomplete and complete values in a column, if dl_type = "cols_seq". If dl_type = "sim_comp", minimum odds between incomplete and complete rows.
dl_type: See donor_limit.
add_imputation_classes: Should imputation classes be added as attributes to the imputed dataset?
...: Arguments passed to FUN.
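As a rough illustration of breaks and use_quantiles, the base-R sketch below cuts one skewed vector into three classes twice: with cut() directly (equally spaced intervals, as described for use_quantiles = FALSE) and with quantile-based break points. The quantile-based cut is an assumption about the idea behind use_quantiles = TRUE; the package's internal call may differ.

```r
# A skewed numeric vector: most values are small, one is large
x <- c(1, 2, 2, 3, 3, 4, 5, 6, 8, 100)

# cut() with a number of breaks: equally spaced intervals over range(x)
equal_width <- cut(x, breaks = 3)

# Quantile-based break points give classes of roughly equal content
# (assumption: this mirrors the idea behind use_quantiles = TRUE)
q_breaks <- quantile(x, probs = seq(0, 1, length.out = 4))
equal_content <- cut(x, breaks = q_breaks, include.lowest = TRUE)

table(equal_width)    # 9, 0 and 1 values per interval
table(equal_content)  # 5, 2 and 3 values per interval
```

Ties in the data keep the quantile-based classes from being exactly equal in size, which is why the documentation says "roughly".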
An object of the same class as ds with imputed missing values.
Imputation classes (sometimes also called adjustment cells) are built using cross-classification of all cols_class. The classes are collapsed if they do not satisfy all of the criteria defined by min_objs_in_class, min_obs_comp, min_obs_per_col and donor_limit.
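The size criteria can be made concrete with a small base-R sketch. class_ok() is a hypothetical helper, not a function of the package; it assumes that every column of a class (including the class columns) counts for min_obs_per_col, and it ignores donor_limit. The classes are formed by cross-classifying two grouping columns with interaction().

```r
# Hypothetical helper (not part of the package): does one imputation class,
# given as a data frame `cls`, satisfy the three size criteria?
class_ok <- function(cls, min_objs_in_class = 1, min_obs_comp = 0,
                     min_obs_per_col = 1) {
  n_rows      <- nrow(cls)                 # objects (rows) in the class
  n_complete  <- sum(complete.cases(cls))  # completely observed rows
  obs_per_col <- colSums(!is.na(cls))      # observed values per column
  n_rows >= min_objs_in_class &&
    n_complete >= min_obs_comp &&
    all(obs_per_col >= min_obs_per_col)
}

ds <- data.frame(
  G1 = c("a", "a", "a", "b", "b", "b"),
  G2 = c("x", "x", "y", "x", "x", "y"),
  Y  = c(1, NA, 5, 10, NA, 20)
)

# Cross-classification: one class per observed combination of G1 and G2
classes <- interaction(ds[c("G1", "G2")], drop = TRUE)

# With the defaults every class passes; with min_obs_per_col = 2 none does,
# because each class has only one observed value of Y
ok_default <- sapply(split(ds, classes), class_ok)
ok_strict  <- sapply(split(ds, classes), class_ok, min_obs_per_col = 2)
ok_default
ok_strict
```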
Collapsing starts from the last value of cols_class. Internally, a mixture of collapsing and early stopping is used for the construction of the classes.
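The following sketch illustrates collapsing that starts from the last class column, followed by mean imputation within the final classes. It is a simplification under stated assumptions: build_classes() is hypothetical, checks only min_obs_per_col on the non-class columns, and coarsens all classes at once by dropping the last remaining class column, whereas the package mixes collapsing with early stopping and may merge classes differently.

```r
# Simplified collapsing sketch: build_classes() is hypothetical and checks
# only min_obs_per_col. Unlike the package, it coarsens all classes at once
# by dropping the last remaining class column.
build_classes <- function(ds, cols_class, min_obs_per_col = 1) {
  value_cols <- setdiff(names(ds), cols_class)
  while (length(cols_class) > 0) {
    classes <- interaction(ds[cols_class], drop = TRUE)
    # Observed values of every non-class column, counted within each class
    obs <- sapply(split(ds[value_cols], classes),
                  function(cls) colSums(!is.na(cls)))
    if (all(obs >= min_obs_per_col)) return(classes)
    # Collapse: drop the last class column and try again
    cols_class <- cols_class[-length(cols_class)]
  }
  # No class column left: all rows form a single class
  factor(rep("all", nrow(ds)))
}

ds <- data.frame(
  G1 = c("a", "a", "a", "b", "b", "b"),
  G2 = c("x", "x", "y", "x", "x", "y"),
  Y  = c(1, NA, 5, 10, NA, 20)
)

# G1 x G2 gives classes with a single observed Y, so G2 is dropped
classes <- build_classes(ds, c("G1", "G2"), min_obs_per_col = 2)

# Mean imputation of Y within the final classes (here: defined by G1 only)
ds$Y <- ave(ds$Y, classes, FUN = function(v) {
  v[is.na(v)] <- mean(v, na.rm = TRUE)
  v
})
ds$Y
```

Collapsing trades homogeneity for stability: the final classes are coarser, but each one has enough observed values for the imputation function to work with.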
Andridge, R.R. and Little, R.J.A. (2010), A Review of Hot Deck Imputation for Survey Non-response. International Statistical Review, 78: 40-64. doi:10.1111/j.1751-5823.2010.00103.x
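# Mean imputation in classes
library(missMethods)
# Every value of X alone would form a one-row class; min_obs_per_col = 2
# forces these classes to be collapsed
impute_in_classes(data.frame(X = 1:5, Y = c(NA, 12:15)), "X",
  impute_mean,
  min_obs_per_col = 2
)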