Impute missing values in a data frame or a matrix using a simple random hot deck
impute_sRHD(ds, type = "cols_seq", donor_limit = Inf)
ds: A data frame or matrix with missing values.
type: The type of hot deck. The default, "cols_seq", is a random hot deck that imputes each column separately. The other choices, "sim_comp" and "sim_part", impute all missing values in an object (row) simultaneously using a single donor object; they differ in which objects can act as donors. With "sim_comp", only completely observed objects can be donors. With "sim_part", every object that has observed values in all variables that are missing in the recipient can be a donor.
donor_limit: Numeric of length one or "min"; how many times an object can be a donor. The default is Inf (no restriction).
An object of the same class as ds with imputed missing values.
Three types of simple random hot decks are implemented. They can be selected via type:
"cols_seq" (the default): Each variable (column) is handled separately. If an object (row) has a missing value in a variable (column), then one of the observed values in the same variable is chosen randomly and the missing value is replaced with this chosen value. This is done for all missing values.
"sim_comp": All missing variables (columns) of an object are imputed together ("simultaneous"). For every object with missing values (such an object is called a recipient in hot deck terms), one complete object is chosen randomly and all missing values of the recipient are imputed with the values from the complete object. A complete object used for imputation is called a donor.
"sim_part": All missing variables (columns) of an object are imputed together ("simultaneous"). For every object with missing values (recipient) one donor is chosen. The donor must have observed values in all the variables that are missing in the recipient. The donor is allowed to have unobserved values in the non-missing parts of the recipient. So, in contrast to "sim_comp", the donor can be partly incomplete.
The parameter donor_limit controls how often an object can be a donor. This parameter is only implemented for types "cols_seq" and "sim_comp". If type = "sim_part" and donor_limit is not Inf, then an error will be thrown. For "sim_comp", the default value (Inf) allows every object to be a donor any number of times (there is no restriction on how often an object can be a donor). If a numeric value less than Inf is chosen, then every object can be a donor at most donor_limit times. For example, donor_limit = 1 ensures that every object donates at most once. If there are only a few complete objects and donor_limit is set too low, then imputation might not be possible with the chosen donor_limit. In this case, donor_limit will be adjusted (see examples). Setting donor_limit = "min" automatically chooses the minimum value for donor_limit that allows imputation of all missing values. For type = "cols_seq", the donor limit is applied to every column separately.
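The arithmetic behind a feasible donor limit can be sketched as follows: if every donor may be used at most donor_limit times, then the number of donors times donor_limit must be at least the number of recipients. The helpers min_donor_limit() and draw_donors() are hypothetical names introduced for illustration; how the package itself adjusts the limit or draws donors internally is not shown here.

```r
# Illustrative base R sketch; helper names are hypothetical, not package API.

# Smallest finite limit such that n_donors * limit >= n_recipients.
min_donor_limit <- function(n_recipients, n_donors) {
  stopifnot(n_donors > 0)
  ceiling(n_recipients / n_donors)
}

# One simple way to respect a finite limit: give each donor `donor_limit`
# slots and draw slots without replacement.
draw_donors <- function(donors, n_recipients, donor_limit) {
  pool <- rep(donors, each = donor_limit)
  stopifnot(length(pool) >= n_recipients)  # otherwise the limit is too low
  pool[sample.int(length(pool), n_recipients)]
}

# 19 recipients and a single possible donor: a limit of 3 cannot work.
needed <- min_donor_limit(n_recipients = 19, n_donors = 1)

# Two donor rows (1 and 4), three recipients, each donor used at most twice.
set.seed(1)
picks <- draw_donors(donors = c(1, 4), n_recipients = 3, donor_limit = 2)
```

For a column-wise hot deck, the same count would be made per column, with the missing values of a column as recipients and its observed values as donors.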
Andridge, R. R., & Little, R. J. (2010). A review of hot deck imputation for survey non-response. International Statistical Review, 78(1), 40-64.
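# NOT RUN {
library(missMethods)  # provides delete_MCAR() and impute_sRHD()
ds <- data.frame(X = 1:20, Y = 101:120)
ds_mis <- delete_MCAR(ds, 0.2)  # create missing values (MCAR) with p = 0.2
ds_imp <- impute_sRHD(ds_mis)   # default type = "cols_seq"
# }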
# NOT RUN {
# Warning: donor limit too low (19 missing values in X, only one observed value)
ds_mis_one_donor <- ds
ds_mis_one_donor[1:19, "X"] <- NA
impute_sRHD(ds_mis_one_donor, donor_limit = 3)
# }