missMethods (version 0.2.0)

impute_sRHD: Simple random hot deck imputation

Description

Impute missing values in a data frame or a matrix using a simple random hot deck.

Usage

impute_sRHD(ds, type = "cols_seq", donor_limit = Inf)

Arguments

ds

A data frame or matrix with missing values.

type

The type of hot deck. The default ("cols_seq") is a random hot deck that imputes each column separately. Other choices are "sim_comp" and "sim_part". Both impute all missing values in an object (row) simultaneously using a single donor object. The two types differ in which objects can act as donors. "sim_comp": only completely observed objects can be donors. "sim_part": all objects that have no missing values in the missing parts of a recipient can be donors.

donor_limit

Numeric of length one or "min"; how many times an object can be a donor. The default is Inf (no restriction).

Value

An object of the same class as ds with imputed missing values.

Details

There are three types of simple random hot decks implemented. They can be selected via type:

  • "cols_seq" (the default): Each variable (column) is handled separately. If an object (row) has a missing value in a variable (column), then one of the observed values in the same variable is chosen randomly and the missing value is replaced with this chosen value. This is done for all missing values.

  • "sim_comp": All missing variables (columns) of an object are imputed together ("simultaneous"). For every object with missing values (such an object is called a recipient in hot deck terms), one complete object is chosen randomly and all missing values of the recipient are imputed with the values from the complete object. A complete object used for imputation is called a donor.

  • "sim_part": All missing variables (columns) of an object are imputed together ("simultaneous"). For every object with missing values (recipient) one donor is chosen. The donor must have observed values in all the variables that are missing in the recipient. The donor is allowed to have unobserved values in the non-missing parts of the recipient. So, in contrast to "sim_comp", the donor can be partly incomplete.
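The three types can be illustrated with a minimal base-R sketch. It is not the missMethods implementation: the function names cols_seq_sketch, donors_sim_comp and donors_sim_part are hypothetical and exist only to show the per-column imputation and the donor eligibility rules described above.

```r
# Illustrative sketch only; all function names here are hypothetical.
set.seed(1)
ds_mis <- data.frame(
  X = c(1, NA, 3, 4, NA),
  Y = c(10, 20, NA, 40, 50),
  Z = c(NA, 200, 300, 400, 500)
)

# "cols_seq": fill each column from its own observed values.
cols_seq_sketch <- function(ds) {
  for (j in seq_len(ncol(ds))) {
    mis <- is.na(ds[, j])
    if (any(mis) && any(!mis)) {
      obs <- ds[!mis, j]
      # Sampling indices avoids the sample(x) pitfall when length(obs) == 1.
      ds[mis, j] <- obs[sample.int(length(obs), sum(mis), replace = TRUE)]
    }
  }
  ds
}

# "sim_comp": only completely observed rows are eligible donors.
donors_sim_comp <- function(ds) {
  unname(which(stats::complete.cases(ds)))
}

# "sim_part": a donor only needs observed values in the columns
# that are missing in recipient row i.
donors_sim_part <- function(ds, i) {
  mis_cols <- which(as.vector(is.na(ds[i, , drop = FALSE])))
  unname(which(rowSums(is.na(ds[, mis_cols, drop = FALSE])) == 0))
}

cols_seq_sketch(ds_mis)      # no NA left; each column reuses its own values
donors_sim_comp(ds_mis)      # row 4 is the only complete row
donors_sim_part(ds_mis, 2)   # rows 1, 3, 4: X is observed in each of them
```

In this toy data, row 1 has a missing Z but can still donate X to row 2 under "sim_part", whereas "sim_comp" leaves row 4 as the only possible donor.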

The parameter donor_limit controls how often an object can be a donor. It is only implemented for types "cols_seq" and "sim_comp"; if type = "sim_part" and donor_limit is not Inf, an error is thrown. For "sim_comp", the default value (Inf) places no restriction on how many times an object can be a donor. If a numeric value less than Inf is chosen, every object can be a donor at most donor_limit times. For example, donor_limit = 1 ensures that every object donates at most once. If there are only a few complete objects and donor_limit is set too low, imputation might not be possible with the chosen donor_limit. In this case, donor_limit is adjusted (see Examples). Setting donor_limit = "min" automatically chooses the minimum value for donor_limit that allows imputation of all missing values. For type = "cols_seq", the donor limit is applied to every column separately.
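The effect of a donor limit can be sketched in base R as drawing donors from a pool in which each donor appears at most donor_limit times. This is an illustration under stated assumptions, not the package's code: pick_donors is a hypothetical helper, and raising a too-low limit to the smallest feasible value is an assumption of this sketch (the documentation only says that the limit is adjusted).

```r
# Illustrative sketch only; pick_donors is a hypothetical helper.
pick_donors <- function(donors, n_recipients, donor_limit = Inf) {
  # Smallest limit that still lets every recipient receive a donor.
  min_limit <- ceiling(n_recipients / length(donors))
  if (identical(donor_limit, "min")) donor_limit <- min_limit
  if (donor_limit < min_limit) {
    # Assumption: adjust to the smallest feasible limit and warn.
    warning("donor_limit too low; adjusted to ", min_limit)
    donor_limit <- min_limit
  }
  if (is.infinite(donor_limit)) {
    # No restriction: sample donors with replacement.
    return(donors[sample.int(length(donors), n_recipients, replace = TRUE)])
  }
  # Each donor appears donor_limit times in the pool; draw without replacement.
  pool <- rep(donors, each = donor_limit)
  pool[sample.int(length(pool), n_recipients)]
}

set.seed(1)
# Two donors (rows 4 and 7), four recipients: each donor is used at most twice.
pick_donors(c(4, 7), n_recipients = 4, donor_limit = 2)
# "min" resolves to ceiling(5 / 2) = 3 uses per donor here.
pick_donors(c(4, 7), n_recipients = 5, donor_limit = "min")
```

With a single donor and 19 recipients, as in the second example in the Examples section, any limit below 19 cannot cover all recipients, which is why an adjustment is needed there.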

References

Andridge, R. R., & Little, R. J. (2010). A review of hot deck imputation for survey non-response. International Statistical Review, 78(1), 40-64.

Examples

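library(missMethods)

# Delete 20% of the values completely at random, then impute them
# with the default type ("cols_seq")
ds <- data.frame(X = 1:20, Y = 101:120)
ds_mis <- delete_MCAR(ds, 0.2)
ds_imp <- impute_sRHD(ds_mis)

# Warning: donor_limit too low
# Only one observed value of X remains for 19 missing values,
# so donor_limit = 3 cannot be met and is adjusted
ds_mis_one_donor <- ds
ds_mis_one_donor[1:19, "X"] <- NA
impute_sRHD(ds_mis_one_donor, donor_limit = 3)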