Apply a function for imputation over rows, columns or combinations of both
Usage
apply_imputation(
  ds,
  FUN = mean,
  type = "columnwise",
  convert_tibble = TRUE,
  ...
)
Value
An object of the same class as ds with imputed missing values.
Arguments
ds: A data frame or matrix with missing values.
FUN: The function to be applied for imputation.
type: A string specifying the values used for imputation; one of "columnwise", "rowwise", "total", "Two-Way" or "Winer" (see details).
convert_tibble: If ds is a tibble, should it be converted (see section A note for tibble users)?
...: Further arguments passed to FUN.
A note for tibble users
If you use tibbles and convert_tibble is TRUE, the tibble is first converted to a data frame, then imputed, and finally converted back. If convert_tibble is FALSE, no conversion is done. However, depending on the tibble and the version of the tibble package you use, imputation may then not be possible and errors may be thrown.
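The default conversion path described above can be sketched as follows. This is only an illustration: it assumes that the missMethods and tibble packages are installed, and the names tb_mis and tb_imp are made up for the example.

```r
library(missMethods)

# Assumption: the tibble package is available; the sketch is skipped otherwise
if (requireNamespace("tibble", quietly = TRUE)) {
  # Hypothetical toy tibble with one missing value per column
  tb_mis <- tibble::tibble(X = c(1, NA, 3), Y = c(NA, 20, 30))

  # convert_tibble = TRUE (the default): convert, impute, convert back
  tb_imp <- apply_imputation(tb_mis, FUN = mean, convert_tibble = TRUE)

  # Check the class of the result and whether missing values remain
  print(tibble::is_tibble(tb_imp))
  print(anyNA(tb_imp))
}
```

Only convert_tibble = TRUE is shown because, as noted above, the behavior with convert_tibble = FALSE depends on the tibble and on the installed tibble version.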
Details
The functionality of apply_imputation is inspired by the apply function. It applies a function FUN to impute the missing values in ds. FUN must be a function that takes a vector as input and returns exactly one value. The argument type is comparable to apply's MARGIN argument: it specifies which values are used to calculate the imputation values. For example, type = "columnwise" and FUN = mean will impute the mean of the observed values in a column for all missing values in this column. In contrast, type = "rowwise" and FUN = mean will impute the mean of the observed values in a row for all missing values in this row.
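To make the role of FUN, type and the further arguments more concrete, here is a minimal sketch on a made-up data frame. It assumes the missMethods package is installed; the object names (ds_small, imp_col, imp_row, imp_trim) are illustrative only.

```r
library(missMethods)

# Hypothetical toy data: one missing value in each column
ds_small <- data.frame(X = c(1, 2, NA, 4), Y = c(10, NA, 30, 40))

# Column by column: FUN receives the observed values of one column
imp_col <- apply_imputation(ds_small, FUN = median, type = "columnwise")

# Row by row: FUN receives the observed values of one row
imp_row <- apply_imputation(ds_small, FUN = mean, type = "rowwise")

# Further arguments (here: trim) are passed on to FUN via ...
imp_trim <- apply_imputation(ds_small, FUN = mean, type = "columnwise", trim = 0.25)

print(imp_col)
print(imp_row)
```

Any function that maps a vector to a single value can take the place of median or mean here.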
List of all implemented types:
"columnwise" (the default): imputes column by column; all observed values of a column are given to FUN and the returned value is used as the imputation value for all missing values of the column.
"rowwise": imputes row by row; all observed values of a row are given to FUN and the returned value is used as the imputation value for all missing values of the row.
"total": all observed values of ds are given to FUN and the returned value is used as the imputation value for all missing values of ds.
"Winer": the mean of the "columnwise" and "rowwise" imputation values is used as the imputation value.
"Two-Way": the sum of the "columnwise" and "rowwise" imputation values minus the "total" imputation value is used as the imputation value.
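The arithmetic behind "Winer" and "Two-Way" can be worked through by hand for a single missing cell. The following base R sketch follows the descriptions in the list above with FUN = mean; it is not the package's implementation, and the helper observed() and all variable names are assumptions made for the illustration.

```r
# Toy matrix (filled column by column) with one missing cell at row 2, column 1
ds_toy <- matrix(c(1, NA, 3,
                   4, 5, 6), nrow = 3)

# Hypothetical helper: keep only the observed (non-missing) values
observed <- function(x) x[!is.na(x)]

col_val   <- mean(observed(ds_toy[, 1]))  # "columnwise" value for column 1
row_val   <- mean(observed(ds_toy[2, ]))  # "rowwise" value for row 2
total_val <- mean(observed(ds_toy))       # "total" value over all observed cells

winer   <- mean(c(col_val, row_val))      # mean of columnwise and rowwise values
two_way <- col_val + row_val - total_val  # columnwise + rowwise - total

print(c(columnwise = col_val, rowwise = row_val, total = total_val,
        Winer = winer, TwoWay = two_way))
```

For this toy matrix the column value is 2, the row value is 5 and the total value is 3.8, which gives 3.5 for "Winer" and 3.2 for "Two-Way".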
If no value can be given to FUN (for example, if no value in a column is observed and type = "columnwise"), then a warning will be issued and no value will be imputed in the corresponding column or row.
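The case of a completely unobserved column can be tried out with a small made-up data frame. The sketch below assumes the missMethods package is installed; ds_empty and ds_part are hypothetical names.

```r
library(missMethods)

# Hypothetical toy data: column X has no observed value at all
ds_empty <- data.frame(X = c(NA_real_, NA_real_, NA_real_), Y = c(1, NA, 3))

# According to the description above, this call should issue a warning
# for column X and still impute column Y
ds_part <- apply_imputation(ds_empty, FUN = mean, type = "columnwise")

# Number of missing values that remain in each column
print(colSums(is.na(ds_part)))
```

Checking the result with is.na() afterwards is a simple way to detect columns or rows that could not be imputed.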
References
Beland, S., Pichette, F., & Jolani, S. (2016). Impact on Cronbach's \(\alpha\) of simple treatment methods for missing data. The Quantitative Methods for Psychology, 12(1), 57-73.
See Also
A convenient interface exists for common cases like mean imputation: impute_mean, impute_median, impute_mode. All these functions call apply_imputation.
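Examples
library(missMethods)

ds <- data.frame(X = 1:20, Y = 101:120)
# create missing values completely at random (MCAR) with p = 0.2
ds_mis <- delete_MCAR(ds, 0.2)
# impute every missing value with the mean of all observed values
ds_imp_app <- apply_imputation(ds_mis, FUN = mean, type = "total")
# the same result can be achieved via impute_mean():
ds_imp_mean <- impute_mean(ds_mis, type = "total")
all.equal(ds_imp_app, ds_imp_mean)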