missMethods (version 0.4.0)

evaluate_imputed_values: Evaluate imputed values

Description

Compare imputed to true values

Usage

evaluate_imputed_values(
  ds_imp,
  ds_orig,
  criterion = "RMSE",
  M = NULL,
  cols_which = seq_len(ncol(ds_imp)),
  tolerance = sqrt(.Machine$double.eps),
  imp_ds,
  orig_ds,
  which_cols
)

Value

A numeric vector of length one.

Arguments

ds_imp

A data frame or matrix with imputed values.

ds_orig

A data frame or matrix with original (true) values.

criterion

A string specifying the criterion used to compare the imputed and original values.

M

NULL (the default) or a missing data indicator matrix. The missing data indicator matrix is normally created via is.na(ds_mis), where ds_mis is the dataset after deleting values from ds_orig.

cols_which

Indices or names of columns used for evaluation.

tolerance

Numeric, only used for criterion = "precision": numeric differences smaller than tolerance are treated as zero/equal.

imp_ds

Deprecated, renamed to ds_imp.

orig_ds

Deprecated, renamed to ds_orig.

which_cols

Deprecated, renamed to cols_which.

Details

The following criteria are implemented to compare the imputed values to the true values:

  • "RMSE" (the default): The Root Mean Squared Error between the imputed and true values

  • "bias": The mean difference between the imputed and the true values

  • "cor": The correlation between the imputed and true values

  • "MAE": The Mean Absolute Error between the imputed and true values

  • "MSE": The Mean Squared Error between the imputed and true values

  • "NRMSE_col_mean": For every column the RMSE divided by the mean of the true values is calculated. Then these columnwise values are squared and averaged. Finally, the square root of this average is returned.

  • "NRMSE_col_mean_sq": For every column the RMSE divided by the square root of the mean of the squared true values is calculated. Then these columnwise values are squared and averaged. Finally, the square root of this average is returned.

  • "NRMSE_col_sd": For every column the RMSE divided by the standard deviation of the true values in that column is calculated. Then these columnwise values are squared and averaged. Finally, the square root of this average is returned.

  • "NRMSE_tot_mean": RMSE divided by the mean of all true values

  • "NRMSE_tot_mean_sq": RMSE divided by the square root of the mean of all squared true values

  • "NRMSE_tot_sd": RMSE divided by the standard deviation of all true values

  • "nr_equal": number of imputed values that are equal to the true values

  • "nr_NA": number of values in ds_imp that are NA (not imputed)

  • "precision": proportion of imputed values that are equal to the true values
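The verbal definitions above can be sketched in base R. This is a minimal illustration derived from the descriptions in the list, not the package's actual implementation; the toy data and all variable names are hypothetical, and the sign convention for the bias (imputed minus true) is an assumption.

```r
# Hypothetical true and imputed values (two columns, four rows)
truth <- cbind(X = c(1, 2, 3, 4), Y = c(10, 20, 30, 40))
imp   <- cbind(X = c(1, 2, 4, 4), Y = c(10, 20, 30, 36))

# Elementwise error; assumed convention: imputed minus true
err <- imp - truth

mse_val  <- mean(err^2)        # "MSE"
rmse_val <- sqrt(mse_val)      # "RMSE"
mae_val  <- mean(abs(err))     # "MAE"
bias_val <- mean(err)          # "bias"

# "NRMSE_tot_mean": RMSE divided by the mean of all true values
nrmse_tot_mean <- rmse_val / mean(truth)

# "NRMSE_col_mean": columnwise RMSE divided by the column mean,
# then squared, averaged, and square-rooted
col_rmse       <- sqrt(colMeans(err^2))
col_norm       <- col_rmse / colMeans(truth)
nrmse_col_mean <- sqrt(mean(col_norm^2))
```

In this toy data the column means differ (2.5 and 25), so the total and columnwise normalizations generally give different results.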

Additionally, relative versions of bias and MAE are implemented. In the relative versions, the differences are divided by the absolute values of the true values. These relative versions can be selected via "bias_rel" and "MAE_rel". The "NRMSE_tot_" and "NRMSE_col_" criteria are equal if the columnwise normalization values are equal to the total normalization value (see examples).
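The relative versions can be sketched in base R as follows. This is an illustration based only on the description above, not the package's code; the vectors and variable names are hypothetical, and the difference is assumed to be imputed minus true.

```r
# Hypothetical true and imputed values
truth <- c(2, 4, -5, 10)
imp   <- c(3, 4, -4, 5)

# Differences divided by the absolute values of the true values
rel_err <- (imp - truth) / abs(truth)

bias_rel <- mean(rel_err)        # sketch of "bias_rel"
mae_rel  <- mean(abs(rel_err))   # sketch of "MAE_rel"
```

Under this definition a true value of zero leads to a division by zero, so the relative versions are only meaningful when all compared true values are nonzero.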

The argument cols_which allows the selection of columns for comparison (see examples).

If M = NULL (the default), all values of ds_imp and ds_orig are used to calculate the evaluation criterion. If a missing data indicator matrix is given via M, only the truly imputed values (values that are marked as missing via M) are used. If provided, M must be a logical matrix of the same dimensions as ds_orig, with missing values coded as TRUE. This is what is.na returns when applied to a dataset with missing values (see examples). M and cols_which can be combined.
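The effect of M can be illustrated with a base R sketch that restricts an RMSE calculation to the positions marked as missing. The data, the hand-written mean imputation, and the variable names are hypothetical; the code mimics the described behavior and is not the package's implementation.

```r
# Hypothetical complete data and a copy with values deleted by hand
ds_orig <- data.frame(X = c(1, 2, 3, 4), Y = c(10, 20, 30, 40))
ds_mis  <- ds_orig
ds_mis$X[2]       <- NA
ds_mis$Y[c(1, 4)] <- NA

# Missing data indicator matrix: logical, same dimensions, TRUE = missing
M <- is.na(ds_mis)

# Simple mean imputation written out by hand
ds_imp <- ds_mis
ds_imp$X[M[, "X"]] <- mean(ds_mis$X, na.rm = TRUE)
ds_imp$Y[M[, "Y"]] <- mean(ds_mis$Y, na.rm = TRUE)

err <- as.matrix(ds_imp) - as.matrix(ds_orig)

# All values (corresponds to M = NULL)
rmse_all <- sqrt(mean(err^2))

# Only the truly imputed values (corresponds to passing M)
rmse_imputed <- sqrt(mean(err[M]^2))

# Only the imputed values in column X (M combined with a column selection)
rmse_imputed_X <- sqrt(mean(err[M[, "X"], "X"]^2))
```

Because the observed values are reproduced exactly, including them pulls the error toward zero, so rmse_all is smaller than rmse_imputed here.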

References

Kim, H., Golub, G. H., & Park, H. (2005). Missing value estimation for DNA microarray gene expression data: local least squares imputation. Bioinformatics, 21(2), 187-198.

See Also

Other evaluation functions: evaluate_imputation_parameters(), evaluate_parameters()

Examples

library(missMethods)

ds_orig <- data.frame(X = 1:10, Y = 101:110)
ds_mis <- delete_MCAR(ds_orig, 0.3)
ds_imp <- impute_mean(ds_mis)
# compare all values from ds_orig and ds_imp
evaluate_imputed_values(ds_imp, ds_orig)
# compare only the imputed values
M <- is.na(ds_mis)
evaluate_imputed_values(ds_imp, ds_orig, M = M)
# compare only the imputed values in column X
evaluate_imputed_values(ds_imp, ds_orig, M = M, cols_which = "X")

# NRMSE_tot_mean and NRMSE_col_mean are equal if the columnwise means are equal
ds_orig <- data.frame(X = 1:10, Y = 10:1)
ds_mis <- delete_MCAR(ds_orig, 0.3)
ds_imp <- impute_mean(ds_mis)
evaluate_imputed_values(ds_imp, ds_orig, "NRMSE_tot_mean")
evaluate_imputed_values(ds_imp, ds_orig, "NRMSE_col_mean")
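
The role of the tolerance argument for criterion = "precision" can be illustrated in base R. The sketch below only mirrors the documented idea that numeric differences smaller than tolerance are treated as equal; the vectors and variable names are hypothetical, and the package's internal comparison may differ.

```r
# Hypothetical true and imputed values; 0.1 + 0.2 is not exactly 0.3
# in floating point arithmetic
truth <- c(0.3, 1.0, 2.0, 5.0)
imp   <- c(0.1 + 0.2, 1.0, 2.5, 5.0)

# Same default as in the usage section
tolerance <- sqrt(.Machine$double.eps)

# Exact comparison misses the first value because of rounding error
exact_equal <- imp == truth

# Comparison with tolerance treats tiny differences as equal
tol_equal <- abs(imp - truth) < tolerance

precision_exact <- mean(exact_equal)   # proportion of exact matches
precision_tol   <- mean(tol_equal)     # proportion of matches within tolerance
```

With these values the exact comparison counts two of four values as equal, while the comparison with tolerance counts three.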
