dlookr (version 0.5.0)

imputate_outlier: Impute Outliers

Description

Outliers in a numerical variable are replaced with a representative value (mean, median, or mode) or capped at the 5th and 95th percentiles.

Usage

imputate_outlier(.data, xvar, method, no_attrs)

Arguments

.data

a data.frame or a tbl_df.

xvar

name of the variable whose outliers are to be replaced.

method

method of outlier imputation: "mean", "median", "mode", or "capping".

no_attrs

logical. If TRUE, return a plain numerical vector. If FALSE, return an object of imputation class.

Value

An object of imputation class, or a numerical vector. If no_attrs is FALSE, an object of imputation class is returned; if no_attrs is TRUE, a numerical vector is returned. The attributes of the imputation class are as follows.

  • method : method of outlier imputation.

    • when the predictor is a numerical variable

      • "mean" : arithmetic mean

      • "median" : median

      • "mode" : mode

      • "capping" : impute the upper outliers with the 95th percentile and the lower outliers with the 5th percentile.

  • outlier_pos : positions of the outliers in the predictor.

  • outliers : the outlier values found at outlier_pos.

  • type : type of imputation, "outliers".
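
The "capping" method described above can be sketched in base R. This is only an illustration, not the dlookr implementation: the helper name cap_outliers_sketch is hypothetical, and the sketch assumes that outliers are the values lying more than 1.5 times the interquartile range beyond the quartiles (the usual boxplot rule), which this page does not state.

```r
# Hypothetical helper (not part of dlookr): cap outliers at the
# 5th and 95th percentiles.
# Assumption: an outlier is a value more than 1.5 * IQR below the
# first quartile or above the third quartile.
cap_outliers_sketch <- function(x) {
  hinge   <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)
  caps    <- quantile(x, probs = c(0.05, 0.95), na.rm = TRUE)
  whisker <- 1.5 * diff(hinge)

  # Flag outliers on each side, leaving missing values untouched
  lower_out <- !is.na(x) & x < hinge[1] - whisker
  upper_out <- !is.na(x) & x > hinge[2] + whisker

  x[lower_out] <- caps[1]  # bottom outliers -> 5th percentile
  x[upper_out] <- caps[2]  # upper outliers  -> 95th percentile
  x
}

x <- c(1:20, 100)          # 100 is an obvious upper outlier
capped <- cap_outliers_sketch(x)
capped[21]                 # the capped value that replaced 100
```

Because the caps are computed from the data that still contain the outliers, the replacement values stay within the observed range of the variable.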

Details

imputate_outlier() creates an object of imputation class. The `imputation` class records the positions of the outliers, the imputed values, the method of outlier imputation, and so on. It lets you compare the imputed values with the original values to help decide whether the imputed values should be used in the analysis.

See vignette("transformation") for an introduction to these concepts.
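
As a minimal sketch of how the recorded information might be inspected, the attributes listed under Value can be read with base R's attr(). The example assumes that dlookr is installed and that the heartfailure data set used in the Examples is available; the variable name sodium_imp is only illustrative.

```r
library(dlookr)

# Impute the outliers of sodium and keep the imputation attributes
# (no_attrs is not set to TRUE, so an imputation object is returned)
sodium_imp <- imputate_outlier(heartfailure, sodium, method = "capping")

# Attribute names as documented in the Value section
attr(sodium_imp, "method")       # imputation method that was used
attr(sodium_imp, "outlier_pos")  # positions of the outliers
attr(sodium_imp, "outliers")     # original values at those positions
attr(sodium_imp, "type")         # type of imputation
```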

See Also

imputate_na.

Examples

# NOT RUN {
# Replace the outliers of the sodium variable with median.
imputate_outlier(heartfailure, sodium, method = "median")

# Replace the outliers of the sodium variable with capping.
imputate_outlier(heartfailure, sodium, method = "capping")

## using dplyr -------------------------------------
library(dplyr)

# The mean before and after the imputation of the sodium variable
heartfailure %>%
  mutate(sodium_imp = imputate_outlier(heartfailure, sodium, 
                                      method = "capping", no_attrs = TRUE)) %>%
  group_by(death_event) %>%
  summarise(orig = mean(sodium, na.rm = TRUE),
            imputation = mean(sodium_imp, na.rm = TRUE))
            
# If the variable of interest is a numerical variables
sodium <- imputate_outlier(heartfailure, sodium)
sodium
summary(sodium)

# plot(sodium)
# }