performance (version 0.2.0)

check_outliers: Check for influential observations

Description

Checks for and locates influential observations (i.e., "outliers") via Cook's Distance.

Usage

check_outliers(x, ...)

# S3 method for default
check_outliers(x, threshold = 4/insight::n_obs(x), ...)

Arguments

x

A model object.

...

Currently not used.

threshold

The distance above which an observation is considered an outlier. For the Cook's Distance method, threshold defaults to 4 divided by the number of observations.

Value

A check (message) stating whether outliers were detected, together with a data frame containing the original data used to fit the model, the distance measure for each observation, and whether that observation is considered an outlier.

Details

Performs a Cook's distance test to check for influential observations. Observations whose Cook's distance is greater than 4/n, where n is the number of observations, are considered outliers. This relatively conservative threshold is useful only for detection, not as justification for automatically deleting observations. If users opt to drop observations that may be problematic, they may do so by specifying drop_outliers = TRUE.
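The 4/n rule described above can be reproduced by hand, which may help when inspecting why a particular observation was flagged. The following is a minimal sketch using only base R (stats::cooks.distance); it is not the package's internal implementation, and the names threshold, distance and flagged are illustrative.

```r
# Sketch only: base R, not the internal code of check_outliers().
mt1 <- mtcars[, c("mpg", "disp")]
mt2 <- rbind(mt1, data.frame(mpg = c(37, 40), disp = c(300, 400)))
model <- lm(disp ~ mpg, data = mt2)

n <- nrow(model.frame(model))      # number of observations used in the fit
threshold <- 4 / n                 # default cutoff: 4 divided by n
distance <- cooks.distance(model)  # one Cook's distance per observation

# Original data plus the distance and a logical outlier flag
flagged <- data.frame(mt2, cooks_distance = distance,
                      outlier = distance > threshold)

# Show only the rows that exceed the cutoff
flagged[flagged$outlier, ]
```

Passing a different value, for example threshold = 1, to the same comparison gives a stricter or looser cutoff, mirroring the threshold argument of the default method.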

References

Cook, R. D. (1977). Detection of influential observation in linear regression. Technometrics, 19(1), 15-18.

Examples

# NOT RUN {
# select only mpg and disp (continuous)
mt1 <- mtcars[, c(1, 3)]
# create some fake outliers and attach outliers to main df
mt2 <- rbind(mt1, data.frame(mpg = c(37, 40), disp = c(300, 400)))
# fit model with outliers
model <- lm(disp ~ mpg, data = mt2)

check_outliers(model)

# }