Learn R Programming

DescTools (version 0.99.43)

Outlier: Outlier

Description

Return outliers following Tukey's boxplot and Hampel's median/mad definition.

Usage

Outlier(x, method = c("boxplot", "hampel"), value = TRUE,na.rm = FALSE)

Arguments

x

a (non-empty) numeric vector of data values.

method

the method to be used. So far Tukey's boxplot and Hampel's rule are implemented.

value

logical. If FALSE, a vector containing the (integer) indices of the outliers is returned, and if TRUE (default), a vector containing the matching elements themselves is returned.

na.rm

logical. Should missing values be removed? Defaults to FALSE.

Value

the values of x lying outside the whiskers in a boxplot or the indices of them

Details

Outlier detection is a tricky problem and should be handled with care. We implement Tukey's boxplot rule as a rough idea of spotting extreme values.

Hampel considers values outside of median +/- 3 * (median absolute deviation) to be outliers.

References

Hampel F. R. (1974) The influence curve and its role in robust estimation, Journal of the American Statistical Association, 69, 382-393

See Also

boxplot

Examples

Run this code
# NOT RUN {
Outlier(d.pizza$temperature, na.rm=TRUE)

# it's the same as the result from boxplot
sort(d.pizza$temperature[Outlier(d.pizza$temperature, value=FALSE, na.rm=TRUE)])
b <- boxplot(d.pizza$temperature, plot=FALSE)
sort(b$out)

# nice to find the corresponding rows
d.pizza[Outlier(d.pizza$temperature, value=FALSE, na.rm=TRUE), ]

# compare to Hampel's rule
Outlier(d.pizza$temperature, method="hampel", na.rm=TRUE)


# outliers for the each driver
tapply(d.pizza$temperature, d.pizza$driver, Outlier, na.rm=TRUE)

# the same as:
boxplot(temperature ~ driver, d.pizza)$out
# }

Run the code above in your browser using DataLab