HDoutliers (version 1.0.4)

getHDoutliers: Outlier Detection Stage of Wilkinson's hdoutliers Algorithm

Description

Detects outliers using a probability model fitted to the nearest-neighbor distances between exemplars.

Usage

getHDoutliers(data, memberLists, alpha = 0.05, transform = TRUE)

Arguments

data

A vector, matrix, or data frame consisting of numeric and/or categorical variables.

memberLists

A list following the structure of the output to getHDmembers, in which each component is a vector of observation indexes. The first index in each list is the index of the exemplar representing that list, and any remaining indexes are the associated members, considered 'close to' the exemplar.
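
As a minimal sketch of this structure, the hand-built list below stands in for the output of getHDmembers. The seven observation indexes and their grouping are hypothetical and chosen only for illustration.

```r
# Hypothetical memberLists for 7 observations grouped into 3 lists.
# In each component the first index is the exemplar; the rest are its members.
memberLists <- list(c(2, 5, 7), 4, c(1, 3, 6))

# Exemplar of each list: the first index of every component.
exemplars <- vapply(memberLists, function(m) m[1], numeric(1))

# Members associated with each exemplar (empty when the exemplar stands alone).
members <- lapply(memberLists, function(m) m[-1])

print(exemplars)
print(members)
```

Here observation 4 is an exemplar with no associated members, so its component has length one.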

alpha

Threshold for determining the cutoff for outliers. Observations are considered outliers if they fall in the \((1- alpha)\) tail of the distribution of the nearest-neighbor distances between exemplars.

transform

A logical variable indicating whether or not the data needs to be transformed to conform to Wilkinson's specifications before outlier detection. The default is to transform the data using function dataTrans. In Wilkinson's algorithm, memberLists would have been created with transformed data.

Value

The indexes of the observations determined to be outliers.
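
A short sketch of how the returned indexes might be used to split a data set. The data frame `x` and the index vector `out` are made up for illustration; `out` stands in for a value returned by getHDoutliers. The sketch assumes the result may be empty when no outliers are found.

```r
# Hypothetical data and a stand-in for the indexes returned by getHDoutliers.
x <- data.frame(a = c(0.1, 0.2, 9.5, 0.3),
                b = c(1.0, 1.1, -7.2, 0.9))
out <- 3L

# Rows flagged as outliers.
outlier_rows <- x[out, , drop = FALSE]

# Rows kept. setdiff() is used instead of x[-out, ] because negative indexing
# with an empty index vector selects no rows at all.
keep <- setdiff(seq_len(nrow(x)), out)
clean <- x[keep, , drop = FALSE]
```

Selecting with `setdiff` keeps every row when `out` is empty, which `x[-out, ]` would not.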

Details

An exponential distribution is fitted to the upper tail of the nearest-neighbor distances between exemplars (the observations considered representatives of each component of memberLists). Observations are considered outliers if they fall in the \((1- alpha)\) tail of the fitted CDF.
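
The idea can be illustrated in base R with a deliberately simplified sketch. It is not the code used by getHDoutliers. As assumptions made only for this illustration, every point is treated as its own exemplar, the upper tail is taken to be the distances above the median, and the exponential rate is estimated by maximum likelihood.

```r
# Simplified illustration only; not the HDoutliers implementation.
set.seed(1)
pts <- rbind(matrix(rnorm(200), ncol = 2),  # 100 bulk points
             c(8, 8))                       # one planted distant point (row 101)

# Nearest-neighbor distance of each point (each point acts as an exemplar here).
D <- as.matrix(dist(pts))
diag(D) <- Inf
nn <- apply(D, 1, min)

# Fit an exponential distribution to the excesses over the median distance.
u <- median(nn)
excess <- nn[nn > u] - u
rate <- 1 / mean(excess)  # maximum-likelihood estimate of the rate

# Flag points whose distance exceeds the (1 - alpha) quantile of the fitted tail.
alpha <- 0.05
cutoff <- u + qexp(1 - alpha, rate = rate)
flagged <- which(nn > cutoff)
print(flagged)
```

Because the cutoff comes from the fitted tail rather than from a fixed distance, it adapts to the scale of the data. A smaller `alpha` raises the cutoff and flags fewer points.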

References

Wilkinson, L. (2016). Visualizing Outliers. <https://www.cs.uic.edu/~wilkinson/Publications/outliers.pdf>.

See Also

HDoutliers, getHDmembers, dataTrans

Examples

library(HDoutliers)

# Outlier detection on the W component of the 'dots' example data
data(dots)
mem.W <- getHDmembers(dots$W)
out.W <- getHDoutliers(dots$W, mem.W)
plotHDoutliers(dots$W, out.W)

# Outlier detection on the 'ex2D' example data
data(ex2D)
mem.ex2D <- getHDmembers(ex2D)
out.ex2D <- getHDoutliers(ex2D, mem.ex2D)
plotHDoutliers(ex2D, out.ex2D)

# Larger simulated example
n <- 100000 # number of observations
set.seed(3)
x <- matrix(rnorm(2*n), n, 2)
nout <- 10 # number of outliers
# overwrite nout randomly chosen rows with values drawn uniformly from [-10, 10]
x[sample(1:n, size = nout), ] <- 10*runif(2*nout, min = -1, max = 1)

mem.x <- getHDmembers(x)
out.x <- getHDoutliers(x, mem.x) # memberLists has no default and must be supplied