extremevalues (version 2.4.1)

getOutliers: Detect outliers

Description

getOutliers is a wrapper function for getOutliersI and getOutliersII.

Usage

getOutliers(y, method="I",  ...)
getOutliersI(y, rho=c(1,1), FLim=c(0.1,0.9), distribution="normal")
getOutliersII(y, alpha=c(0.05, 0.05), FLim=c(0.1, 0.9), 
   distribution="normal", returnResiduals=TRUE)

Value

nOut

Number of left and right outliers.

iLeft

Index vector indicating left outliers in y

iRight

Index vector indicating right outliers in y

limit

For Method I: y-values below (above) limit[1] (limit[2]) are outliers. For Method II: elements with residuals below (above) limit[1] (limit[2]) are outliers if all smaller (larger) elements are outliers as well.

method

The method used: "method I" or "method II"

distribution

The model distribution used

Fmin

FLim[1]

Fmax

FLim[2]

yMin

Smallest y-value used in fit

yMax

Largest y-value used in fit

Nfit

Number of values used in the fit

rho

Method I, the input rho-values for left and right outliers

alphaConf

Method II, the input confidence levels for left and right outliers

R2

R-squared value for the fit. Note that this is the ordinary least squares value, defined by \(R^2=1-SS_{err}/SS_{y}\), where \(SS_{err}\) is the sum of squared residuals and \(SS_{y}\) is the total sum of squares of \(y\). For the lognormal, Pareto and Weibull models, the \(y\)-variable is transformed before fitting. Since predicted values are transformed back before calculating \(SS_{err}\), this \(R^2\) can be negative.

lambda

(exponential distribution) Estimated location (and spread) parameter for \(f(y)=\lambda\exp(-\lambda y)\)

mu

(lognormal distribution) Estimated \( E(\ln(y))\) for lognormal distribution

sigma

(lognormal distribution) Estimated \(Var(\ln(y))\) for lognormal distribution

ym

(pareto distribution) Estimated location parameter (mode) for Pareto distribution

alpha

(pareto distribution) Estimated spread parameter for Pareto distribution

k

(weibull distribution) Estimated shape parameter \(k\) for Weibull distribution

lambda

(weibull distribution) Estimated scale parameter \(\lambda\) for Weibull distribution

mu

(normal distribution) Estimated \( E(y)\) for normal distribution

sigma

(normal distribution) Estimated \(Var(y)\) for normal distribution
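The remark under R2 that the value can be negative can be checked with a small base-R sketch. It does not call the package. It assumes a hypothetical intercept-only model fitted on the log scale, so the back-transformed prediction is the geometric mean of y. Since the arithmetic mean is the constant that minimises the sum of squares, any other constant prediction gives \(SS_{err} > SS_{y}\), and \(R^2\) drops below zero. All variable names are illustrative.

```r
# Illustrative data only; this is not the package's fitting routine.
set.seed(1)
y <- rlnorm(50)

# Assumed toy model: fit a constant on the log scale, then transform back.
# The back-transformed prediction is the geometric mean of y.
log_fit <- mean(log(y))
pred    <- rep(exp(log_fit), length(y))

# Ordinary least squares R-squared, computed on the original scale.
ss_err <- sum((y - pred)^2)      # sum of squared residuals
ss_y   <- sum((y - mean(y))^2)   # total sum of squares
r2     <- 1 - ss_err / ss_y

r2 < 0   # TRUE: the geometric mean differs from the arithmetic mean
```

A negative value here signals only that the back-transformed predictions fit worse, in the least squares sense, than the plain mean of y.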

Arguments

y

Vector of one-dimensional nonnegative data

method

"I" or "II"

...

Optional arguments to be passed to getOutliersI or getOutliersII

distribution

Model distribution used to estimate the limit. Choose from "lognormal", "exponential", "pareto", "weibull" or "normal" (default).

FLim

c(Fmin,Fmax) quantile limits indicating which data should be used to fit the model distribution. Must obey 0 < Fmin < Fmax < 1.

rho

(Method I) A value \(y_i\) is an outlier if it is below (above) the limit where fewer than rho[1] (rho[2]) observations are expected. Must be > 0.

alpha

(Method II) A value \(y_i\) is an outlier if it has a residual below (above) the alpha[1] (alpha[2]) confidence limit for the residuals. Must be between 0 and 1.

returnResiduals

(Method II) Whether or not to return a vector of residuals from the fit
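To make the role of FLim concrete, the following base-R sketch selects the observations lying between the Fmin and Fmax quantiles. It is an approximation under stated assumptions: it uses the default stats::quantile definition and inclusive bounds, whereas the package may use different plotting positions. The names q, in_fit and y_fit are illustrative and not part of the package.

```r
# Illustrative data; the selection rule below is an assumption, not package code.
set.seed(42)
y    <- rlnorm(100)
FLim <- c(0.1, 0.9)   # c(Fmin, Fmax)

# FLim must obey 0 < Fmin < Fmax < 1.
stopifnot(FLim[1] > 0, FLim[1] < FLim[2], FLim[2] < 1)

# Empirical quantiles that bound the fitting region.
q <- quantile(y, probs = FLim)

# Keep only the central observations for the fit.
in_fit <- y >= q[1] & y <= q[2]
y_fit  <- y[in_fit]

# Analogues of the Nfit, yMin and yMax entries of the return value.
c(Nfit = length(y_fit), yMin = min(y_fit), yMax = max(y_fit))
```

Narrowing FLim excludes more of the tails from the fit, which makes the estimated model less sensitive to the very values being tested as outliers.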

Author

Mark van der Loo, see www.markvanderloo.eu

Details

Both methods use the subset of \(y\)-values between the Fmin and Fmax quantiles to fit a model cumulative distribution function. Method I detects outliers by checking which values are below (above) the limit where, according to the model distribution, fewer than rho[1] (rho[2]) observations are expected (given length(y) observations). Method II detects outliers by finding the observations (not used in the fit) whose fit residuals are below (above) the estimated confidence limit alpha[1] (alpha[2]) while all lower (higher) observations are outliers too.
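The Method I rule can be sketched in base R for the normal model. This is a simplified illustration, not the package's implementation. It assumes plotting positions (i - 0.5)/N and a least squares fit of the sorted central values against normal quantiles. With N observations and model distribution F, the expected number of observations below a limit is N * F(limit), so the left limit solves N * F(limit) = rho[1], and the right limit follows by symmetry. Method II is not covered here.

```r
# Illustrative data: normal sample plus one planted right outlier (index 201).
set.seed(7)
y    <- c(rnorm(200, mean = 10, sd = 2), 40)
N    <- length(y)
FLim <- c(0.1, 0.9)
rho  <- c(1, 1)

# Assumed plotting positions for the sorted data.
ys  <- sort(y)
p   <- (seq_len(N) - 0.5) / N
use <- p >= FLim[1] & p <= FLim[2]

# Fit the central part: sorted values against standard normal quantiles.
# Intercept estimates the mean, slope estimates the standard deviation.
fit   <- lm(ys[use] ~ qnorm(p[use]))
mu    <- unname(coef(fit)[1])
sigma <- unname(coef(fit)[2])

# Limits where fewer than rho[1] (rho[2]) observations are expected.
limit <- c(Left  = qnorm(rho[1] / N, mean = mu, sd = sigma),
           Right = qnorm(1 - rho[2] / N, mean = mu, sd = sigma))

# Index vectors of flagged observations in the original y.
iLeft  <- which(y < limit["Left"])
iRight <- which(y > limit["Right"])
```

Larger rho values move the limits inward and flag more points; smaller values make the rule more conservative.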

References

M.P.J. van der Loo, Distribution based outlier detection for univariate data. Discussion paper 10003, Statistics Netherlands, The Hague. Available from www.markvanderloo.eu or www.cbs.nl.

The file <your R directory>/R-<version>/library/extremevalues/extremevalues.pdf contains a worked example. It can also be downloaded from my website.

Examples

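library(extremevalues)
# Lognormal sample with one planted outlier on each side
y <- rlnorm(100)
y <- c(0.1*min(y), y, 10*max(y))
# Detect outliers with both methods under a lognormal model
K <- getOutliers(y, method="I", distribution="lognormal")
L <- getOutliers(y, method="II", distribution="lognormal")
# Side-by-side diagnostics: QQ plot for Method I, residual plot for Method II
par(mfrow=c(1,2))
outlierPlot(y, K, mode="qq")
outlierPlot(y, L, mode="residual")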