
rrcovHD (version 0.3-1)

OutlierMahdist: Outlier identification using robust (Mahalanobis) distances based on robust multivariate location and covariance matrix

Description

This function uses the Mahalanobis distance as a basis for multivariate outlier detection. The standard approach is to estimate the parameters of the Mahalanobis distance (location and covariance) robustly and to compare the resulting distances with a critical value of the chi-squared distribution (Rousseeuw and Van Zomeren, 1990).
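
For reference, the rule described above can be written out as follows. The symbols are notation introduced here, not taken from the package: T and C stand for the robust location and covariance estimates, and p for the number of variables. The 0.975 quantile is the level used by Rousseeuw and Van Zomeren (1990); that this function uses the same level by default is an assumption.

```latex
\[
  \mathrm{RD}(x_i) = \sqrt{(x_i - T)^{\top} C^{-1} (x_i - T)},
  \qquad
  x_i \text{ is flagged as an outlier if } \mathrm{RD}(x_i) > \sqrt{\chi^2_{p,\,0.975}} .
\]
```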

Usage

OutlierMahdist(x, ...)

# S3 method for default
OutlierMahdist(x, grouping, control, trace=FALSE, ...)

# S3 method for formula
OutlierMahdist(formula, data, ..., subset, na.action)

Value

An S4 object of class OutlierMahdist which is a subclass of the virtual class Outlier.

Arguments

formula

a formula with no response variable, referring only to numeric variables.

data

an optional data frame (or similar: see model.frame) containing the variables in the formula formula.

subset

an optional vector used to select rows (observations) of the data matrix x.

na.action

a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The factory-fresh value of that option is na.omit.

...

arguments passed to or from other methods.

x

a matrix or data frame.

grouping

grouping variable: a factor specifying the class for each observation.

control

a control object (S4) for one of the available control classes, e.g. CovControlMcd-class, CovControlOgk-class, CovControlSest-class, etc., containing estimation options. The class of this object defines which estimator will be used. Alternatively, a character string naming the estimator can be specified: one of auto, sde, mcd, ogk, m, mve, sfast, surreal, bisquare, rocke. If 'auto' is specified or the argument is missing, the function will select the estimator (see Details).

trace

whether to print intermediate results. Default is trace = FALSE.

Author

Valentin Todorov valentin.todorov@chello.at

Details

If the data set consists of two or more classes (specified by the grouping variable grouping), the method iterates through the classes present in the data, separates each class from the rest, and identifies the outliers relative to that class. Both types of outliers, the mislabeled and the abnormal samples, are thus treated in a homogeneous way.
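
The per-class procedure can be illustrated with a minimal sketch. This is not the package's implementation: it swaps in MASS::cov.rob (MCD) for the rrcov estimators, uses simulated toy data with two planted outliers, and assumes a 97.5% chi-squared cutoff. The names x, grouping, rd and is_outlier are hypothetical.

```r
library(MASS)  # cov.rob(); a stand-in here for the rrcov estimators

set.seed(1)
# Hypothetical toy data: two classes of 40 bivariate normal points each
x <- rbind(matrix(rnorm(80), ncol = 2),
           matrix(rnorm(80, mean = 5), ncol = 2))
grouping <- factor(rep(c("a", "b"), each = 40))
x[1, ]  <- c(10, -10)   # planted outlier in class "a"
x[41, ] <- c(-8, 15)    # planted outlier in class "b"

# Assumed cutoff: square root of the 97.5% chi-squared quantile, p = ncol(x)
cutoff <- sqrt(qchisq(0.975, df = ncol(x)))
rd <- numeric(nrow(x))

# Treat each class separately: robust fit on its rows, then robust distances
for (cl in levels(grouping)) {
  idx <- which(grouping == cl)
  est <- cov.rob(x[idx, , drop = FALSE], method = "mcd")
  # mahalanobis() returns squared distances, hence the sqrt()
  rd[idx] <- sqrt(mahalanobis(x[idx, , drop = FALSE], est$center, est$cov))
}

is_outlier <- rd > cutoff
which(is_outlier)       # row indices exceeding the cutoff within their own class
```

Because each observation is measured only against the robust fit of its own class, a sample that sits inside another class (mislabeled) and a sample far from every class (abnormal) are both flagged by the same rule.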

The estimation method is selected by the control object control. If a character string naming an estimator is specified, a new control object will be created and used (with default estimation options). If this argument is missing or the character string 'auto' is specified, the function will select the robust estimator according to the size of the dataset; for details see CovRobust.
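
As a hedged illustration of the two ways to choose the estimator, the sketch below passes control once as a character string and once as a control object. It assumes that rrcovHD and rrcov are installed, that the formula method forwards control through its ... argument, and that CovControlMcd() accepts alpha as in rrcov. The value alpha = 0.75 is an arbitrary choice for illustration.

```r
library(rrcovHD)   # assumed installed; rrcov provides hemophilia and CovControlMcd()
library(rrcov)
data(hemophilia)

# Estimator named by a character string: a control object with default options is created
obj_mcd <- OutlierMahdist(gr ~ ., data = hemophilia, control = "mcd")

# Estimator given as a control object carrying non-default options
ctrl <- CovControlMcd(alpha = 0.75)   # alpha chosen only for illustration
obj_mcd75 <- OutlierMahdist(gr ~ ., data = hemophilia, control = ctrl)

getCutoff(obj_mcd)
getCutoff(obj_mcd75)
```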

References

P. J. Rousseeuw and B. C. Van Zomeren (1990). Unmasking multivariate outliers and leverage points. Journal of the American Statistical Association. Vol. 85(411), pp. 633-651.

P. J. Rousseeuw and A. M. Leroy (1987). Robust Regression and Outlier Detection. Wiley.

P. J. Rousseeuw and K. van Driessen (1999). A fast algorithm for the minimum covariance determinant estimator. Technometrics 41, 212-223.

Todorov V & Filzmoser P (2009). An Object Oriented Framework for Robust Multivariate Analysis. Journal of Statistical Software, 32(3), 1-47. doi:10.18637/jss.v032.i03.

Filzmoser P & Todorov V (2013). Robust tools for the imperfect world. Information Sciences 245, 4-20. doi:10.1016/j.ins.2012.10.017.

Examples


library(rrcovHD)            # load the package (the hemophilia data ship with rrcov)
data(hemophilia)
obj <- OutlierMahdist(gr ~ ., data = hemophilia)
obj

getDistance(obj)            # returns an array of distances
getClassLabels(obj, 1)      # returns an array of indices for a given class
getCutoff(obj)              # returns an array of cutoff values (for each class, usually equal)
getFlag(obj)                # returns a 0/1 array of flags
plot(obj, class = 2)        # standard plot function
