
DJL (version 3.9)

dm.mahalanobis: Distance measure using Mahalanobis distance for outlier detection

Description

Computes the Mahalanobis distance of each observation for outlier detection. In addition to the basic distance measure, boxplots showing the potential outlier(s) can be drawn to support the early stage of a data cleansing task.
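The idea can be sketched with base R alone. This is a minimal illustration under stated assumptions, not the package's own implementation: it uses stats::mahalanobis() (which returns the squared distance), measures from the column medians, and flags the top 10 percent of rows. The data, the planted outlier in row 7, and the variable names are hypothetical.

```r
# Illustrative data: 50 rows, 3 numeric columns, with one row shifted far away
set.seed(42)
df <- data.frame(replicate(3, rnorm(50)))
df[7, ] <- c(8, -8, 8)

# Datum point: column medians (the documented default for `from`)
center <- apply(df, 2, median)

# stats::mahalanobis() returns the squared distance, so take the square root
d <- sqrt(mahalanobis(df, center = center, cov = cov(df)))

# Rows ordered by decreasing distance; the top 10% are treated as suspects here
# (how the package turns `p` into a count is an assumption of this sketch)
ord <- order(d, decreasing = TRUE)
suspect <- ord[seq_len(ceiling(nrow(df) * 10 / 100))]
print(suspect)
```

The planted row should appear among the suspects, since it lies far from the bulk of the data in every column.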

Usage

dm.mahalanobis(data, from="median", p=10, plot=FALSE, v.index=NULL, layout=NULL)

Value

$dist

Mahalanobis distance of each row from the datum point given by the from argument

$excluded

Excluded row(s), given as row numbers

$order

Rows ordered by decreasing distance, given as row numbers

$suspect

Potential outlier(s), given as row numbers

Arguments

data

Dataframe

from

Datum point from which the distance is measured
"mean" Mean of each column
"median" Median of each column (default)

p

Percentage of points to be noted as potential outlier(s) (default of 10)

plot

Logical switch for drawing boxplot(s) (default of FALSE)

v.index

Numeric vector indicating the column(s) to be drawn in the boxplots. The default value of NULL presents all columns.

layout

Numeric vector indicating the dimensions of the boxplot layout. The default value of NULL finds an optimal layout.
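The choice between the two datum points offered by from matters when the data already contain extreme values. The base R sketch below, with hypothetical data and variable names, compares the two candidate centers on a dataframe with one contaminated row; it does not call the package.

```r
# Two well-behaved columns, then one row replaced with extreme values
set.seed(1)
df <- data.frame(a = rnorm(50, mean = 50, sd = 5),
                 b = rnorm(50, mean = 100, sd = 10))
df[1, ] <- c(500, 1000)

# Candidate datum points: column means versus column medians
center_mean   <- colMeans(df)
center_median <- apply(df, 2, median)

print(rbind(mean = center_mean, median = center_median))
```

The means are pulled toward the contaminated row while the medians stay close to the bulk of the data, which is one plausible reason for "median" being the default.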

Author

Dong-Joon Lim, PhD

References

Hair, Joseph F., et al. Multivariate data analysis. Vol. 7. Upper Saddle River, NJ: Pearson Prentice Hall, 2006.

Examples

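# Load the package that provides dm.mahalanobis()
library(DJL)

# Generate a sample dataframe: 6 columns, each holding 50 values drawn without replacement from 0 to 100
df <- data.frame(replicate(6, sample(0 : 100, 50)))

# Compute the distances from the column medians (default) and draw the boxplots
dm.mahalanobis(df, plot = TRUE)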
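The returned list can be inspected through the components listed under Value. The following is a hedged sketch: it assumes the DJL package is installed, that $suspect holds integer row numbers usable for indexing, and that $dist can be indexed by row number. The seed and the res variable are illustrative only.

```r
# Reproducible sample dataframe, same shape as in the example above
set.seed(123)
df <- data.frame(replicate(6, sample(0:100, 50)))

# Only run the package call when DJL is available
if (requireNamespace("DJL", quietly = TRUE)) {
  res <- DJL::dm.mahalanobis(df, from = "median", p = 10)

  # Row numbers flagged as potential outliers
  print(res$suspect)

  # The flagged rows themselves, for manual review before any removal
  print(df[res$suspect, ])

  # Distances of the flagged rows (assumes $dist is indexable by row number)
  print(res$dist[res$suspect])
}
```

Reviewing the flagged rows by hand before dropping them is advisable, since a large distance marks a row as unusual rather than as an error.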
