Pipe-friendly wrapper around the function mahalanobis(), which returns the
squared Mahalanobis distance of all rows in x. Unlike the base function, it
also automatically flags multivariate outliers.
Mahalanobis distance is a common metric used to identify multivariate
outliers. The larger the Mahalanobis distance, the more unusual the data
point (i.e., the more likely it is to be a multivariate outlier).
The distance measures how far an observation lies from the center of the data
cloud, while also taking into account the shape (covariance) of the cloud.
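As a rough illustration of what the squared distance is, the Python sketch below computes D² = (x − μ)ᵀ S⁻¹ (x − μ) for every row. It assumes, as in a typical call mahalanobis(x, colMeans(x), cov(x)), that the center μ is the vector of column means and S is the sample covariance matrix. The function name squared_mahalanobis is hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def squared_mahalanobis(x):
    """Hypothetical helper: squared Mahalanobis distance of every row of x.

    Assumes the center is the column-mean vector and the shape is the
    sample covariance matrix (n - 1 denominator, like R's cov()).
    """
    x = np.asarray(x, dtype=float)
    center = x.mean(axis=0)
    cov = np.cov(x, rowvar=False)   # columns are variables, rows are observations
    inv_cov = np.linalg.inv(cov)
    diff = x - center
    # Row-wise quadratic form: diff_i^T * inv_cov * diff_i
    return np.einsum("ij,jk,ik->i", diff, inv_cov, diff)


x = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 5.0], [4.0, 4.0], [10.0, 1.0]])
print(np.round(squared_mahalanobis(x), 3))
```

The last row sits far from the others once the covariance is taken into account, so it receives the largest distance. With the sample covariance, the distances always sum to (n − 1) · p, which is a handy sanity check.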
To detect outliers, each calculated Mahalanobis distance is compared against
a chi-square (χ²) distribution with degrees of freedom equal to the number
of dependent (outcome) variables, at an alpha level of 0.001.
The threshold to declare a multivariate outlier is determined using the
function qchisq(0.999, df), where df is the degrees of freedom (i.e., the
number of dependent variables used in the computation).
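The cutoff rule can be sketched in Python as follows. This is a minimal, standard-library-only sketch under stated assumptions: chi2_cdf, chi2_quantile and flag_outliers are hypothetical helper names, the chi-square quantile is found by bisection on the regularized incomplete gamma function rather than by calling R's qchisq(), and the squared distances are assumed to have been computed already (for example with mahalanobis()). Where SciPy is installed, scipy.stats.chi2.ppf(0.999, df) should give the same cutoff.

```python
import math

# Hypothetical helpers illustrating the qchisq(0.999, df) cutoff rule.


def chi2_cdf(x, df):
    """Chi-square CDF, computed as the regularized lower incomplete gamma P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    # Power series: P(a, z) = z^a e^-z / Gamma(a) * sum_n z^n / (a (a+1) ... (a+n))
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-16:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_quantile(p, df, tol=1e-10):
    """Inverse of chi2_cdf by bisection (plays the role of R's qchisq(p, df))."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < p:  # grow the bracket until it contains the quantile
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def flag_outliers(d2, df, alpha=0.001):
    """Flag squared Mahalanobis distances that exceed the chi-square cutoff."""
    cutoff = chi2_quantile(1.0 - alpha, df)  # qchisq(0.999, df) when alpha = 0.001
    return [d > cutoff for d in d2], cutoff


flags, cutoff = flag_outliers([1.0, 20.0, 13.0], df=2)
print(round(cutoff, 2), flags)  # 13.82 [False, True, False]
```

For df = 2 the chi-square CDF reduces to 1 − exp(−x/2), so the cutoff is exactly −2 · ln(0.001) ≈ 13.82, which makes an easy check on the bisection.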