Learn R Programming

StatMatch (version 1.4.2)

mahalanobis.dist: Computes the Mahalanobis Distance

Description

This function computes the Mahalanobis distance among units in a dataset or between observations in two distinct datasets.

Usage

mahalanobis.dist(data.x, data.y=NULL, vc=NULL)

Value

A matrix object with distances among rows of data.x and those of data.y.

Arguments

data.x

A matrix or a data frame containing variables that should be used in the computation of the distance between units. Only continuous variables are allowed. Missing values (NA) are not allowed.

When only data.x is supplied, the distances between rows of data.x is computed.

data.y

A numeric matrix or data frame with the same variables, of the same type, as those in data.x (only continuous variables are allowed). Dissimilarities between rows of data.x and rows of data.y will be computed. If not provided, by default it is assumed data.y=data.x and only dissimilarities between rows of data.x will be computed.

vc

Covariance matrix that should be used in distance computation. If it is not supplied (vc = NULL) it is estimated from the input data. In particular, when vc = NULL and only data.x is supplied then the covariance matrix is estimated from data.x (i.e. vc = var(data.x)). On the contrary when vc = NULL and both data.x and data.y are available then the covariance matrix is estimated on the joined data sets (i.e. vc = var(rbind(data.x, data.y))).

Author

Marcello D'Orazio mdo.statmatch@gmail.com

Details

The Mahalanobis distance is calculated by means of:

$$d(i,j)=\sqrt{(x_i - x_j)^T S^{-1} (x_i - x_j)}$$

The covariance matrix S is estimated from the available data when vc=NULL, otherwise the one supplied via the argument vc is used.

References

Mahalanobis, P C (1936) “On the generalised distance in statistics”. Proceedings of the National Institute of Sciences of India 2, pp. 49-55.

See Also

Examples

Run this code

md1 <- mahalanobis.dist(iris[1:6,1:4])
md2 <- mahalanobis.dist(data.x=iris[1:6,1:4], data.y=iris[51:60, 1:4])

vv <- var(iris[,1:4])
md1a <- mahalanobis.dist(data.x=iris[1:6,1:4], vc=vv)
md2a <- mahalanobis.dist(data.x=iris[1:6,1:4], data.y=iris[51:60, 1:4], vc=vv)

Run the code above in your browser using DataLab