maximum.dist: Computes the Maximum Distance

Description

This function computes the Maximum distance (or $L^\infty$ norm) between units in a dataset or between observations in two distinct datasets.

Usage

maximum.dist(data.x, data.y=data.x, rank=FALSE)

Value

A matrix object with distances between rows of data.x and those of data.y.

Arguments

data.x

A matrix or a data frame containing variables that should be used in the computation of the distance. Only continuous variables are allowed. Missing values (NA) are not allowed.

When only data.x is supplied, the distances between rows of data.x are computed.

data.y

A numeric matrix or data frame with the same variables, of the same type, as those in data.x (only continuous variables are allowed). Dissimilarities between rows of data.x and rows of data.y will be computed. If not provided, by default it is assumed data.y=data.x and only dissimilarities between rows of data.x will be computed.

rank

Logical, when TRUE the original values are substituted by their ranks divided by the number of values plus one (following suggestion in Kovar et al. 1988). This rank transformation permits to remove the effect of different scales on the distance computation. When computing ranks the tied observations assume the average of their position (ties.method = "average" in calling the rank function).

Author

Marcello D'Orazio mdo.statmatch@gmail.com

Details

This function computes the $L^\infty$ distance also know as minimax distance. In practice the distance between two records is the maximum of the absolute differences on the available variables:

$$d(i,j) = max \left( \left|x_{1i}-x_{1j} \right|, \left|x_{2i}-x_{2j} \right|,\ldots,\left|x_{Ki}-x_{Kj} \right| \right)$$

When rank=TRUE the original values are substituted by their ranks divided by the number of values plus one (following suggestion in Kovar et al. 1988).

References

Kovar, J.G., MacMillan, J. and Whitridge, P. (1988). “Overview and strategy for the Generalized Edit and Imputation System”. Statistics Canada, Methodology Branch Working Paper No. BSMD 88-007 E/F.

Examples

Run this code


md1 <- maximum.dist(iris[1:10,1:4])
md2 <- maximum.dist(iris[1:10,1:4], rank=TRUE)

md3 <- maximum.dist(data.x=iris[1:50,1:4], data.y=iris[51:100,1:4])
md4 <- maximum.dist(data.x=iris[1:50,1:4], data.y=iris[51:100,1:4], rank=TRUE)

Run the code above in your browser using DataLab