Calculates a proximity (dissimilarity) matrix based on the Goodall 4 (G4) similarity measure.
Usage
good4(data)
Arguments
data
A data.frame or a matrix with cases in rows and variables in columns.
Value
The function returns a dissimilarity matrix of size n x n, where n is the number of objects (rows) in the dataset passed as data.
Details
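The Goodall 4 similarity measure was presented by Boriah et al. (2008). It is a simple modification of the original Goodall measure (Goodall, 1966).
It assigns higher weights to matches on frequent categories, so two objects that share a common category are treated as more similar than two objects that share a rare one.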
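To make the weighting concrete, here is a hypothetical from-scratch sketch in R (good4_sketch is an illustrative name, not part of the package). It follows the Goodall 4 definition of Boriah et al. (2008): for variable k, a match on category c contributes p_k^2(c) = f_k(c)(f_k(c) - 1) / (N(N - 1)), where f_k(c) is the frequency of c among the N cases, a mismatch contributes 0, and the contributions are averaged over the d variables. Turning the similarity into a dissimilarity as 1 - S and setting the diagonal to 0 are assumptions of this sketch and may not match what good4() does internally; missing values are not handled.

```r
# Hypothetical sketch of a Goodall 4 (G4) dissimilarity matrix; not the
# package's own code. Assumes no missing values.
good4_sketch <- function(data) {
  data <- as.data.frame(data)
  n <- nrow(data)
  d <- ncol(data)
  denom <- as.numeric(n) * (n - 1)        # N(N - 1), as double to avoid overflow
  sim <- matrix(0, n, n)
  for (k in seq_len(d)) {
    x <- as.character(data[[k]])
    f <- table(x)                          # absolute frequency f_k(c) of each category
    p2 <- f * (f - 1) / denom              # p_k^2(c) = f_k(c)(f_k(c) - 1) / (N(N - 1))
    w <- as.numeric(p2[x])                 # p_k^2 of each case's own category
    same <- outer(x, x, "==")              # TRUE where two cases share category k
    sim <- sim + same * w                  # a match adds p_k^2(X_k), a mismatch adds 0
  }
  sim <- sim / d                           # equal weights 1/d over the variables
  diss <- 1 - sim                          # assumption: dissimilarity = 1 - similarity
  diag(diss) <- 0                          # assumption: zero self-dissimilarity
  dimnames(diss) <- list(rownames(data), rownames(data))
  diss
}

# Toy data: "red" is frequent (3 of 4 cases), "S" and "L" occur twice each.
toy <- data.frame(
  colour = c("red", "red", "red", "blue"),
  size   = c("S", "S", "L", "L")
)
print(round(good4_sketch(toy), 3))
```

On the toy data, cases 1 and 3 share only the frequent category red and get a dissimilarity of 0.75, whereas cases 3 and 4 share only the less frequent category L and get 11/12 (about 0.917). The resulting matrix can be converted with as.dist() and passed to hclust().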
References
Boriah S., Chandola V., Kumar V. (2008). Similarity measures for categorical data: A comparative evaluation.
In: Proceedings of the 8th SIAM International Conference on Data Mining, SIAM, p. 243-254.
Goodall D.W. (1966). A new similarity index based on probability. Biometrics, 22(4), p. 882-907.