ve: Variable Entropy (VE) Measure

Description

The function calculates a dissimilarity matrix based on the VE similarity measure.

Usage

ve(data, var.weights = NULL)

Value

The function returns an object of the class "dist".

Arguments

data: A data.frame or a matrix with cases in rows and variables in columns.
var.weights: A numeric vector of weights for the variables, one weight per variable. Each weight must be a real number between zero and one.

Author

Zdenek Sulc.
Contact: zdenek.sulc@vse.cz

Details

The Variable Entropy similarity measure was introduced by Sulc and Rezankova (2019). It derives the similarity between two categories from the within-cluster variability, expressed by the normalized entropy. The measure assigns higher weights to rare categories.
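To make the idea of an entropy-based similarity concrete, the following is a minimal sketch and not the package's implementation. It assumes that a match on a variable scores that variable's normalized entropy, that a mismatch scores zero, that the dissimilarity is one minus the average (weighted) score, and that the data contain no missing values. The helpers norm_entropy() and ve_sketch() are hypothetical names introduced only for this illustration.

```r
# Hypothetical helper (not part of nomclust): entropy of one categorical
# variable, divided by log(K) so that it lies between 0 and 1.
norm_entropy <- function(x) {
  p <- as.numeric(table(x)) / length(x)  # relative category frequencies
  p <- p[p > 0]                          # ignore unused factor levels
  k <- length(p)
  if (k < 2) return(0)                   # a constant variable has no variability
  -sum(p * log(p)) / log(k)
}

# Hypothetical sketch of a VE-style dissimilarity (assumptions stated above).
ve_sketch <- function(data, var.weights = NULL) {
  data <- as.data.frame(data)
  m <- ncol(data)
  if (is.null(var.weights)) var.weights <- rep(1, m)  # assumed default
  h <- vapply(data, norm_entropy, numeric(1))         # one entropy per variable
  x <- as.matrix(data)
  n <- nrow(x)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      agree <- x[i, ] == x[j, ]                       # TRUE where categories match
      d[i, j] <- 1 - sum(var.weights * h * agree) / m
    }
  }
  as.dist(d)                                          # keeps the lower triangle only
}

# Toy data: two nominal variables, four cases.
toy <- data.frame(a = c("x", "x", "y", "y"), b = c("p", "p", "p", "q"))
d_toy <- ve_sketch(toy)
print(round(as.matrix(d_toy), 3))
```

In this sketch, cases that agree on a high-entropy variable end up closer together than cases that agree only on a low-entropy one. The exact formula used by ve() may differ, for example in how the similarity is converted to a dissimilarity.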

References

Boriah S., Chandola V., Kumar V. (2008). Similarity measures for categorical data: A comparative evaluation. In: Proceedings of the 8th SIAM International Conference on Data Mining, SIAM, p. 243-254.

Sulc Z. and Rezankova H. (2019). Comparison of Similarity Measures for Categorical Data in Hierarchical Clustering. Journal of Classification, 36(1), p. 58-72. DOI: 10.1007/s00357-019-09317-5.

Examples

# sample data
data(data20)

# dissimilarity matrix calculation
prox.ve <- ve(data20)

# dissimilarity matrix calculation with decreasing variable weights
prox.ve.2 <- ve(data20, var.weights = c(1, 0.8, 0.6, 0.4, 0.2))

# dissimilarity matrix calculation with a zero weight on the fifth variable
prox.ve.3 <- ve(data20, var.weights = c(0.7, 1, 0.9, 0.5, 0))
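Because the returned object is of class "dist", it can be passed to functions that accept distance objects. The sketch below feeds it to stats::hclust() and cuts the tree with stats::cutree(). The average linkage and the choice of three clusters are illustrative assumptions, not recommendations from the package authors. The code runs only if the nomclust package is installed.

```r
# Sketch: hierarchical clustering on the VE dissimilarity matrix.
# Linkage method and number of clusters are arbitrary, illustrative choices.
if (requireNamespace("nomclust", quietly = TRUE)) {
  data("data20", package = "nomclust")
  prox.ve <- nomclust::ve(data20)

  hc <- hclust(prox.ve, method = "average")  # agglomerative clustering
  groups <- cutree(hc, k = 3)                # cluster membership per case
  print(table(groups))                       # cluster sizes
}
```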