lin1: Lin 1 Measure

Description

The Lin 1 similarity measure was first introduced in (Boriah et al., 2008). It has a complex system of weights. In case of a mismatch, lower similarity is assigned if either the mismatching values are very frequent or other categories have relative frequencies lying between those of the mismatching values. Higher similarity is assigned if the mismatching categories are infrequent and there are a few other infrequent categories. In case of a match, lower similarity is given to matches on frequent categories or to matches on categories that share their frequency with many other values. Higher similarity is given to matches on infrequent categories.

Hierarchical clustering methods require a proximity (dissimilarity) matrix rather than a similarity matrix as input; therefore, the dissimilarity D is computed from the similarity S according to the equation D = 1/S - 1. After this transformation, some entries of the proximity matrix may take the value -Inf. Such entries are therefore replaced by max(prox) + 1, where prox is the proximity matrix.
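
The transformation and the adjustment described above can be sketched in a few lines of R. This is only an illustration, not the package's internal code: the similarity matrix S is invented, and the -Inf entry is set by hand to mimic a degenerate case.

```r
# Hypothetical similarity matrix for three objects (values invented for illustration).
S <- matrix(c(1.00, 0.50, 0.20,
              0.50, 1.00, 0.25,
              0.20, 0.25, 1.00), nrow = 3, byrow = TRUE)

# Similarity -> dissimilarity: D = 1/S - 1, so S = 1 maps to D = 0.
prox <- 1 / S - 1

# Assumption: mimic a degenerate entry by setting it to -Inf by hand.
prox[1, 3] <- prox[3, 1] <- -Inf

# Adjustment: replace every -Inf entry by max(prox) + 1.
# max() ignores nothing here; -Inf is simply never the maximum.
prox[prox == -Inf] <- max(prox) + 1
```

After the adjustment the degenerate pair becomes the most dissimilar pair in the matrix, and every entry is finite, so the matrix can be passed on to a clustering routine.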

An application and evaluation of clustering with this measure can be found, e.g., in (Sulc, 2015).

Usage

lin1(data)

Arguments

data

data frame with cases in rows and variables in columns. Cases are characterized by nominal (categorical) variables coded as numbers.

Value

The function returns a matrix of size n x n, where n is the number of objects in the original data. The matrix contains the proximities between all pairs of objects. It can be used in hierarchical cluster analysis (HCA), e.g. in agnes.
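
As a rough sketch of how such a proximity matrix might be passed to agnes from the cluster package: the 4 x 4 matrix below is a hand-made stand-in for the output of lin1, and the average linkage method and the two-cluster cut are arbitrary choices made for illustration.

```r
library(cluster)  # provides agnes()

# Toy 4 x 4 proximity matrix standing in for the output of lin1()
# (values are invented for illustration).
prox <- matrix(c(0.0, 0.2, 2.0, 2.5,
                 0.2, 0.0, 2.2, 2.4,
                 2.0, 2.2, 0.0, 0.3,
                 2.5, 2.4, 0.3, 0.0), nrow = 4, byrow = TRUE)

# diss = TRUE tells agnes() that the input holds dissimilarities, not raw data.
fit <- agnes(as.dist(prox), diss = TRUE, method = "average")

# Cut the resulting dendrogram into two clusters.
clusters <- cutree(as.hclust(fit), k = 2)
```

With real data, prox would be replaced by the matrix returned by lin1, and the linkage method and number of clusters would be chosen to suit the analysis.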

References

Boriah, S., Chandola, V. and Kumar, V. (2008). Similarity measures for categorical data: A comparative evaluation. In: Proceedings of the 8th SIAM International Conference on Data Mining, SIAM, p. 243-254.

Sulc, Z. (2015). Application of Goodall's and Lin's similarity measures in hierarchical clustering. In Sbornik praci vedeckeho seminare doktorskeho studia FIS VSE. Praha: Oeconomica, 2015, p. 112-118. Available at: http://fis.vse.cz/wp-content/uploads/2015/01/DD_FIS_2015_CELY_SBORNIK.pdf.

Examples

# NOT RUN {
# load the package that provides lin1() and the data20 dataset
library(nomclust)
# sample data
data(data20)
# creation of the proximity matrix
prox_lin1 <- lin1(data20)

# }
