nomclust (version 2.1.6)

of: Occurrence Frequency (OF) Measure

Description

Computes a proximity (dissimilarity) matrix based on the OF similarity measure.

Usage

of(data)

Arguments

data

A data.frame or a matrix with cases in rows and variables in columns.

Value

The function returns an n x n dissimilarity matrix, where n is the number of objects (rows) in the dataset passed in the argument data.
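
The returned matrix can be handed to standard clustering tools from base R. The following is a minimal sketch, assuming the result is a symmetric numeric matrix with a zero diagonal. The toy matrix prox is made up for illustration and only stands in for the output of of(); its values are not produced by nomclust.

```r
# Hypothetical 4 x 4 dissimilarity matrix standing in for of(data)
prox <- matrix(c(0.0, 0.2, 0.9, 0.8,
                 0.2, 0.0, 0.7, 0.9,
                 0.9, 0.7, 0.0, 0.1,
                 0.8, 0.9, 0.1, 0.0),
               nrow = 4, byrow = TRUE)

# convert the square matrix to a "dist" object and
# run average-linkage hierarchical clustering on it
hc <- hclust(as.dist(prox), method = "average")

# cut the dendrogram into two clusters
groups <- cutree(hc, k = 2)
```

With a real dataset, prox would be replaced by the matrix returned by of(data).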

Details

The OF (Occurrence Frequency) measure was originally constructed for text mining tasks, see (Sparck Jones, 1972); later, it was adjusted for categorical variables, see (Boriah et al., 2008). It assigns higher weight to mismatches on less frequent values and lower weight to mismatches on more frequent values.
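
To make the weighting concrete, here is a rough base R sketch of the OF measure as defined in Boriah et al. (2008). For one variable, two matching values have similarity 1, and two mismatching values x and y have similarity 1 / (1 + log(n / f(x)) * log(n / f(y))), where f() is the frequency of a value in that variable. The per-variable similarities are averaged over all variables. The helper of_sketch is hypothetical and not part of nomclust. The conversion to a dissimilarity as 1/s - 1 and the absence of missing values are assumptions, so the package implementation may differ in such details.

```r
# Hypothetical helper, not the nomclust implementation.
# Assumes no missing values in `data`.
of_sketch <- function(data) {
  # treat every variable as a plain character vector
  data <- as.data.frame(lapply(as.data.frame(data), as.character),
                        stringsAsFactors = FALSE)
  n <- nrow(data)
  m <- ncol(data)
  sim <- matrix(0, n, n)
  for (k in seq_len(m)) {
    x <- data[[k]]
    # frequency of each case's value within variable k
    f <- as.numeric(table(x)[x])
    # log(n / f) term for every case
    lg <- log(n / f)
    # similarity for mismatches: 1 / (1 + log(n/f(x)) * log(n/f(y)))
    s_k <- 1 / (1 + outer(lg, lg))
    # matches get similarity 1
    s_k[outer(x, x, "==")] <- 1
    sim <- sim + s_k
  }
  # average over variables, then convert to a dissimilarity
  # (assumed conversion: d = 1/s - 1)
  1 / (sim / m) - 1
}

# toy data, made up for illustration
toy <- data.frame(a = c("x", "x", "y", "z"),
                  b = c("p", "q", "q", "q"))
d <- of_sketch(toy)
```

In this sketch a mismatch between two rare values produces a large log product, hence a low similarity and a high dissimilarity, while a mismatch between two common values is penalised less.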

References

Boriah S., Chandola V., Kumar V. (2008). Similarity measures for categorical data: A comparative evaluation. In: Proceedings of the 8th SIAM International Conference on Data Mining, SIAM, p. 243-254.

Sparck Jones K. (1972). A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28(1), p. 11-21. Reprinted in: Journal of Documentation, 60(5) (2004), p. 493-502.

See Also

eskin, good1, good2, good3, good4, iof, lin, lin1, morlini, sm, ve, vm.

Examples

# sample data
data(data20)

# dissimilarity matrix calculation
prox.of <- of(data20)