If all components are partitions, the following built-in methods for
measuring agreement between two partitions with respective membership
matrices \(u\) and \(v\) (brought to a common number of columns)
are available:
"euclidean"
\(1 - d / m\), where \(d\) is the
Euclidean dissimilarity of the memberships, i.e., the square root
of the minimal sum of the squared differences of \(u\) and all
column permutations of \(v\), and \(m\) is an upper bound for
the maximal Euclidean dissimilarity.
See Dimitriadou, Weingessel and Hornik (2002).
"manhattan"
\(1 - d / m\), where \(d\) is the
Manhattan dissimilarity of the memberships, i.e., the minimal
sum of the absolute differences of \(u\) and all column
permutations of \(v\), and \(m\) is an upper bound for the
maximal Manhattan dissimilarity.
"Rand"
the Rand index (the proportion of distinct pairs of objects
that are either in the same class in both partitions or in
different classes in both partitions), see Rand (1971) or
Gordon (1999, page 198).
For soft partitions, (currently) the Rand index of the
corresponding nearest hard partitions is used.
"cRand"
the Rand index corrected for agreement by
chance, see Hubert and Arabie (1985) or
Gordon (1999, page 198).
Can only be used for hard partitions.
"NMI"
Normalized Mutual Information, see
Strehl and Ghosh (2002).
For soft partitions, (currently) the NMI of the
corresponding nearest hard partitions is used.
"KP"
the Katz-Powell index, i.e., the product-moment
correlation coefficient between the elements of the co-membership
matrices \(C(u) = u u'\) and \(C(v) = v v'\),
see Katz and Powell (1953). For soft partitions, (currently) the
Katz-Powell index of the corresponding nearest hard partitions is
used. (Note that for hard partitions, the \((i,j)\) entry of
\(C(u)\) is one iff objects \(i\) and \(j\) are in the same
class.)
"angle"
the maximal cosine of the angle between \(u\) and all
column permutations of \(v\), with the membership matrices
regarded as vectors.
"diag"
the maximal co-classification rate, i.e., the
maximal rate of objects with the same class ids in both
partitions after arbitrarily permuting the ids.
"FM"
the index of Fowlkes and Mallows (1983), i.e.,
the ratio \(N_{xy} / \sqrt{N_x N_y}\) of
the number \(N_{xy}\) of distinct pairs of objects in the
same class in both partitions and the geometric mean of the
numbers \(N_x\) and \(N_y\) of distinct pairs of objects in
the same class in partition \(x\) and partition \(y\),
respectively.
For soft partitions, (currently) the Fowlkes-Mallows index of the
corresponding nearest hard partitions is used.
"Jaccard"
the Jaccard index, i.e., the ratio of the
numbers of distinct pairs of objects in the same class in both
partitions and in at least one partition, respectively.
For soft partitions, (currently) the Jaccard index of the
corresponding nearest hard partitions is used.
"purity"
the purity of the classes of x with
respect to those of y, i.e.,
\(\sum_j \max_i n_{ij} / n\),
where \(n_{ij}\) is the joint frequency of objects in class
\(i\) for x and in class \(j\) for y, and
\(n\) is the total number of objects.
"PS"
Prediction Strength, see
Tibshirani and Walther (2005): the minimum, over all classes
\(j\) of y, of the maximal rate of objects in the same
class for x and in class \(j\) for y.
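The permutation-based partition measures ("euclidean", "manhattan", "angle" and "diag") can be illustrated by brute force over all column permutations of \(v\). This is a minimal base-R sketch, not the package's own implementation: the helper names (`memberships`, `pad_columns`, `column_perms`, `perm_agreement`) are hypothetical, padding with zero columns is assumed as the way to reach a common number of columns, the bounds \(m = \sqrt{2n}\) and \(m = 2n\) (for \(n\) objects) are assumptions that hold because every membership row sums to one, and the "diag" branch assumes hard (0/1) memberships.

```r
# Hard membership matrix (n x k) from a vector of class ids 1..k.
memberships <- function(ids, k = max(ids)) {
  m <- matrix(0, nrow = length(ids), ncol = k)
  m[cbind(seq_along(ids), ids)] <- 1
  m
}

# Bring a membership matrix to k columns by appending zero columns (assumption).
pad_columns <- function(m, k) {
  if (ncol(m) >= k) return(m)
  cbind(m, matrix(0, nrow = nrow(m), ncol = k - ncol(m)))
}

# All permutations of 1..k, one per row (k! rows).
column_perms <- function(k) {
  if (k == 1) return(matrix(1L, 1, 1))
  p <- column_perms(k - 1)
  do.call(rbind, lapply(seq_len(k), function(i) {
    cbind(i, ifelse(p >= i, p + 1L, p))
  }))
}

perm_agreement <- function(u, v, method = c("euclidean", "manhattan", "angle", "diag")) {
  method <- match.arg(method)
  k <- max(ncol(u), ncol(v))
  u <- pad_columns(u, k)
  v <- pad_columns(v, k)
  n <- nrow(u)
  # Evaluate the criterion for every column permutation of v.
  vals <- apply(column_perms(k), 1, function(p) {
    w <- v[, p, drop = FALSE]
    switch(method,
      euclidean = sum((u - w)^2),
      manhattan = sum(abs(u - w)),
      angle     = sum(u * w) / sqrt(sum(u^2) * sum(w^2)),
      diag      = sum(u * w))  # objects sharing a class (hard memberships)
  })
  switch(method,
    euclidean = 1 - sqrt(min(vals)) / sqrt(2 * n),  # assumed bound m = sqrt(2n)
    manhattan = 1 - min(vals) / (2 * n),            # assumed bound m = 2n
    angle     = max(vals),
    diag      = max(vals) / n)
}

x <- c(1, 1, 2, 2, 3, 3)
y <- c(2, 2, 3, 3, 1, 1)  # the same partition with relabeled classes
perm_agreement(memberships(x), memberships(y), "euclidean")  # 1
```

Enumerating all \(k!\) permutations is only practical for small \(k\); an assignment solver over the \(k \times k\) cost matrix would avoid that blow-up.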
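The pair-counting indices ("Rand", "cRand", "FM" and "Jaccard") all follow from the counts \(N_{xy}\), \(N_x\) and \(N_y\) defined in the "FM" entry, together with the total number \(N = \binom{n}{2}\) of distinct pairs. The base-R sketch below works on hard partitions given as class-id vectors; the function names are hypothetical, and "cRand" is written in the usual Hubert-Arabie adjusted form with expected value \(N_x N_y / N\) for \(N_{xy}\), which is an assumption about the exact correction used.

```r
# Pair counts from the contingency table of two hard partitions.
pair_counts <- function(x, y) {
  tab <- table(x, y)
  c(N   = choose(length(x), 2),          # all distinct pairs
    Nxy = sum(choose(tab, 2)),           # same class in both partitions
    Nx  = sum(choose(rowSums(tab), 2)),  # same class in x
    Ny  = sum(choose(colSums(tab), 2)))  # same class in y
}

pair_agreement <- function(x, y, method = c("Rand", "cRand", "FM", "Jaccard")) {
  method <- match.arg(method)
  pc <- pair_counts(x, y)
  N <- pc[["N"]]; Nxy <- pc[["Nxy"]]; Nx <- pc[["Nx"]]; Ny <- pc[["Ny"]]
  switch(method,
    # agreeing pairs: together in both (Nxy) plus apart in both (N - Nx - Ny + Nxy)
    Rand    = (N - Nx - Ny + 2 * Nxy) / N,
    # Hubert-Arabie adjustment, expected Nxy = Nx * Ny / N
    cRand   = {
      expected <- Nx * Ny / N
      (Nxy - expected) / ((Nx + Ny) / 2 - expected)
    },
    FM      = Nxy / sqrt(Nx * Ny),
    Jaccard = Nxy / (Nx + Ny - Nxy))
}

x <- c(1, 1, 1, 2, 2, 2)
y <- c(1, 1, 2, 2, 3, 3)
# Rand 2/3, cRand about 0.242, FM about 0.471, Jaccard 2/7
sapply(c("Rand", "cRand", "FM", "Jaccard"), function(m) pair_agreement(x, y, m))
```

Here \(N = 15\), \(N_{xy} = 2\), \(N_x = 6\) and \(N_y = 3\), so for example the Jaccard index is \(2 / (6 + 3 - 2) = 2/7\).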
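The remaining contingency-based measures "NMI", "purity" and "KP" can be sketched the same way for hard partitions given as class-id vectors. In this base-R sketch the function names are hypothetical; "NMI" uses the Strehl-Ghosh normalization \(I(x;y) / \sqrt{H(x) H(y)}\), "purity" follows the formula above literally, and "KP" correlates all \(n^2\) entries of the co-membership matrices, which is an assumption since the entry above does not say whether the diagonal is included.

```r
# Normalized mutual information I(x; y) / sqrt(H(x) H(y)) of two hard partitions.
nmi <- function(x, y) {
  p  <- unclass(table(x, y)) / length(x)  # joint class frequencies as a plain matrix
  px <- rowSums(p)
  py <- colSums(p)
  nz <- p > 0
  mi <- sum(p[nz] * log(p[nz] / outer(px, py)[nz]))
  hx <- -sum(px * log(px))
  hy <- -sum(py * log(py))
  mi / sqrt(hx * hy)  # NaN if either partition has a single class
}

# Purity of the classes of x with respect to those of y: sum_j max_i n_ij / n.
purity <- function(x, y) {
  tab <- table(x, y)  # rows: classes of x, columns: classes of y
  sum(apply(tab, 2, max)) / length(x)
}

# Katz-Powell index: correlation of the entries of the co-membership matrices.
katz_powell <- function(x, y) {
  cu <- outer(x, x, "==") * 1  # equals u u' for hard memberships
  cv <- outer(y, y, "==") * 1
  cor(as.vector(cu), as.vector(cv))  # assumption: all n^2 entries are used
}

x <- c(1, 1, 1, 2, 2, 2)
y <- c(1, 1, 2, 2, 3, 3)
c(NMI = nmi(x, y), purity = purity(x, y), KP = katz_powell(x, y))
```

For these partitions the purity is \((2 + 1 + 2) / 6 = 5/6\).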
If all components are hierarchies, available built-in methods for
measuring agreement between two hierarchies with respective
ultrametrics \(u\) and \(v\) are as follows.
"euclidean"
\(1 / (1 + d)\), where \(d\) is the
Euclidean dissimilarity of the ultrametrics (i.e., the square root
of the sum of the squared differences of \(u\) and \(v\)).
"manhattan"
\(1 / (1 + d)\), where \(d\) is the
Manhattan dissimilarity of the ultrametrics (i.e., the sum of the
absolute differences of \(u\) and \(v\)).
"cophenetic"
the cophenetic correlation coefficient, i.e., the product-moment
correlation of the ultrametrics.
"angle"
the cosine of the angle between the
ultrametrics.
"gamma"
\(1 - d\), where \(d\) is the rate of
inversions between the associated ultrametrics (i.e., the rate of
pairs \((i,j)\) and \((k,l)\) for which \(u_{ij} < u_{kl}\)
and \(v_{ij} > v_{kl}\)). (This agreement measure is a linear
transformation of Kruskal's \(\gamma\).)
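The hierarchy measures act on the two ultrametrics as vectors over all \(\binom{n}{2}\) object pairs. The base-R sketch below uses the hypothetical name `ultrametric_agreement`, obtains ultrametrics from `hclust` trees via `stats::cophenetic()`, and for "gamma" takes the inversion rate over pairs of pairs that are untied in both ultrametrics. That last choice is an assumption, made because it gives \(1 - d = (1 + \gamma)/2\), i.e., the linear relation to Kruskal's \(\gamma\) stated above.

```r
# Agreement between two ultrametrics u and v (e.g. "dist" objects over the same pairs).
ultrametric_agreement <- function(u, v, method = c("euclidean", "manhattan",
                                                   "cophenetic", "angle", "gamma")) {
  method <- match.arg(method)
  u <- as.vector(u)
  v <- as.vector(v)
  switch(method,
    euclidean  = 1 / (1 + sqrt(sum((u - v)^2))),
    manhattan  = 1 / (1 + sum(abs(u - v))),
    cophenetic = cor(u, v),
    angle      = sum(u * v) / sqrt(sum(u^2) * sum(v^2)),
    gamma      = {
      # sign products of pairwise differences; each unordered pair of pairs counted once
      s <- sign(outer(u, u, "-")) * sign(outer(v, v, "-"))
      s <- s[upper.tri(s)]
      1 - sum(s < 0) / sum(s != 0)  # assumption: pairs tied in u or v are excluded
    })
}

d <- dist(c(1, 2, 4, 8, 16))
u <- cophenetic(hclust(d, method = "single"))
v <- cophenetic(hclust(d, method = "complete"))
sapply(c("euclidean", "manhattan", "cophenetic", "angle", "gamma"),
       function(m) ultrametric_agreement(u, v, m))
```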
The measures based on ultrametrics also allow computing agreement with
“raw” dissimilarities on the underlying objects (R objects
inheriting from class "dist").
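Because a "dist" object and the ultrametric of a hierarchy both list values for the same object pairs in the same order, agreement with raw dissimilarities reduces to comparing two such vectors. A small base-R illustration with made-up data computes the cophenetic correlation between an average-linkage hierarchy and the dissimilarities it was built from.

```r
d  <- dist(c(1, 2, 4, 8, 16))        # raw dissimilarities (class "dist")
hc <- hclust(d, method = "average")  # a hierarchy built from them
u  <- cophenetic(hc)                 # its ultrametric, also of class "dist"
# Same pairs in the same order, so the vectors can be correlated directly.
r  <- cor(as.vector(u), as.vector(d))
r
```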
If a user-defined agreement method is to be employed, it must be a
function taking two clusterings as its arguments.
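A user-defined method only needs to map two clusterings to a number. The sketch below defines a hypothetical agreement, the share of objects whose class ids coincide as given (with no optimization over relabelings, unlike "diag"). For brevity it also accepts plain id vectors; for clustering objects it assumes the class ids can be extracted with clue's `cl_class_ids()`.

```r
# Hypothetical user-defined agreement: share of objects with literally equal class ids.
same_id_agreement <- function(x, y) {
  # Plain id vectors are used as-is; clusterings go through clue::cl_class_ids() (assumed).
  ids <- function(p) if (is.atomic(p)) p else clue::cl_class_ids(p)
  mean(as.integer(ids(x)) == as.integer(ids(y)))
}

same_id_agreement(c(1, 1, 2, 2), c(1, 2, 2, 2))  # 0.75
```

Assuming this page documents clue's `cl_agreement()`, such a function would be supplied as its `method` argument, e.g. `cl_agreement(x, y, method = same_id_agreement)`.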