dissimilarity: Dissimilarity Between Relations

Description

Compute the dissimilarity between (ensembles of) relations.

Usage

relation_dissimilarity(x, y = NULL, method = "symdiff", ...)

Value

If y is NULL, an object of class dist

containing the dissimilarities between all pairs of elements of

x. Otherwise, a matrix with the dissimilarities between the elements of x and the elements of y.

Arguments

x: an ensemble of relations (see relation_ensemble()), or something which can be coerced to such.
y: NULL (default), or as for x.
method: a character string specifying one of the built-in methods for computing dissimilarity, or a function to be taken as a user-defined method. If a character string, its lower-cased version is matched against the lower-cased names of the available built-in methods using pmatch(). See Details for available built-in methods.
...: further arguments to be passed to methods.

Details

Available built-in methods are as follows.

"symdiff"

symmetric difference distance. This computes the cardinality of the symmetric difference of two relations, i.e., the number of tuples contained in exactly one of two relations. For preference relations, this coincides with the Kemeny-Snell metric (Kemeny and Snell, 1962). For linear orders, it gives Kendall's \(\tau\) metric (Diaconis, 1988).

Can also be referred to as "SD".

Only applicable to crisp relations.

"manhattan"

the Manhattan distance between the incidences.

"euclidean"

the Euclidean distance between the incidences.

"CS"

Cook-Seiford distance, a generalization of the distance function of Cook and Seiford (1978). Let the generalized ranks of an object \(a\) in the (first) domain of an endorelation \(R\) be defined as the number of objects \(b\) dominating \(a\) (i.e., for which \(a R b\) and not \(b R a\)), plus half the number of objects \(b\) equivalent to \(a\) (i.e., for which \(a R b\) and \(b R a\)). For preference relations, this gives the usual Kendall ranks arranged according to decreasing preference (and averaged for ties). Then the generalized Cook-Seiford distance is defined as the \(l_1\) distance between the generalized ranks. For linear orders, this gives Spearman's footrule metric (Diaconis, 1988).

Only applicable to crisp endorelations.

"CKS"

Cook-Kress-Seiford distance, a generalization of the distance function of Cook, Kress and Seiford (1986). For each pair of objects \(a\) and \(b\) in an endorelation \(R\), we can have \(a R b\) and not \(b R a\) or vice versa (cases of “strict preference”), \(a R b\) and \(b R a\) (the case of “indifference”), or neither \(a R b\) nor \(b R a\) (the case of “incomparability”). (Only the last two are possible if \(a = b\).) The distance by Cook, Kress and Seiford puts indifference as the metric centroid between both preference cases and incomparability (i.e., indifference is at distance one from the other three, and each of the other three is at distance two from the others). The generalized Cook-Kress-Seiford distance is the paired comparison distance (i.e., a metric) based on these distances between the four paired comparison cases. (Formula 3 in the reference must be slightly modified for the generalization from partial rankings to arbitrary endorelations.)

Only applicable to crisp endorelations.

"score"

score-based distance. This computes \(\Delta(s(x), s(y))\) for suitable score and distance functions \(s\) and \(\Delta\), respectively. These can be specified by additional arguments score and Delta. If score is a character string, it is taken as the method for relation_scores. Otherwise, if given it must be a function giving the score function itself. If Delta is a number \(p \ge 1\), the usual \(l_p\) distance is used. Otherwise, it must be a function giving the distance function. The defaults correspond to using the default relation scores and \(p = 1\), which for linear orders gives Spearman's footrule distance.

Only applicable to endorelations.

"Jaccard"

Jaccard distance: 1 minus the ratio of the cardinalities of the intersection and the union of the relations.

"PC"

(generalized) paired comparison distance. This generalizes the symdiff and CKS distances to use a general set of discrepancies \(\delta_{kl}\) between the possible paired comparison results with \(a,b\)/\(b,a\) incidences 0/0, 1/0, 0/1, and 1/1 numbered from 1 to 4 (in a preference context with a \(\le\) encoding, these correspond to incompatibility, strict \(<\) and \(>\) preference, and indifference), with \(\delta_{kl}\) the discrepancy between possible results \(k\) and \(l\). The distance is then obtained as the sum of the discrepancies from the paired comparisons of distinct objects, plus half the sum of discrepancies from the comparisons of identical objects (for which the only possible results are incomparability and indifference). The distance is a metric provided that the \(\delta_{kl}\) satisfy the metric conditions (non-negativity and zero iff \(k = l\), symmetry and sub-additivity).

The discrepancies can be specified via the additional argument delta, either as a numeric vector of length 6 with the non-redundant values \(\delta_{21}, \delta_{31}, \delta_{41}, \delta_{32}, \delta_{42}, \delta_{43}\), or as a character string partially matching one of the following built-in discrepancies with corresponding parameter vector \(\delta\):

"symdiff"

symmetric difference distance, with discrepancy between distinct results two between either opposite strict preferences or indifference and incomparability, and one otherwise: \(\delta = (1, 1, 2, 2, 1, 1)\) (default).

Can also be referred to as "SD".

"CKS"

Cook-Kress-Seiford distance, see above: \(\delta = (2, 2, 1, 2, 1, 1)\).

"EM"

the distance obtained from the generalization of the Kemeny-Snell distance for complete rankings to partial rankings introduced in Emond and Mason (2000). This uses a discrepancy of two for opposite strict preferences, and one for all other distinct results: \(\delta = (1, 1, 1, 2, 1, 1)\).

"JMB"

the distance with parameters as suggested by Jabeur, Martel and Ben Khélifa (2004): \(\delta = (4/3, 4/3, 4/3, 5/3, 1, 1)\).

"discrete"

the discrete metric on the set of paired comparison results: \(\delta = (1, 1, 1, 1, 1, 1)\).

Only applicable to crisp endorelations.

Methods "symdiff", "manhattan", "euclidean" and "Jaccard" take an additional logical argument na.rm: if true (default: false), tuples with missing memberships are excluded in the dissimilarity computations.

References

W. D. Cook, M. Kress and L. M. Seiford (1986), Information and preference in partial orders: a bimatrix representation. Psychometrika 51/2, 197--207. tools:::Rd_expr_doi("10.1007/BF02293980").

W. D. Cook and L. M. Seiford (1978), Priority ranking and consensus formation. Management Science, 24/16, 1721--1732. tools:::Rd_expr_doi("10.1287/mnsc.24.16.1721").

P. Diaconis (1988), Group Representations in Probability and Statistics. Institute of Mathematical Statistics: Hayward, CA.

E. J. Emond and D. W. Mason (2000), A new technique for high level decision support. Technical Report ORD Project Report PR2000/13, Operational Research Division, Department of National Defence, Canada.

K. Jabeur, J.-M. Martel and S. Ben Khélifa (2004). A distance-based collective preorder integrating the relative importance of the groups members. Group Decision and Negotiation, 13, 327--349. tools:::Rd_expr_doi("10.1023/B:GRUP.0000042894.00775.75").

J. G. Kemeny and J. L. Snell (1962), Mathematical Models in the Social Sciences, chapter “Preference Rankings: An Axiomatic Approach”. MIT Press: Cambridge.