consensus: Consensus Partitions and Hierarchies

Description

Compute the consensus clustering of an ensemble of partitions or hierarchies.

Usage

cl_consensus(x, method = NULL, weights = 1, control = list())

Arguments

an ensemble of partitions or hierarchies, or something coercible to that (see cl_ensemble).

method

a character string specifying one of the built-in methods for computing consensus clusterings, or a function to be taken as a user-defined method, or NULL (default value). If a character string, its lower-cased version is matched

weights

a numeric vector with non-negative case weights. Recycled to the number of elements in the ensemble given by x if necessary.

control

a list of control parameters. See Details.

Value

The consensus partition or hierarchy.

Details

Consensus clusterings synthesize the information in the elements of a cluster ensemble into a single clustering, often by minimizing a criterion function measuring how dissimilar consensus candidates are from the (elements of) the ensemble (the so-called optimization approach to consensus clustering).

The most popular criterion functions are of the form $L(x) = \sum w_b d(x_b, x)^p$, where $d$ is a suitable dissimilarity measure (see cl_dissimilarity), $w_b$ is the case weight given to element $x_b$ of the ensemble, and $p \ge 1$. If $p = 1$ and minimization is over all possible base clusterings, a consensus solution is called a median of the ensemble; if minimization is restricted to the elements of the ensemble, a consensus solution is called a medoid (see cl_medoid). For $p = 2$, we obtain least squares consensus partitions and hierarchies (generalized means). See also Gordon (1999) for more information. If all elements of the ensemble are partitions, the built-in consensus methods compute soft least squares consensus partitions for Euclidean, GV1 and co-membership dissimilarities (i.e., minima of $L(x) = \sum w_b d(x_b, x)^2$ over all soft partitions with $k$ classes).

Available methods are as follows. [object Object],[object Object],[object Object],[object Object] By default, method "DWH" is used for ensembles of partitions. If all elements of the ensemble are hierarchies, the following built-in methods for computing consensus hierarchies are available. [object Object],[object Object]

By default, method "cophenetic" is used for ensembles of hierarchies.

If a user-defined agreement method is to be employed, it must be a function taking the cluster ensemble, the case weights, and a list of control parameters as its arguments.

All built-in methods use heuristics for solving hard optimization problems, and cannot be guaranteed to find a global minimum. Standard practice would recommend to use the best solution found in sufficiently many replications of the methods.

References

E. Dimitriadou and A. Weingessel and K. Hornik (2002). A combination scheme for fuzzy clustering. International Journal of Pattern Recognition and Artificial Intelligence, 16, 901--912.

A. D. Gordon and M. Vichi (2001). Fuzzy partition models for fitting a set of partitions. Psychometrika, 66, 229--248.

A. D. Gordon (1999). Classification (2nd edition). Boca Raton, FL: Chapman & Hall/CRC.

T. Margush and F. R. McMorris (1981). Consensus $n$-trees. Bulletin of Mathematical Biology, 43, 239--244.

Examples

Run this code

## Consensus partition for the Rosenberg-Kim kinship terms partition
## data based on co-membership dissimilarities.
data("Kinship82")
m1 <- cl_consensus(Kinship82, method = "GV3",
                   control = list(k = 3, verbose = TRUE))
## (Note that one should really use several replicates of this.)
## Value for criterion function to be minimized:
sum(cl_dissimilarity(Kinship82, m1, "comem") ^ 2)
## Compare to the consensus solution given in Gordon & Vichi (2001).
data("Kinship82_Consensus")
m2 <- Kinship82_Consensus[["JMF"]]
sum(cl_dissimilarity(Kinship82, m2, "comem") ^ 2)
## Seems we get a better solution ...
## How dissimilar are these solutions?
cl_dissimilarity(m1, m2, "comem")
## How "fuzzy" are they?
cl_fuzziness(cl_ensemble(m1, m2))
## Do the "nearest" hard partitions fully agree?
cl_dissimilarity(as.cl_hard_partition(m1),
                 as.cl_hard_partition(m2))
## Hmm ...

## Consensus partition for the Gordon and Vichi (2001) macroeconomic
## partition data based on Euclidean dissimilarities.
data("GVME")
set.seed(1)
## First, using k=2 classes.
m1 <- cl_consensus(GVME, method = "GV1",
                   control = list(k = 2, verbose = TRUE))
## (Note that one should really use several replicates of this.)
## Value of criterion function to be minimized:
sum(cl_dissimilarity(GVME, m1, "GV1") ^ 2)
## Compare to the consensus solution given in Gordon & Vichi (2001).
data("GVME_Consensus")
m2 <- GVME_Consensus[["MF1/2"]]
sum(cl_dissimilarity(GVME, m2, "GV1") ^ 2)
## Seems we get a slightly  better solution ...
## But note that
cl_dissimilarity(m1, m2, "GV1")
## and that the maximal deviation is
max(abs(m1 - m2))
## so the differences seem to be due to rounding.
## Do the "nearest" hard partitions fully agree?
table(cl_class_ids(m1), cl_class_ids(m2))

## And now for k=3 classes.
m1 <- cl_consensus(GVME, method = "GV1",
                   control = list(k = 3, verbose = TRUE))
sum(cl_dissimilarity(GVME, m1, "GV1") ^ 2)
## Compare to the consensus solution given in Gordon & Vichi (2001).
m2 <- GVME_Consensus[["MF1/3"]]
sum(cl_dissimilarity(GVME, m2, "GV1") ^ 2)
## This time we look much better ...
## How dissimilar are these solutions?
cl_dissimilarity(m1, m2, "GV1")
## Do the "nearest" hard partitions fully agree?
table(cl_class_ids(m1), cl_class_ids(m2))

Run the code above in your browser using DataLab