clue (version 0.2-1)

cl_pclust: Prototype-Based Partitions of Clusterings

Description

Compute prototype-based partitions of a cluster ensemble by minimizing $\sum u_{bj}^m d(x_b, p_j)^e$, the sum of the membership-weighted $e$-th powers of the dissimilarities between the elements $x_b$ of the ensemble and the prototypes $p_j$, for suitable dissimilarities $d$ and exponents $e$.

Usage

cl_pclust(x, k, m = 1, control = list())

Arguments

x
an ensemble of partitions or hierarchies, or something coercible to that (see cl_ensemble).
k
an integer giving the number of classes to be used in the partition.
m
a number not less than 1 controlling the softness of the partition (as the fuzzification parameter of the fuzzy $c$-means algorithm). The default value of 1 corresponds to hard partitions obtained from a generalized $k$-means problem; values greater than 1 give increasingly soft (fuzzy) partitions.
control
a list of control parameters. See Details.

Value

An object of class "cl_pclust" representing the obtained secondary partition, which is a list with the following components.

  • prototypes: a cluster ensemble with the $k$ prototypes.
  • membership: an object of class "cl_membership" with the membership values $u_{bj}$.
  • cluster: the class ids of the nearest hard partition.
  • silhouette: silhouette information for the partition; see silhouette.
  • validity: precomputed validity measures for the partition.
  • m: the softness control argument.
  • call: the matched call.
  • d: the dissimilarity function $d = d(x, p)$ employed.
  • e: the exponent $e$ employed.
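The criterion value of a fit can be recomputed from these components. Below is a minimal sketch, assuming the default Least Squares Euclidean consensus method (Euclidean dissimilarity $d$ and $e = 2$); the helper pclust_criterion() is hypothetical, not part of clue:

library(clue)
## Recompute sum_{b,j} u_{bj}^m d(x_b, p_j)^e for a cl_pclust fit `x`
## on the ensemble `ens` it was fitted to (Euclidean d, e = 2 assumed).
pclust_criterion <- function(x, ens) {
  d <- unclass(cl_dissimilarity(ens, x$prototypes))  # B x k matrix of d(x_b, p_j)
  sum(unclass(x$membership)^x$m * d^2)               # e = 2 for Euclidean consensus
}

For instance, pclust_criterion(x2, CKME) would evaluate the criterion for the fit x2 from the Examples below.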

Details

For $m = 1$, a generalization of the Lloyd-Forgy variant of the $k$-means algorithm is used, which iterates between reclassifying objects to their closest prototypes, and computing new prototypes as consensus clusterings for the classes. This may result in degenerate solutions, and will be replaced by a Hartigan-Wong style algorithm eventually.
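As an illustration, here is a minimal sketch of this alternating scheme; the helper hard_pclust_sketch() is hypothetical, uses the default consensus method and Euclidean dissimilarity, and does not handle the degenerate case of empty classes mentioned above (clue's actual implementation differs):

library(clue)
hard_pclust_sketch <- function(ens, k, maxiter = 100) {
  ids <- sample.int(k, length(ens), replace = TRUE)   # random initial classes
  for (iter in seq_len(maxiter)) {
    ## new prototypes: consensus clustering of each class
    protos <- cl_ensemble(list = lapply(seq_len(k), function(j)
      cl_consensus(ens[ids == j])))
    ## reclassification: move each element to its closest prototype
    d <- unclass(cl_dissimilarity(ens, protos))       # elements x prototypes
    new_ids <- max.col(-d)                            # row-wise argmin
    if (identical(new_ids, ids)) break                # no change: converged
    ids <- new_ids
  }
  list(cluster = ids, prototypes = protos)
}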

For $m > 1$, a generalization of the fuzzy $c$-means recipe (e.g., Bezdek (1981)) is used, which alternates between computing optimal memberships for fixed prototypes, and computing new prototypes as the consensus clusterings for the classes.
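For fixed prototypes, minimizing the criterion over memberships whose rows sum to one gives the familiar closed form $u_{bj} \propto d(x_b, p_j)^{e/(1-m)}$; for $e = 2$ this is the classical fuzzy $c$-means update. A minimal sketch of this step, with the hypothetical helper fuzzy_memberships() and assuming all dissimilarities are positive:

fuzzy_memberships <- function(d, m, e = 2) {
  ## d: B x k matrix of dissimilarities d(x_b, p_j), assumed positive
  w <- d^(e / (1 - m))   # optimal weights up to row normalization
  w / rowSums(w)         # rows sum to one
}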

This procedure is repeated until convergence occurs, or the maximal number of iterations is reached.

Consensus clusterings are computed using cl_consensus.

Available control parameters are as follows.

  • maxiter: an integer giving the maximal number of iterations to be performed. Defaults to 100.
  • method: the consensus method to employ, see cl_consensus. Defaults to NULL, corresponding to the default consensus method.
  • reltol: the relative convergence tolerance. Defaults to sqrt(.Machine$double.eps).
  • verbose: a logical indicating whether to provide some output during iterations. Defaults to getOption("verbose").
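For example, a run with more iterations, a tighter tolerance, and progress output (using the data from the Examples below, and assuming the control parameters listed above) could look like:

library(clue)
data("CKME")
x <- cl_pclust(CKME, 3, m = 2,
               control = list(maxiter = 200, reltol = 1e-8, verbose = TRUE))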

The dissimilarities $d$ and exponent $e$ are implied by the consensus method employed, and inferred via a registration mechanism currently only made available to built-in consensus methods. The default methods compute Least Squares Euclidean consensus clusterings, i.e., use Euclidean dissimilarity $d$ and $e = 2$.

The fixed point approach employed is a heuristic which cannot be guaranteed to find the global minimum (as is already true for the computation of consensus clusterings). Standard practice recommends using the best solution found in sufficiently many replications of the base algorithm.
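A minimal sketch of such a replication strategy, using the average silhouette width of each fit as a proxy for solution quality (an assumption of this sketch, not a recommendation from clue):

library(clue)
data("CKME")
## Rerun from several random starts and keep the best-scoring solution.
fits <- lapply(1:10, function(s) { set.seed(s); cl_pclust(CKME, 3, m = 2) })
scores <- sapply(fits, function(f) mean(f$silhouette[, "sil_width"]))  # avg width
best <- fits[[which.max(scores)]]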

References

J. C. Bezdek (1981). Pattern recognition with fuzzy objective function algorithms. New York: Plenum.

See Also

kmeans, cmeans.

Examples

## Use a precomputed ensemble of 50 k-means partitions of the
## Cassini data.
data("CKME")
CKME <- CKME[1 : 30]		# for saving precious time ...
diss <- cl_dissimilarity(CKME)
hc <- hclust(diss)
plot(hc)
## This suggests using a partition with three classes, which can be
## obtained using cutree(hc, 3).  Could use cl_consensus() to compute
## prototypes as the least squares consensus clusterings of the classes,
## or alternatively:
set.seed(123)
x1 <- cl_pclust(CKME, 3, m = 1)
x2 <- cl_pclust(CKME, 3, m = 2)
## Agreement of solutions.
cl_dissimilarity(x1, x2)
table(cl_class_ids(x1), cl_class_ids(x2))