Learn R Programming

clue (version 0.3-65)

pclust: Prototype-Based Partitioning

Description

Obtain prototype-based partitions of objects by minimizing the criterion \(\sum w_b u_{bj}^m d(x_b, p_j)^e\), the sum of the case-weighted and membership-weighted \(e\)-th powers of the dissimilarities between the objects \(x_b\) and the prototypes \(p_j\), for suitable dissimilarities \(d\) and exponents \(e\).

Usage

pclust(x, k, family, m = 1, weights = 1, control = list())
pclust_family(D, C, init = NULL, description = NULL, e = 1,
              .modify = NULL, .subset = NULL)
pclust_object(prototypes, membership, cluster, family, m = 1,
              value, ..., classes = NULL, attributes = NULL)

Value

pclust returns the partition found as an object of class

"pclust" (as obtained by calling pclust_object) which in addition to the default components contains call (the matched call) and a converged attribute indicating convergence status (i.e., whether the maximal number of iterations was reached).

pclust_family returns an object of class

"pclust_family", which is a list with components corresponding to the formals of pclust_family.

pclust_object returns an object inheriting from class

"pclust", which is a list with components corresponding to the formals (up to and including ...) of

pclust_object, and additional classes and attributes specified by classes and attributes, respectively.

Arguments

x

the object to be partitioned.

k

an integer giving the number of classes to be used in the partition.

family

an object of class "pclust_family" as generated by pclust_family, containing the information about \(d\) and \(e\).

m

a number not less than 1 controlling the softness of the partition (as the “fuzzification parameter” of the fuzzy \(c\)-means algorithm). The default value of 1 corresponds to hard partitions obtained from a generalized \(k\)-means problem; values greater than one give partitions of increasing softness obtained from a generalized fuzzy \(c\)-means problem.

weights

a numeric vector of non-negative case weights. Recycled to the number of elements given by x if necessary.

control

a list of control parameters. See Details.

D

a function for computing dissimilarities \(d\) between objects and prototypes.

C

a ‘consensus’ function with formals x, weights and control for computing a consensus prototype \(p\) minimizing \(\sum_b w_b d(x_b, p) ^ e\).

init

a function with formals x and k initializing an object with \(k\) prototypes from the object x to be partitioned.

description

a character string describing the family.

e

a number giving the exponent \(e\) of the criterion.

.modify

a function with formals x, i and value for modifying a single prototype, or NULL (default).

.subset

a function with formals x and i for subsetting prototypes, or NULL (default).

prototypes

an object representing the prototypes of the partition.

membership

an object of class "cl_membership" with the membership values \(u_{bj}\).

cluster

the class ids of the nearest hard partition.

value

the value of the criterion to be minimized.

...

further elements to be included in the generated pclust object.

classes

a character vector giving further classes to be given to the generated pclust object in addition to "pclust", or NULL (default).

attributes

a list of attributes, or NULL (default).

Details

For \(m = 1\), a generalization of the Lloyd-Forgy variant of the \(k\)-means algorithm is used, which iterates between reclassifying objects to their closest prototypes (according to the dissimilarities given by D), and computing new prototypes as the consensus for the classes (using C).

For \(m > 1\), a generalization of the fuzzy \(c\)-means recipe (e.g., Bezdek (1981)) is used, which alternates between computing optimal memberships for fixed prototypes, and computing new prototypes as the suitably weighted consensus clusterings for the classes.

This procedure is repeated until convergence occurs, or the maximal number of iterations is reached.

Currently, no local improvement heuristics are provided.

It is possible to perform several runs of the procedure via control arguments nruns or start (the default is to perform a single run), in which case the first partition with the smallest value of the criterion is returned.

The dissimilarity and consensus functions as well as the exponent \(e\) are specified via family. In principle, arbitrary representations of objects to be partitioned and prototypes (which do not necessarily have to be “of the same kind”) can be employed. In addition to D and C, what is needed are means to obtain an initial collection of \(k\) prototypes (init), to modify a single prototype (.modify), and subset the prototypes (.subset). By default, list and (currently, only dense) matrix (with the usual convention that the rows correspond to the objects) are supported. Otherwise, the family has to provide the functions needed.

Available control parameters are as follows.

maxiter

an integer giving the maximal number of iterations to be performed. Defaults to 100.

nruns

an integer giving the number of runs to be performed. Defaults to 1.

reltol

the relative convergence tolerance. Defaults to sqrt(.Machine$double.eps).

start

a list of prototype objects to be used as starting values.

verbose

a logical indicating whether to provide some output on minimization progress. Defaults to getOption("verbose").

control

control parameters to be used in the consensus function.

The fixed point approach employed is a heuristic which cannot be guaranteed to find the global minimum, in particular if C is not an exact minimizer. Standard practice would recommend to use the best solution found in “sufficiently many” replications of the base algorithm.

References

J. C. Bezdek (1981). Pattern recognition with fuzzy objective function algorithms. New York: Plenum.

See Also