Obtain prototype-based partitions of objects by minimizing the criterion \(\sum w_b u_{bj}^m d(x_b, p_j)^e\), the sum of the case-weighted and membership-weighted \(e\)-th powers of the dissimilarities between the objects \(x_b\) and the prototypes \(p_j\), for suitable dissimilarities \(d\) and exponents \(e\).
pclust(x, k, family, m = 1, weights = 1, control = list())
pclust_family(D, C, init = NULL, description = NULL, e = 1,
.modify = NULL, .subset = NULL)
pclust_object(prototypes, membership, cluster, family, m = 1,
value, ..., classes = NULL, attributes = NULL)
pclust
returns the partition found as an object of class
"pclust"
(as obtained by calling pclust_object
) which in
addition to the default components contains call
(the
matched call) and a converged
attribute indicating convergence
status (i.e., whether the maximal number of iterations was reached).
pclust_family
returns an object of class
"pclust_family"
, which is a list with components corresponding
to the formals of pclust_family
.
pclust_object
returns an object inheriting from class
"pclust"
, which is a list with components corresponding
to the formals (up to and including ...
) of
pclust_object
, and additional classes and attributes specified
by classes
and attributes
, respectively.
the object to be partitioned.
an integer giving the number of classes to be used in the partition.
an object of class "pclust_family"
as generated
by pclust_family
, containing the information about \(d\)
and \(e\).
a number not less than 1 controlling the softness of the partition (as the “fuzzification parameter” of the fuzzy \(c\)-means algorithm). The default value of 1 corresponds to hard partitions obtained from a generalized \(k\)-means problem; values greater than one give partitions of increasing softness obtained from a generalized fuzzy \(c\)-means problem.
a numeric vector of non-negative case weights.
Recycled to the number of elements given by x
if necessary.
a list of control parameters. See Details.
a function for computing dissimilarities \(d\) between objects and prototypes.
a ‘consensus’ function with formals x
,
weights
and control
for computing a consensus
prototype \(p\) minimizing \(\sum_b w_b d(x_b, p) ^ e\).
a function with formals x
and k
initializing
an object with \(k\) prototypes from the object x
to be
partitioned.
a character string describing the family.
a number giving the exponent \(e\) of the criterion.
a function with formals x
, i
and
value
for modifying a single prototype,
or NULL
(default).
a function with formals x
and i
for
subsetting prototypes,
or NULL
(default).
an object representing the prototypes of the partition.
an object of class "cl_membership"
with the membership values \(u_{bj}\).
the class ids of the nearest hard partition.
the value of the criterion to be minimized.
further elements to be included in the generated pclust object.
a character vector giving further classes to be given
to the generated pclust object in addition to "pclust"
, or
NULL
(default).
a list of attributes, or NULL
(default).
For \(m = 1\), a generalization of the Lloyd-Forgy variant of the
\(k\)-means algorithm is used, which iterates between reclassifying
objects to their closest prototypes (according to the dissimilarities
given by D
), and computing new prototypes as the consensus for
the classes (using C
).
For \(m > 1\), a generalization of the fuzzy \(c\)-means recipe (e.g., Bezdek (1981)) is used, which alternates between computing optimal memberships for fixed prototypes, and computing new prototypes as the suitably weighted consensus clusterings for the classes.
This procedure is repeated until convergence occurs, or the maximal number of iterations is reached.
Currently, no local improvement heuristics are provided.
It is possible to perform several runs of the procedure via control
arguments nruns
or start
(the default is to perform a
single run), in which case the first partition with the smallest
value of the criterion is returned.
The dissimilarity and consensus functions as well as the exponent
\(e\) are specified via family
. In principle, arbitrary
representations of objects to be partitioned and prototypes (which do
not necessarily have to be “of the same kind”) can be employed.
In addition to D
and C
, what is needed are means to
obtain an initial collection of \(k\) prototypes (init
), to
modify a single prototype (.modify
), and subset the prototypes
(.subset
). By default, list and (currently, only dense) matrix
(with the usual convention that the rows correspond to the objects)
are supported. Otherwise, the family has to provide the functions
needed.
Available control parameters are as follows.
maxiter
an integer giving the maximal number of iterations to be performed. Defaults to 100.
nruns
an integer giving the number of runs to be performed. Defaults to 1.
reltol
the relative convergence tolerance.
Defaults to sqrt(.Machine$double.eps)
.
start
a list of prototype objects to be used as starting values.
verbose
a logical indicating whether to provide
some output on minimization progress.
Defaults to getOption("verbose")
.
control
control parameters to be used in the consensus function.
The fixed point approach employed is a heuristic which cannot be
guaranteed to find the global minimum, in particular if C
is
not an exact minimizer. Standard practice would recommend to use the
best solution found in “sufficiently many” replications of the
base algorithm.
J. C. Bezdek (1981). Pattern recognition with fuzzy objective function algorithms. New York: Plenum.