e1071 (version 1.6-7)

bclust: Bagged Clustering

Description

Cluster the data in x using the bagged clustering algorithm. A partitioning cluster algorithm such as kmeans is run repeatedly on bootstrap samples from the original data. The resulting cluster centers are then combined using the hierarchical cluster algorithm hclust.

Usage

bclust(x, centers=2, iter.base=10, minsize=0,
       dist.method="euclidian", hclust.method="average",
       base.method="kmeans", base.centers=20, verbose=TRUE,
       final.kmeans=FALSE, docmdscale=FALSE, resample=TRUE,
       weights=NULL, maxcluster=base.centers, ...)

hclust.bclust(object, x, centers, dist.method=object$dist.method,
              hclust.method=object$hclust.method, final.kmeans=FALSE,
              docmdscale = FALSE, maxcluster=object$maxcluster)

## S3 method for class 'bclust'
plot(x, maxcluster=x$maxcluster, main, ...)

centers.bclust(object, k)

clusters.bclust(object, k, x=NULL)

Arguments

x
Matrix of inputs (or object of class "bclust" for plot).
centers, k
Number of clusters.
iter.base
Number of runs of the base cluster algorithm.
minsize
Minimum number of points in a base cluster.
dist.method
Distance method used for the hierarchical clustering, see dist for available distances.
hclust.method
Linkage method used for the hierarchical clustering, see hclust for available methods.
base.method
Partitioning cluster method used as base algorithm.
base.centers
Number of centers used in each repetition of the base method.
verbose
Output status messages.
final.kmeans
If TRUE, a final kmeans step is performed using the output of the bagged clustering as initialization.
docmdscale
Logical, if TRUE a cmdscale result is included in the return value.
resample
Logical, if TRUE the base method is run on bootstrap samples of x, else directly on x.
weights
Vector of length nrow(x), weights for the resampling. By default all observations have equal weight.
maxcluster
Maximum number of clusters for which memberships are computed.
object
Object of class "bclust".
main
Main title of the plot.
...
Optional arguments to be passed to the base method in bclust, ignored in plot.

Value

bclust and hclust.bclust return objects of class "bclust" including the following components:
hclust
Return value of the hierarchical clustering of the collection of base centers (Object of class "hclust").
cluster
Vector with indices of the clusters the inputs are assigned to.
centers
Matrix of centers of the final clusters. Only useful if the hierarchical clustering method produces convex clusters.
allcenters
Matrix of all iter.base * base.centers centers found in the base runs.

Details

First, iter.base bootstrap samples of the original data in x are created by drawing with replacement. The base cluster method is run on each of these samples with base.centers centers. The base.method must be the name of a partitioning cluster function returning a list with the same components as the return value of kmeans.

This results in a collection of iter.base * base.centers centers, which are subsequently clustered using the hierarchical method hclust. Base centers with fewer than minsize points in their respective partitions are removed before the hierarchical clustering.
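The two steps above can be sketched with base R functions only. This is an illustrative sketch under stated assumptions, not the e1071 implementation: the variable names (all_centers, keep, hc) are hypothetical, and the minsize filter is assumed to compare against the kmeans size component.

```r
# Illustrative sketch of the bagging steps; not the e1071 source code.
set.seed(1)
x <- as.matrix(iris[, 1:4])
iter.base <- 10      # number of bootstrap runs
base.centers <- 5    # centers per base run
minsize <- 2         # minimum partition size for a base center to be kept

all_centers <- NULL
for (i in seq_len(iter.base)) {
  # Bootstrap sample: draw nrow(x) row indices with replacement
  idx <- sample(nrow(x), replace = TRUE)
  km <- kmeans(x[idx, , drop = FALSE], centers = base.centers)
  # Keep only base centers whose partition holds at least minsize points
  keep <- km$size >= minsize
  all_centers <- rbind(all_centers, km$centers[keep, , drop = FALSE])
}

# Hierarchical clustering of the pooled base centers
hc <- hclust(dist(all_centers, method = "euclidean"), method = "average")
dim(all_centers)     # at most iter.base * base.centers rows
```

With minsize = 0 no base center is dropped, so the pooled matrix has exactly iter.base * base.centers rows.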

The resulting dendrogram is then cut to produce centers clusters. Hence, the name of the argument centers is a little bit misleading as the resulting clusters need not be convex, e.g., when single linkage is used. The name was chosen for compatibility with standard partitioning cluster methods such as kmeans.
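Cutting the dendrogram can be illustrated as follows. The sketch uses a single kmeans run as a stand-in for the pooled base centers and assumes that each observation inherits the cluster of its nearest base center; that assignment rule is not stated on this page and is labeled as an assumption here. All variable names are hypothetical.

```r
# Sketch only: cut a dendrogram of base centers and map points to clusters.
set.seed(1)
x <- as.matrix(iris[, 1:4])

# Stand-in for the collection of base centers (one kmeans run, 20 centers)
base <- kmeans(x, centers = 20, iter.max = 50)$centers

# Single linkage can chain centers into non-convex groups
hc <- hclust(dist(base), method = "single")
center_cluster <- cutree(hc, k = 3)   # cluster label of each base center

# Squared Euclidean distance from every point to every base center
d2 <- outer(rowSums(x^2), rowSums(base^2), "+") - 2 * x %*% t(base)

# ASSUMPTION: a point takes the cluster of its closest base center
nearest <- max.col(-d2, ties.method = "first")
membership <- unname(center_cluster[nearest])
table(membership)
```

Because a cluster here is a union of the regions around several base centers, it need not be convex, which is why a single mean per cluster can be a poor summary.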

A new hierarchical clustering (e.g., using another hclust.method) re-using previous base runs can be performed by running hclust.bclust on the return value of bclust.
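A short example of this re-use, assuming the e1071 package is installed and using the signatures listed under Usage. The object names bc and bc_single are chosen for illustration only.

```r
library(e1071)

set.seed(1)
x <- as.matrix(iris[, 1:4])

# Base runs plus the default average-linkage clustering
bc <- bclust(x, centers = 3, base.centers = 5, verbose = FALSE)

# Re-cluster the stored base centers with single linkage;
# the base method is not run again
bc_single <- hclust.bclust(bc, x, centers = 3, hclust.method = "single")

# Cross-tabulate the two partitions of the observations
table(average = bc$cluster, single = bc_single$cluster)
```

Comparing the cluster components of the two objects shows how much the final partition depends on the linkage method for the same set of base centers.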

References

Friedrich Leisch. Bagged clustering. Working Paper 51, SFB "Adaptive Information Systems and Modeling in Economics and Management Science", August 1999. http://epub.wu.ac.at/1272/1/document.pdf

See Also

hclust, kmeans, boxplot.bclust

Examples

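library(e1071)
data(iris)
# Bagged clustering of the four numeric iris columns into 3 clusters
bc1 <- bclust(iris[,1:4], 3, base.centers=5)
plot(bc1)

# Cluster sizes and cluster centers for the 3-cluster solution
table(clusters.bclust(bc1, 3))
centers.bclust(bc1, 3)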