e1071 (version 1.3-4)

scaclust: Fuzzy Clustering using Scatter Matrices

Description

Four fuzzy clustering methods based on the calculation of scatter matrices: the Adaptive distances method, the Minimum total volume method, the Sum of all normalized determinants method, and the Maximum likelihood method (Product of Determinants).
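All four methods rely on per-cluster scatter matrices. The sketch below computes a fuzzy scatter matrix for one cluster in base R. It assumes the usual definition S_k = sum_i u_ik^m (x_i - v_k)(x_i - v_k)^T, with fuzzifier m = 2 and a membership-weighted center v_k. The helper name fuzzy_scatter is hypothetical and is not part of e1071, and the memberships in the toy data are made up.

```r
# Hypothetical helper (not part of e1071): fuzzy scatter matrix of cluster k,
# S_k = sum_i u_ik^m (x_i - v_k)(x_i - v_k)^T, with fuzzifier m = 2 by default.
fuzzy_scatter <- function(x, u, k, m = 2) {
  w <- u[, k]^m                     # membership weights raised to the fuzzifier
  v <- colSums(w * x) / sum(w)      # membership-weighted cluster center v_k
  d <- sweep(x, 2, v)               # deviations x_i - v_k, one row per point
  crossprod(d * sqrt(w))            # sum_i w_i (x_i - v_k)(x_i - v_k)^T
}

# Toy data: two groups of 10 points with made-up memberships
set.seed(1)
x <- rbind(matrix(rnorm(20, sd = 0.3), ncol = 2),
           matrix(rnorm(20, mean = 1, sd = 0.3), ncol = 2))
u <- cbind(rep(c(0.9, 0.1), each = 10), rep(c(0.1, 0.9), each = 10))
S1 <- fuzzy_scatter(x, u, k = 1)
print(S1)
```

Scaling each deviation row by sqrt(w_i) lets a single crossprod() call form the weighted sum of outer products without an explicit loop.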

Usage

scaclust(x, centers, iter.max=100, verbose=FALSE, method="ad",
         theta = NULL)

Arguments

x
The data matrix, where the columns correspond to the variables and the rows to the observations.
centers
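Number of clusters, or a matrix of initial cluster centers (one row per center).
iter.max
Maximum number of iterations.
verbose
If TRUE, print progress output (the value of the objective function at each iteration) during fitting.
method
The clustering method: "ad" for the Adaptive distances method, "mtv" for the Minimum total volume method, "sand" for the Sum of all normalized determinants method, or "mlm" for the Maximum likelihood method (Product of Determinants).
theta
A vector of constants, one per cluster, that constrain the relative cluster volumes (default 1.0 for every cluster; not used by "mlm").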

Value

scaclust returns an object of class "fclust" with the following components:

  • centers: The final cluster centers.
  • size: The number of data points in each cluster.
  • cluster: A vector giving, for each data point, the index of the cluster it is assigned to. Each point is assigned to the cluster in which its membership value is largest.
  • iter: The number of iterations performed.
  • membership: A matrix with the membership values of the data points to the clusters.
  • withinerror: The value of the error function.
  • call: The call, with all arguments specified by name.
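The relationship between membership, cluster and size can be reproduced in a few lines of base R. The membership matrix below is made up for illustration. Ties are broken by taking the first maximum (the behaviour of which.max), which is an assumption and may not match how scaclust handles ties.

```r
# Made-up membership matrix: rows = data points, columns = clusters
membership <- matrix(c(0.8, 0.3, 0.6, 0.1,
                       0.2, 0.7, 0.4, 0.9), ncol = 2)
# Assign each point to the cluster with its largest membership value
cluster <- apply(membership, 1, which.max)
# Count points per cluster; nbins keeps empty clusters as zero counts
size <- tabulate(cluster, nbins = ncol(membership))
print(cluster)  # 1 2 1 2
print(size)     # 2 2
```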

Details

The data given by x is clustered by one of four fuzzy algorithms based on the computation of scatter matrices. If centers is a matrix, its rows are taken as the initial cluster centers. If centers is an integer, that many rows of x are chosen at random as initial centers. The algorithm stops when the maximum number of iterations (given by iter.max) is reached.
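The initialization rule can be sketched as follows. This mirrors only the behaviour described in this paragraph; init_centers is a hypothetical helper, not an e1071 function, and the check on the number of columns is an added assumption.

```r
# Hypothetical helper (not part of e1071) following the rule described above:
# an integer 'centers' picks that many distinct random rows of x,
# a matrix 'centers' is used as-is (one row per initial center).
init_centers <- function(x, centers) {
  if (is.matrix(centers)) {
    stopifnot(ncol(centers) == ncol(x))        # centers must live in the data space
    return(centers)
  }
  k <- as.integer(centers)
  x[sample(nrow(x), k), , drop = FALSE]        # sample rows without replacement
}

set.seed(42)
x <- matrix(rnorm(20), ncol = 2)
v0 <- init_centers(x, 3)            # 3 distinct rows of x
v1 <- init_centers(x, x[1:2, ])     # explicit starting centers
```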

If verbose is TRUE, the iteration number and the value of the objective function are printed at each iteration.

The method argument selects the algorithm: "ad" for the Adaptive distances method, "mtv" for the Minimum total volume method, "sand" for the Sum of all normalized determinants method, and "mlm" for the Maximum likelihood method (Product of Determinants). All of these algorithms are designed for a fuzzification parameter of 2.
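With a fuzzification parameter of 2, the standard fuzzy c-means membership update reduces to normalized inverse squared distances, u_ik = (1/D_ik) / sum_j (1/D_ij). The sketch below assumes that update and a precomputed matrix D of squared distances. In the scatter-matrix methods D would come from cluster-specific metrics, and the exact update used by scaclust is not documented here; membership_m2 is a hypothetical name.

```r
# Hypothetical helper: fuzzy c-means membership update for fuzzifier m = 2.
# D is an n x c matrix of squared distances from each point to each cluster.
membership_m2 <- function(D) {
  inv <- 1 / D                 # assumes no point lies exactly on a center (D > 0)
  inv / rowSums(inv)           # normalise so each row sums to 1
}

D <- rbind(c(1, 4),
           c(9, 1),
           c(2, 2))
U <- membership_m2(D)
print(U)  # rows: 0.8 0.2 / 0.1 0.9 / 0.5 0.5
```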

By default, theta is 1.0 for every cluster. These constants constrain the relative volumes of the clusters a priori, so an inappropriate choice can lead to a poor clustering. The Maximum likelihood method does not use this parameter.
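This page does not say exactly how theta enters the computation. One well-known way to constrain cluster volumes, used in the Gustafson-Kessel algorithm, is to fix the determinant of each cluster's norm-inducing matrix A_k. The sketch below assumes theta[k] plays that role. It illustrates the Gustafson-Kessel construction, not scaclust's internals, and volume_constrained_dist is a hypothetical helper.

```r
# Hypothetical sketch (Gustafson-Kessel style), assuming theta_k fixes det(A_k):
# A_k = (theta_k * det(S_k))^(1/p) * solve(S_k), so that det(A_k) = theta_k.
volume_constrained_dist <- function(x, center, S, theta_k) {
  p <- ncol(x)
  A <- (theta_k * det(S))^(1 / p) * solve(S)   # norm-inducing matrix for the cluster
  d <- sweep(x, 2, center)                      # deviations from the cluster center
  rowSums((d %*% A) * d)                        # squared distances (x_i - v)^T A (x_i - v)
}

S <- matrix(c(2, 0.5, 0.5, 1), 2, 2)            # example scatter matrix
x <- rbind(c(0, 0), c(1, 1), c(2, -1))
D1 <- volume_constrained_dist(x, center = c(0, 0), S = S, theta_k = 1)
D2 <- volume_constrained_dist(x, center = c(0, 0), S = S, theta_k = 4)
```

In this sketch, raising theta_k scales every distance to cluster k by theta_k^(1/p); with p = 2, going from theta_k = 1 to theta_k = 4 doubles each distance.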

References

P. J. Rousseeuw, L. Kaufman, and E. Trauwaert. Fuzzy Clustering using Scatter Matrices. Computational Statistics & Data Analysis, vol.23, p.135-151, 1996.

Examples

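## a 2-dimensional example
library(e1071)
set.seed(1)   # make the random data reproducible
## two groups of 50 points each
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
cl <- scaclust(x, centers = 2, iter.max = 20, verbose = TRUE, method = "ad")
print(cl)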
