Partitions a numeric data set using the Hard C-Means (HCM) clustering algorithm, also known as K-Means, which was proposed by MacQueen (1967). The function hcm is an extension of the basic kmeans with more input arguments and output values, so that the clustering results are comparable with those of other fuzzy and possibilistic algorithms. For instance, in addition to the Euclidean distance metric, a number of other distance metrics such as the squared Euclidean distance and the squared Chord distance can be employed with hcm.
hcm(x, centers, dmetric="euclidean", pw=2, alginitv="kmpp",
nstart=1, iter.max=1000, con.val=1e-9, stand=FALSE, numseed)
x: a numeric vector, data frame or matrix.
centers: an integer specifying the number of clusters or a numeric matrix containing the initial cluster centers.
dmetric: a string for the distance metric. The default is euclidean for the squared Euclidean distances. See get.dmetrics for the alternative options.
pw: a number for the power of Minkowski distance calculation. The default is 2 if the dmetric is minkowski.
alginitv: a string for the initialization of cluster prototypes matrix. The default is kmpp for K-means++ initialization method (Arthur & Vassilvitskii, 2007). For the list of alternative options see get.algorithms.
nstart: an integer for the number of starts for clustering. The default is 1.
iter.max: an integer for the maximum number of iterations allowed. The default is 1000.
con.val: a number for the convergence value between the iterations. The default is 1e-09.
stand: a logical flag to standardize data. Its default value is FALSE. If its value is TRUE, the data matrix x is standardized.
numseed: a seeding number to set the seed of R's random number generator.
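The K-means++ seeding named as the default initialization can be sketched in base R as follows. This is a minimal illustration that assumes squared Euclidean distances; the helper name kmpp_seed and its seed argument are hypothetical and are not part of ppclust or inaparc, whose implementations may differ in detail.

```r
# Hypothetical helper (not part of ppclust or inaparc): K-means++ seeding
kmpp_seed <- function(x, k, seed = 123) {
  set.seed(seed)
  x <- as.matrix(x)
  n <- nrow(x)
  idx <- integer(k)
  # First prototype: one object drawn uniformly at random
  idx[1] <- sample.int(n, 1)
  # Squared Euclidean distance of every object to its nearest chosen prototype
  d2 <- rowSums(sweep(x, 2, x[idx[1], ])^2)
  for (j in seq_len(k)[-1]) {
    # Next prototype: drawn with probability proportional to d2
    idx[j] <- sample.int(n, 1, prob = d2)
    d2 <- pmin(d2, rowSums(sweep(x, 2, x[idx[j], ])^2))
  }
  x[idx, , drop = FALSE]
}

v0 <- kmpp_seed(iris[, -5], k = 3)
dim(v0)
```

A matrix built this way has one row per cluster, which is the shape the centers argument expects when initial cluster centers are supplied instead of a cluster count.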
an object of class ‘ppclust’, which is a list consisting of the following items:
a numeric matrix containing the processed data set.
a numeric matrix containing the final cluster prototypes (centers of clusters).
a numeric matrix containing the hard membership degrees of the data objects.
a numeric matrix containing the distances of objects to the final cluster prototypes.
an integer for the number of clusters.
a numeric vector containing the cluster labels of the data objects.
a numeric vector containing the number of objects in the clusters.
an integer for the index of the start with the minimum objective function value.
an integer vector for the number of iterations in each start of the algorithm.
a numeric vector for the objective function values of each start of the algorithm.
a numeric vector for the execution time of each start of the algorithm.
a numeric vector containing the within-cluster sum of squares for each cluster.
a number for the between-cluster sum of squares.
a number for the total within-cluster sum of squares.
a number for the total sum of squares.
a logical value; TRUE indicates that the data set x contains the standardized values of the raw data.
a string for the name of the partitioning algorithm. It is ‘HCM’ with this function.
a string for the matched function call generating this ‘ppclust’ object.
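The sum-of-squares items listed above are linked by a standard identity: the total sum of squares equals the total within-cluster sum of squares plus the between-cluster sum of squares. The base R sketch below checks this on iris, using stats::kmeans only as a stand-in source of hard cluster labels. All variable names are illustrative and are not the component names of the returned ‘ppclust’ object, and Euclidean geometry is assumed.

```r
x <- as.matrix(iris[, -5])
set.seed(1)
labels <- kmeans(x, centers = 3, nstart = 5)$cluster

grand_mean <- colMeans(x)
# Per-cluster means (k x p matrix) and cluster sizes, both ordered by label
cl_means <- apply(x, 2, function(col) tapply(col, labels, mean))
cl_sizes <- as.vector(table(labels))

# Within-cluster sum of squares, one value per cluster
sq_dev <- rowSums((x - cl_means[labels, ])^2)
within_by_cluster <- as.vector(tapply(sq_dev, labels, sum))
total_within <- sum(within_by_cluster)

# Between-cluster sum of squares: size-weighted squared distances of the
# cluster means from the grand mean
between_ss <- sum(cl_sizes * rowSums(sweep(cl_means, 2, grand_mean)^2))

# Total sum of squares around the grand mean
total_ss <- sum(sweep(x, 2, grand_mean)^2)

all.equal(total_ss, total_within + between_ss)
```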
The Hard C-Means (HCM) clustering algorithm (or K-means) partitions a data set into k groups, so-called clusters. The objective function of HCM is:
\(J_{HCM}(\mathbf{X}; \mathbf{V})=\sum\limits_{i=1}^n \sum\limits_{j=1}^k u_{ij} \, d^2(\vec{x}_i, \vec{v}_j)\)
See ppclust-package for the details about the terms in the above equation of \(J_{HCM}\).
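As a rough illustration, \(J_{HCM}\) can be evaluated in base R for any prototype matrix, assuming squared Euclidean distances and hard memberships, so that each object contributes only its distance to the nearest prototype. The helper name hcm_objective is hypothetical, and stats::kmeans is used here only to obtain converged prototypes for comparison.

```r
# Hypothetical helper: objective value for data x and prototype matrix v
hcm_objective <- function(x, v) {
  x <- as.matrix(x)
  v <- as.matrix(v)
  # n x k matrix of squared Euclidean distances d^2(x_i, v_j)
  d2 <- vapply(seq_len(nrow(v)),
               function(j) rowSums(sweep(x, 2, v[j, ])^2),
               numeric(nrow(x)))
  # With hard memberships each object adds the minimum of its row
  sum(apply(d2, 1, min))
}

x <- iris[, -5]
set.seed(1)
km <- kmeans(x, centers = 3, nstart = 5)
J <- hcm_objective(x, km$centers)

# At a converged solution J matches the total within-cluster sum of squares
all.equal(J, km$tot.withinss)
```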
The update equation for membership degrees is:
\(u_{ij} = \left\{ \begin{array}{rl} 1 & \text{if } d^2(\vec{x}_i, \vec{v}_j) = \min\limits_{1\leq l\leq k} \; d^2(\vec{x}_i, \vec{v}_l) \\ 0 & \text{otherwise} \end{array} \right. \)
The update equation for cluster prototypes is:
\(\vec{v}_{j} =\frac{\sum\limits_{i=1}^n u_{ij} \vec{x}_i}{\sum\limits_{i=1}^n u_{ij}} \;\;; {1\leq j\leq k}\)
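The two update equations can be alternated in a short base R loop. The sketch below is not the ppclust implementation: the function name hcm_sketch, the stopping rule (largest absolute change in the prototypes below tol) and the handling of empty clusters are assumptions made for illustration, and only squared Euclidean distances are covered.

```r
# Hypothetical helper illustrating the membership and prototype updates
hcm_sketch <- function(x, v, iter.max = 100, tol = 1e-9) {
  x <- as.matrix(x)
  v <- as.matrix(v)
  k <- nrow(v)
  objective <- numeric(0)
  for (iter in seq_len(iter.max)) {
    # Squared Euclidean distances d^2(x_i, v_j) as an n x k matrix
    d2 <- vapply(seq_len(k),
                 function(j) rowSums(sweep(x, 2, v[j, ])^2),
                 numeric(nrow(x)))
    # Membership update: u_ij = 1 for the nearest prototype, 0 otherwise
    cl <- max.col(-d2, ties.method = "first")
    u <- diag(k)[cl, , drop = FALSE]
    objective <- c(objective, sum(u * d2))
    # Prototype update: v_j is the mean of the objects with u_ij = 1
    # (assumption: an empty cluster keeps its previous prototype)
    sizes <- colSums(u)
    nonempty <- sizes > 0
    v_new <- v
    v_new[nonempty, ] <- (t(u) %*% x)[nonempty, , drop = FALSE] / sizes[nonempty]
    shift <- max(abs(v_new - v))
    v <- v_new
    if (shift < tol) break
  }
  list(v = v, u = u, cluster = cl, objective = objective)
}

x <- iris[, -5]
res <- hcm_sketch(x, v = x[c(1, 51, 101), ])
res$objective
```

Each pass first reassigns objects and then recomputes means, so the recorded objective values cannot increase from one iteration to the next.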
Arthur, D. & Vassilvitskii, S. (2007). K-means++: The advantages of careful seeding. In Proc. of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027-1035. <http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf>
MacQueen, J.B. (1967). Some methods for classification and analysis of multivariate observations. In Proc. of 5th Berkeley Symp. on Mathematical Statistics and Probability, Berkeley, Univ. of California Press, 1: 281-297. <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.308.8619&rep=rep1&type=pdf>
kmeans, ekm, fcm, fcm2, fpcm, fpppcm, gg, gk, gkpfcm, pca, pcm, pcmr, pfcm, upfc
# Load the package that provides hcm
library(ppclust)
# Load the iris data set and drop the species column
data(iris)
x <- iris[,-5]
# Initialize the prototype matrix using K-means++
v <- inaparc::kmpp(x, k=3)$v
# Run HCM with the initial prototypes
res.hcm <- hcm(x, centers=v)
# Print the cluster labels, tabulate the cluster sizes and plot the result
res.hcm$cluster
table(res.hcm$cluster)
plot(x, col=res.hcm$cluster, pch=16)