Initializes the cluster prototypes matrix using Hartigan and Wong's algorithm (Hartigan & Wong, 1979).
hartiganwong(x, k)
x: a numeric vector, data frame or matrix.
k: an integer specifying the number of clusters.
an object of class ‘inaparc’, which is a list consisting of the following items:
a numeric matrix containing the initial cluster prototypes.
a string giving the type of centroid used to determine the cluster prototypes. It is ‘obj’ for this function because the generated prototype matrix contains objects selected from the data.
a string containing the matched function call that generates this ‘inaparc’ object.
First, the algorithm computes the center of gravity of the data and the distance of each data object to this center. It then sorts the data objects by these distances. The prototypes of the k clusters are the objects at positions \(1 + (i-1)\,n/k\) in the sorted data, where i and n stand for the index of a cluster and the number of data rows, respectively. The sorting step adds to the computational cost, since its complexity is \(O(n \log n)\) (Celebi et al., 2013).
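The steps described above can be sketched in a few lines of base R. This is only an illustrative sketch, not the package's implementation: the helper name `hw_sketch` is hypothetical, and the sketch assumes Euclidean distance, ascending sort order, and flooring of non-integer positions, none of which are stated in this page.

```r
# Hypothetical helper, NOT the inaparc implementation: a sketch of the
# distance-sorted prototype selection described in the text.
hw_sketch <- function(x, k) {
  x <- as.matrix(x)
  n <- nrow(x)
  # Center of gravity: the column-wise mean of the data
  center <- colMeans(x)
  # Euclidean distance of every row to the center (assumed metric)
  d <- sqrt(rowSums(sweep(x, 2, center)^2))
  # Row indices ordered by increasing distance (assumed direction)
  ord <- order(d)
  # Positions 1 + (i - 1) * n / k for i = 1..k, floored to integers (assumption)
  pos <- 1 + floor((seq_len(k) - 1) * n / k)
  # Return the selected data objects as the prototype matrix
  x[ord[pos], , drop = FALSE]
}

v <- hw_sketch(iris[, 1:4], k = 5)
print(v)
```

Because the prototypes are rows taken directly from the data, the result contains actual objects rather than computed centroids. The single call to `order()` is the sorting step that the text identifies as the main cost.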
Hartigan, J.A. & Wong, M.A. (1979). Algorithm AS 136: A K-means clustering algorithm, Journal of the Royal Statistical Society, Series C, 28 (1): 100-108.
Celebi, M.E., Kingravi, H.A. & Vela, P.A. (2013). A comparative study of efficient initialization methods for the K-means clustering algorithm, Expert Systems with Applications, 40 (1): 200-210. arXiv:https://arxiv.org/pdf/1209.1960.pdf
aldaoud, ballhall, crsamp, firstk, forgy, inofrep, inscsf, insdev, kkz, kmpp, ksegments, ksteps, lastk, lhsmaximin, lhsrandom, maximin, mscseek, rsamp, rsegment, scseek, scseek2, spaeth, ssamp, topbottom, uniquek, ursamp
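library(inaparc)
# Load the iris data set; its first four columns are numeric
data(iris)
# Initialize the prototypes for k = 5 clusters
res <- hartiganwong(iris[,1:4], k=5)
# Extract and print the initial prototype matrix
v <- res$v
print(v)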