Initializes the cluster prototypes matrix by using K-means++ algorithm which has been proposed by Arthur and Vassilvitskii (2007).
kmpp(x, k)
a numeric vector, data frame or matrix.
an integer specifying the number of clusters.
an object of class ‘inaparc’, which is a list consists of the following items:
a numeric matrix containing the initial cluster prototypes.
a string representing the type of centroid, which used to build prototype matrix. Its value is ‘obj’ with this function because the cluster prototypes are the objects selected by the algorithm.
a string containing the matched function call that generates this sQuoteinaparc object.
K-means++ (Arthur & Vassilvitskii, 2007) is usually reported as an efficient approximation algorithm in overcoming the poor clustering problem with the standard K-means algorithm. K-means++ is an algorithm that merges MacQueen's second method with the ‘Maximin’ method to initialize the cluster prototypes (Ji et al, 2015). K-means++ initializes the cluster centroids by finding the data objects that are farther away from each other in a probabilistic manner. In K-means++, the first cluster protoype (center) is randomly assigned. The prototypes of remaining clusters are determined with a probability of \({md(x')}^2/\sum_{k=1}^{n} md({x_k})^2\), where \(md(x)\) is the minimum distance between a data object and the previously computed prototypes.
The function kmpp
is an implementation of the initialization algorithm of K-means++ that is based on the code‘k-meansp2.R’, authored by M. Sugiyama. It needs less execution time due to its vectorized distance computations.
Arthur, D. & Vassilvitskii. S. (2007). K-means++: The advantages of careful seeding, in Proc. of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, p. 1027-1035. url:http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
M. Sugiyama, ‘mahito-sugiyama/k-meansp2.R’. url:https://gist.github.com/mahito-sugiyama/ef54a3b17fff4629f106
aldaoud
,
ballhall
,
crsamp
,
firstk
,
forgy
,
hartiganwong
,
inofrep
,
inscsf
,
insdev
,
kkz
,
ksegments
,
ksteps
,
lastk
,
lhsmaximin
,
lhsrandom
,
maximin
,
mscseek
,
rsamp
,
rsegment
,
scseek
,
scseek2
,
spaeth
,
ssamp
,
topbottom
,
uniquek
,
ursamp
# NOT RUN {
data(iris)
res <- kmpp(x=iris[,1:4], k=5)
v <- res$v
print(v)
# }
Run the code above in your browser using DataLab