kmeanspp: K-means++ Clustering

Description

Clusters the rows of a numeric matrix with k-means, using the k-means++ seeding scheme to choose the initial centers.

Usage

kmeanspp(X, k)

Arguments

X: numeric matrix of data.

k: the number of clusters.

Value

Returns an object of class "kmeans", because kmeans is called as the final step.

Details

kmeanspp chooses the initial centers in a specific way and then passes them to the classical kmeans routine. The first center is chosen uniformly at random from the data points. Each subsequent center is drawn from the data points with probability proportional to the point's distance to the closest center already chosen.
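The seeding step described above can be sketched in a few lines of R. This is only an illustration under stated assumptions, not the package's actual source: the helper name seed_centers is hypothetical, Euclidean distance is assumed, and the weights follow the wording of this page (plain distance). Arthur and Vassilvitskii weight by the squared distance instead, which would correspond to sampling with prob = d^2.

```r
# Hypothetical helper (not part of the package): k-means++ style seeding.
# Assumes Euclidean distance and that X has at least k distinct rows.
seed_centers <- function(X, k) {
  n <- nrow(X)
  ids <- integer(k)

  # First center: one data point chosen uniformly at random.
  ids[1] <- sample.int(n, 1)

  # d[i] = distance from point i to its closest center chosen so far.
  d <- sqrt(rowSums(sweep(X, 2, X[ids[1], ])^2))

  for (i in seq_len(k - 1) + 1) {
    # Draw the next center with probability proportional to d.
    # Points already chosen have d = 0 and cannot be drawn again.
    ids[i] <- sample.int(n, 1, prob = d)

    # Update the closest-center distances with the new center.
    d_new <- sqrt(rowSums(sweep(X, 2, X[ids[i], ])^2))
    d <- pmin(d, d_new)
  }

  X[ids, , drop = FALSE]
}
```

The resulting matrix could then be handed to the standard routine, for example kmeans(X, centers = seed_centers(X, 3)), which yields an object of class "kmeans".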

References

Arthur, D., and S. Vassilvitskii (2006). "k-means++: The Advantages of Careful Seeding", Technical Report 2006-13, Stanford InfoLab.

Examples

X <- rbind(matrix(rnorm(500, mean = 0,  sd = 0.3), ncol = 2),
           matrix(rnorm(500, mean = 1,  sd = 0.3), ncol = 2),
           matrix(rnorm(500, mean = -1, sd = 0.3), ncol = 2))
colnames(X) <- c("x", "y")
cl <- kmeanspp(X, 3)
plot(X, col = cl$cluster)
points(cl$centers, col = 1:3)
grid()
