amap (version 0.8-20)

Kmeans: K-Means Clustering

Description

Perform k-means clustering on a data matrix.

Usage

Kmeans(x, centers, iter.max = 10, nstart = 1,
         method = "euclidean")

Value

A list with components:

cluster

A vector of integers indicating the cluster to which each point is allocated.

centers

A matrix of cluster centres.

withinss

The within-cluster sum of squared distances for each cluster.

size

The number of points in each cluster.

Arguments

x

A numeric matrix of data, an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns), or an object of class "exprSet".

centers

Either the number of clusters or a set of initial cluster centers. If a number, a random set of rows in x is chosen as the initial centers.

iter.max

The maximum number of iterations allowed.

nstart

If centers is a number, how many random sets should be chosen?

method

The distance measure to be used. This must be one of "euclidean", "maximum", "manhattan", "canberra", "binary", "pearson", "abspearson", "abscorrelation", "correlation", "spearman" or "kendall". Any unambiguous substring can be given.
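As an illustration of how abbreviated method names might be resolved, the sketch below uses base R's pmatch on the names listed above. Whether Kmeans resolves names in exactly this way is an assumption, and resolve_method is a hypothetical helper that is not part of amap.

```r
## Distance names listed for the `method` argument
methods <- c("euclidean", "maximum", "manhattan", "canberra", "binary",
             "pearson", "abspearson", "abscorrelation", "correlation",
             "spearman", "kendall")

## Hypothetical helper (not part of amap): resolve an abbreviation to a
## full method name; returns NA if the abbreviation is ambiguous or unknown.
resolve_method <- function(abbrev) methods[pmatch(abbrev, methods)]

resolve_method("euc")      # unique prefix of "euclidean"
resolve_method("pearson")  # exact match is preferred over partial matches
resolve_method("abs")      # ambiguous: "abspearson" and "abscorrelation"
resolve_method("ma")       # ambiguous: "maximum" and "manhattan"
```

Under this assumption, an abbreviation such as "abs" or "ma" would not be accepted, so it is safer to spell out enough characters to identify a single method.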

Details

The data given by x is clustered by the k-means algorithm. When this terminates, all cluster centres are at the mean of their Voronoi sets (the set of data points which are nearest to the cluster centre).
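The property described above can be checked with a minimal Lloyd-style iteration written in base R. This is only a sketch for Euclidean distance and is not the code amap runs; lloyd_sketch is a hypothetical helper, and it does not handle empty clusters beyond leaving their centre unchanged.

```r
## Hypothetical helper (not part of amap): a minimal Lloyd-style k-means
## for Euclidean distance, for illustration only.
lloyd_sketch <- function(x, k, iter.max = 10) {
  x <- as.matrix(x)
  # Random rows of x serve as the initial centres
  centers <- x[sample(nrow(x), k), , drop = FALSE]
  cluster <- integer(nrow(x))
  for (iter in seq_len(iter.max)) {
    # Squared Euclidean distance from every point to every centre (n x k)
    d2 <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
    # Assign each point to its nearest centre
    new_cluster <- max.col(-d2, ties.method = "first")
    if (identical(new_cluster, cluster)) break  # assignments stable: stop
    cluster <- new_cluster
    # Move each centre to the mean of the points assigned to it
    for (j in seq_len(k)) {
      if (any(cluster == j)) {
        centers[j, ] <- colMeans(x[cluster == j, , drop = FALSE])
      }
    }
  }
  list(cluster = cluster, centers = centers)
}

set.seed(1)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
fit <- lloyd_sketch(x, 2)

## Recompute the mean of each cluster and compare with the returned centres
means <- t(sapply(1:2, function(j) colMeans(x[fit$cluster == j, , drop = FALSE])))
all.equal(unname(fit$centers), unname(means))
```

The final comparison recomputes each cluster mean from the assignments, which is the sense in which the centres sit at the mean of their Voronoi sets.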

The Lloyd-Forgy algorithm is used; with method = "euclidean", the result should be the same as that of the function kmeans.
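One way to check this agreement is to give both functions the same initial centres and compare the results. The sketch below is hedged in two ways: stats::kmeans defaults to the Hartigan-Wong algorithm, so algorithm = "Lloyd" is requested explicitly as an assumption about what makes the comparison fair, and the amap call is only run if the package is installed. The choice of rows 1 and 51 as starting centres is arbitrary.

```r
set.seed(42)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))

## Same starting centres for both functions: one row from each group
init <- x[c(1, 51), , drop = FALSE]

## Base R k-means, forced to use the Lloyd algorithm
fit_stats <- kmeans(x, centers = init, iter.max = 50, algorithm = "Lloyd")

if (requireNamespace("amap", quietly = TRUE)) {
  fit_amap <- amap::Kmeans(x, centers = init, iter.max = 50,
                           method = "euclidean")
  ## Cross-tabulate the two labelings; agreement shows up as a diagonal table
  print(table(amap = fit_amap$cluster, stats = fit_stats$cluster))
  ## Compare the fitted centres numerically
  print(all.equal(unname(fit_amap$centers), unname(fit_stats$centers)))
}
```

If the two fits differ, the number of iterations allowed and the algorithm selected in kmeans are the first things to check.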

See Also

hcluster, kmeans.

Examples


## a 2-dimensional example
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")
(cl <- Kmeans(x, 2))
plot(x, col = cl$cluster)
points(cl$centers, col = 1:2, pch = 8, cex = 2)

## random starts do help here with too many clusters
(cl <- Kmeans(x, 5, nstart = 25))
plot(x, col = cl$cluster)
points(cl$centers, col = 1:5, pch = 8)


## the same call with a non-Euclidean distance measure
Kmeans(x, 5, nstart = 25, method = "abscorrelation")

