broom (version 0.3.4)

kmeans_tidiers: Tidying methods for kmeans objects

Description

These methods summarize the results of k-means clustering in three tidy forms: tidy describes the center and size of each cluster, augment adds the cluster assignments to the original data, and glance summarizes the total, within-cluster, and between-cluster sums of squares of the clustering.

Usage

## S3 method for class 'kmeans':
tidy(x, col.names = paste0("x", 1:ncol(x$centers)), ...)

## S3 method for class 'kmeans':
augment(x, data, ...)

## S3 method for class 'kmeans':
glance(x, ...)

Arguments

x
kmeans object
col.names
The names to call each dimension of the data in tidy. Defaults to x1, x2...
...
extra arguments, not used
data
Original data (required for augment)
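
As a rough illustration of the col.names default, the sketch below uses only base R to build the names that paste0("x", 1:ncol(x$centers)) produces for a two-dimensional fit. The built-in iris columns and the object names fit and default_names are assumptions chosen for the example; they are not part of broom.

```r
# Fit k-means on two numeric columns of the built-in iris data (illustrative choice).
set.seed(1)
fit <- kmeans(iris[, c("Sepal.Length", "Sepal.Width")], centers = 3)

# The documented default for col.names: one name per column of the centers matrix.
default_names <- paste0("x", 1:ncol(fit$centers))
print(default_names)

# The centers matrix keeps the original column names, so they are available
# if you would rather pass them as col.names when calling tidy.
print(colnames(fit$centers))
```

With this fit, passing col.names = colnames(fit$centers) to tidy would presumably label the center columns with the original variable names rather than x1 and x2.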

Value

All tidying methods return a data.frame without rownames. The structure depends on the method chosen.

tidy returns one row per cluster, with one column for each dimension in the data describing the center, followed by

  • size: The size of each cluster
  • withinss: The within-cluster sum of squares
  • cluster: A factor describing the cluster from 1:k

augment returns the original data with one extra column:

  • .cluster: The cluster assigned by the k-means algorithm

glance returns a one-row data.frame with the columns

  • totss: The total sum of squares
  • tot.withinss: The total within-cluster sum of squares
  • betweenss: The total between-cluster sum of squares
  • iter: The number of (outer) iterations
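
To make the three output shapes concrete, here is a minimal base-R sketch that assembles comparable data frames directly from a kmeans object. It is an approximation for illustration only, not broom's actual implementation; the names dat, fit, tidy_like, augment_like and glance_like are hypothetical, and the iris columns are an arbitrary choice.

```r
# Base-R approximation of the three output shapes (not broom's own code).
set.seed(1)
dat <- iris[, c("Petal.Length", "Petal.Width")]
fit <- kmeans(dat, centers = 3)

# tidy-like: one row per cluster, centers followed by size, withinss, cluster
tidy_like <- as.data.frame(fit$centers)
rownames(tidy_like) <- NULL
tidy_like$size <- fit$size
tidy_like$withinss <- fit$withinss
tidy_like$cluster <- factor(seq_len(nrow(tidy_like)))

# augment-like: the original data plus a .cluster column
augment_like <- data.frame(dat, .cluster = factor(fit$cluster))

# glance-like: a single row of fit-level summaries
glance_like <- data.frame(totss = fit$totss,
                          tot.withinss = fit$tot.withinss,
                          betweenss = fit$betweenss,
                          iter = fit$iter)

# The total sum of squares splits into within-cluster and between-cluster parts.
all.equal(glance_like$totss,
          glance_like$tot.withinss + glance_like$betweenss)
```

The final check reflects how kmeans defines betweenss (total minus total within-cluster sum of squares), which is why the glance columns are enough to compute a ratio such as betweenss / totss.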

See Also

kmeans

Examples

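library(broom)
library(dplyr)
library(ggplot2)

# simulate points around three known centers (100, 150 and 50 per cluster)
set.seed(2014)
centers <- data.frame(cluster=factor(1:3), size=c(100, 150, 50),
                      x1=c(5, 0, -3), x2=c(-1, 1, -2))
points <- centers %>% group_by(cluster) %>%
 do(data.frame(x1=rnorm(.$size[1], .$x1[1]),
               x2=rnorm(.$size[1], .$x2[1])))

# cluster on the two coordinates only; ungroup() first so that select()
# does not carry the grouping column (cluster) along into kmeans
k <- kmeans(points %>% ungroup() %>% dplyr::select(x1, x2), 3)
tidy(k)                    # one row per cluster
head(augment(k, points))   # original points plus .cluster
glance(k)                  # one-row summary of the fit

# plot the points colored by assigned cluster, labeling each fitted center
ggplot(augment(k, points), aes(x1, x2)) +
    geom_point(aes(color = .cluster)) +
    geom_text(aes(label = cluster), data = tidy(k), size = 10)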
