broom (version 0.4.5)

kmeans_tidiers: Tidying methods for kmeans objects

Description

These methods summarize the results of k-means clustering in three tidy forms: tidy describes the center and size of each cluster, augment adds the cluster assignments to the original data, and glance gives a one-row summary of the fit, including the total, within-cluster, and between-cluster sums of squares.

Usage

# S3 method for kmeans
tidy(x, col.names = paste0("x", 1:ncol(x$centers)), ...)

# S3 method for kmeans
augment(x, data, ...)

# S3 method for kmeans
glance(x, ...)

Arguments

x

A kmeans object (the result of kmeans())

col.names

Names for the columns holding each dimension of the cluster centers in the tidy output. Defaults to x1, x2, ...

...

Extra arguments (not used)

data

Original data (required for augment)

Value

All tidying methods return a data.frame without rownames. The structure depends on the method chosen.

tidy returns one row per cluster, with one column per dimension of the data giving the cluster center, followed by

size

The size of each cluster

withinss

The within-cluster sum of squares

cluster

A factor identifying the cluster, with levels 1 to k
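The shape described above can be reproduced with base R alone. The sketch below is only an illustration of the documented output, not broom's own code: tidy_kmeans_sketch is a hypothetical helper, and it assumes the center columns come from k$centers (renamed with the same default as col.names, written with seq_len instead of 1:ncol), while size, withinss, and cluster come from k$size, k$withinss, and the row index of the centers.

```r
# Hypothetical helper (not part of broom): mimics the documented shape of
# tidy() for kmeans, built only from components stats::kmeans() returns.
tidy_kmeans_sketch <- function(k,
                               col.names = paste0("x", seq_len(ncol(k$centers)))) {
  centers <- as.data.frame(k$centers)
  colnames(centers) <- col.names            # one column per data dimension
  out <- data.frame(centers,
                    size = k$size,          # number of points in each cluster
                    withinss = k$withinss,  # within-cluster sum of squares
                    cluster = factor(seq_len(nrow(k$centers))))  # levels 1..k
  rownames(out) <- NULL                     # tidy output carries no rownames
  out
}

# Two well-separated groups of 50 two-dimensional points each
set.seed(2014)
m <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 5), ncol = 2))
k <- kmeans(m, centers = 2)
td <- tidy_kmeans_sketch(k)
print(td)
```

Passing, for example, col.names = c("height", "weight") would relabel only the center columns; size, withinss, and cluster keep their names.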

augment returns the original data with one extra column:

.cluster

The cluster assigned by the k-means algorithm
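A kmeans object stores the cluster assignment of every row but not the data itself, which is why augment needs the data argument. As a rough base-R illustration of the documented output (augment_kmeans_sketch is a hypothetical helper, not broom's implementation), the assignments can be appended as a factor column, assuming the rows of data are in the same order as those passed to kmeans():

```r
# Hypothetical helper (not part of broom): appends k-means assignments to
# the original data as a single extra .cluster column.
augment_kmeans_sketch <- function(k, data) {
  # kmeans() keeps the assignments but not the data, so the data must be
  # supplied, with one row per clustered observation in the original order
  stopifnot(nrow(data) == length(k$cluster))
  out <- as.data.frame(data)
  out$.cluster <- factor(k$cluster)  # cluster number as a factor
  rownames(out) <- NULL
  out
}

set.seed(1)
df <- data.frame(x1 = c(rnorm(20, 0), rnorm(20, 10)),
                 x2 = c(rnorm(20, 0), rnorm(20, 10)))
k <- kmeans(df, centers = 2)
aug <- augment_kmeans_sketch(k, df)
print(head(aug))
```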

glance returns a one-row data.frame with the columns

totss

The total sum of squares

tot.withinss

The total within-cluster sum of squares

betweenss

The total between-cluster sum of squares

iter

The number of (outer) iterations
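Every glance column corresponds to a component that stats::kmeans() already returns, and kmeans() defines betweenss as totss minus tot.withinss, so the three sums of squares always satisfy totss = tot.withinss + betweenss. The base-R sketch below (glance_kmeans_sketch is a hypothetical helper, not broom's code) collects those components into one row and checks that identity:

```r
# Hypothetical helper (not part of broom): one-row model summary built from
# the components stats::kmeans() stores on the fitted object.
glance_kmeans_sketch <- function(k) {
  data.frame(totss = k$totss,                # total sum of squares
             tot.withinss = k$tot.withinss,  # sum of per-cluster withinss
             betweenss = k$betweenss,        # totss - tot.withinss
             iter = k$iter)                  # (outer) iterations used
}

set.seed(42)
m <- matrix(rnorm(200), ncol = 2)
k <- kmeans(m, centers = 3)
gl <- glance_kmeans_sketch(k)
print(gl)

# The total sum of squares splits into within-cluster and between-cluster parts
print(all.equal(gl$totss, gl$tot.withinss + gl$betweenss))
```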

See Also

kmeans

Examples

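library(broom)
library(dplyr)
library(ggplot2)

# Simulate three Gaussian clusters with known centers and sizes
set.seed(2014)
centers <- data.frame(cluster = factor(1:3), size = c(100, 150, 50),
                      x1 = c(5, 0, -3), x2 = c(-1, 1, -2))
points <- centers %>% group_by(cluster) %>%
  do(data.frame(x1 = rnorm(.$size[1], .$x1[1]),
                x2 = rnorm(.$size[1], .$x2[1]))) %>%
  ungroup()  # otherwise select() below would keep the grouping column

# Cluster on the coordinates only, not on the true labels
k <- kmeans(points %>% dplyr::select(x1, x2), 3)
tidy(k)                   # one row per cluster
head(augment(k, points))  # original data plus .cluster
glance(k)                 # one-row summary of the fit

# Points colored by assigned cluster; large labels mark the fitted centers
ggplot(augment(k, points), aes(x1, x2)) +
  geom_point(aes(color = .cluster)) +
  geom_text(aes(label = cluster), data = tidy(k), size = 10)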