
sjPlot (version 2.1.0)

sjc.kgap: Compute gap statistics for k-means-cluster

Description

An implementation of the gap statistic algorithm from Tibshirani, Walther, and Hastie's "Estimating the number of clusters in a data set via the gap statistic". This function calls the clusGap-function of the cluster-package to calculate the data for the plot.
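As a rough illustration of the underlying computation, the sketch below calls clusGap from the cluster-package directly, with kmeans as the clustering function. It is not the source of sjc.kgap; scaling the data, the seed, the nstart value and the names x, gap and gap_tab are assumptions made for this example.

```r
library(cluster)

set.seed(123)                        # reproducible Monte Carlo reference samples
x <- scale(as.matrix(iris[, 1:4]))   # scaling is an assumption of this sketch

# Gap statistic for k = 1..10 clusters, using 100 reference samples;
# nstart is passed on to kmeans()
gap <- clusGap(x, FUNcluster = kmeans, K.max = 10, B = 100,
               nstart = 10, verbose = FALSE)

# gap$Tab is a matrix with the columns logW, E.logW, gap and SE.sim
gap_tab <- as.data.frame(gap$Tab)
gap_tab$k <- seq_len(nrow(gap_tab))
print(gap_tab)
```

The columns gap and SE.sim hold the quantities a gap statistic plot typically shows: the gap value for each k and its simulation standard error.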

Usage

sjc.kgap(x, max = 10, B = 100, SE.factor = 1, method = "Tibs2001SEmax", plotResults = TRUE)

Arguments

x
matrix, where rows are observations and columns are individual dimensions, to compute and plot the gap statistic (according to a uniform reference distribution).
max
maximum number of clusters to consider, must be at least two. Default is 10.
B
integer, number of Monte Carlo ("bootstrap") samples. Default is 100.
SE.factor
[When method contains "SE"] To determine the optimal number of clusters, Tibshirani et al. proposed the "1 S.E."-rule. With an SE.factor f, the more general "f S.E."-rule is used.
method
character string indicating how the "optimal" number of clusters, k^, is computed from the gap statistics (and their standard deviations), or more generally how the location k^ of the maximum of f[k] should be determined. Default is "Tibs2001SEmax". Possible values are those of the maxSE-function of the cluster-package: "firstSEmax", "Tibs2001SEmax", "globalSEmax", "firstmax" and "globalmax".
plotResults
logical, if TRUE (default), a graph visualizing the gap statistic will be plotted. Use FALSE to omit the plot.
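To see how method and SE.factor can change the selected number of clusters, the following sketch applies maxSE from the cluster-package to made-up values. The vectors gap_values and se_values are hypothetical and do not come from any real data set; whether sjc.kgap passes its arguments to maxSE in exactly this way is an assumption.

```r
library(cluster)

# Hypothetical gap values and simulation standard errors for k = 1..6
gap_values <- c(0.20, 0.45, 0.60, 0.62, 0.61, 0.63)
se_values  <- rep(0.03, 6)

# "1 S.E."-rule of Tibshirani et al.: smallest k with
# f[k] >= f[k + 1] - SE.factor * SE.f[k + 1]
k_tibs <- maxSE(gap_values, se_values,
                method = "Tibs2001SEmax", SE.factor = 1)

# First local maximum and global maximum of the gap values (SE is ignored)
k_first  <- maxSE(gap_values, se_values, method = "firstmax")
k_global <- maxSE(gap_values, se_values, method = "globalmax")

print(c(Tibs2001SEmax = k_tibs, firstmax = k_first, globalmax = k_global))
```

On values like these, where the gap curve flattens out, the three rules pick different k, so the choice of method matters most in that situation.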

Value

An object containing the data frame used for plotting, the ggplot object and the number of clusters found.

References

  • Tibshirani R, Walther G, Hastie T (2001) Estimating the number of clusters in a data set via the gap statistic. J. R. Statist. Soc. B, 63, Part 2, pp. 411-423
  • Maechler M, Rousseeuw P, Struyf A, Hubert M, Hornik K (2013) cluster: Cluster Analysis Basics and Extensions. R package version 1.14.4.

See Also

sjc.elbow

Examples

## Not run: 
# plot gap statistic and determine best number of clusters
# in mtcars dataset
sjc.kgap(mtcars)

# and in iris dataset
sjc.kgap(iris[, 1:4])
## End(Not run)
