cstab (version 0.2)

cStability: Selection of number of clusters via clustering instability

Description

Selection of number of clusters via model-based or model-free, normalized or unnormalized clustering instability.
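As a rough illustration of what "clustering instability" measures, the sketch below takes two label vectors over the same objects and computes the share of object pairs that are grouped together in one clustering but apart in the other. It is a minimal base-R sketch in the spirit of the stability literature listed in the References. The helper name `pair_disagreement` is hypothetical, and this is not claimed to be the computation cstab performs internally.

```r
# Hypothetical helper (not part of cstab): fraction of object pairs on which
# two clusterings of the same objects disagree about co-membership.
pair_disagreement <- function(a, b) {
  stopifnot(length(a) == length(b), length(a) > 1)
  same_a <- outer(a, a, "==")   # TRUE where a pair shares a cluster in a
  same_b <- outer(b, b, "==")   # TRUE where a pair shares a cluster in b
  up <- upper.tri(same_a)       # count each unordered pair once
  mean(same_a[up] != same_b[up])
}

a  <- c(1, 1, 2, 2, 3, 3)
b  <- c(2, 2, 3, 3, 1, 1)       # same partition as a, labels permuted
c2 <- c(1, 2, 1, 2, 3, 3)       # a different partition

pair_disagreement(a, b)    # 0: permuting labels is not a disagreement
pair_disagreement(a, c2)   # 4 of 15 pairs are grouped differently
```

Comparing co-membership of pairs rather than raw labels makes the measure insensitive to how clusters happen to be numbered, which matters because cluster labels are arbitrary across runs.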

Usage

cStability(data, kseq = 2:20, nB = 10, norm = TRUE, predict = TRUE,
  method = "kmeans", linkage = "complete", kmIter = 5, pbar = TRUE)

Arguments

data

an n x p numeric data matrix.

kseq

a vector of candidate numbers of clusters k, each greater than 1.

nB

an integer specifying the number of bootstrap comparisons.

norm

logical specifying whether the instability path should be normalized. If TRUE, the instability path is normalized to account for the trivial decrease in instability caused by an increasing k (see Haslbeck & Wulff, 2016).

predict

logical specifying whether the model-based or the model-free variant should be used (see Haslbeck & Wulff, 2016).

method

character string specifying the clustering algorithm. 'kmeans' for the k-means algorithm, 'hierarchical' for hierarchical clustering.

linkage

character string specifying the linkage criterion, used when method = 'hierarchical'. The available options are "single", "complete", "average", "mcquitty", "ward.D", "ward.D2", "centroid" or "median". See hclust.

kmIter

integer specifying the number of restarts of the k-means algorithm, used to avoid local minima.

pbar

logical specifying whether a progress bar is shown.
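The `method`, `linkage` and `kmIter` arguments correspond to familiar base-R operations. The sketch below shows the analogous calls in the stats package: `kmeans()` with several random restarts via `nstart`, and `hclust()` with a chosen linkage followed by `cutree()`. It is an illustration only, assuming these base functions are a fair stand-in; it does not reproduce cstab's internal code.

```r
set.seed(1)
# Two well-separated Gaussian blobs, 50 points each (illustrative data)
x <- rbind(matrix(rnorm(100, mean = 0, sd = 0.1), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.1), ncol = 2))
k <- 2

# k-means with 5 random restarts; kmeans() keeps the run with the lowest
# total within-cluster sum of squares (analogous in spirit to kmIter = 5)
km <- kmeans(x, centers = k, nstart = 5)
km_labels <- km$cluster

# Hierarchical clustering with complete linkage, cut into k groups
hc <- hclust(dist(x), method = "complete")
hc_labels <- cutree(hc, k = k)

# Cross-tabulate the two label vectors to compare the partitions
table(kmeans = km_labels, hierarchical = hc_labels)
```

The linkage names accepted by `hclust()` are the same strings listed for the `linkage` argument.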

Value

A list that contains the optimal k selected by the unnormalized and normalized instability method. It also includes a vector containing the averaged instability path (over bootstrap samples) and a matrix containing the instability path of each bootstrap sample for both the normalized and the unnormalized method.
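To make the relation between the per-sample matrix, the averaged path and the selected k concrete, the sketch below averages a small instability matrix over bootstrap samples and picks the k with the lowest mean instability. The numbers are invented for illustration. The orientation (bootstrap samples in rows, values of k in columns) and the use of the minimum as the selection rule are assumptions, not details stated in this page.

```r
kseq <- 2:5

# Invented instability values: 3 bootstrap samples (rows) x 4 values of k (columns)
inst <- matrix(c(0.30, 0.20, 0.05, 0.15,
                 0.28, 0.22, 0.07, 0.18,
                 0.32, 0.18, 0.06, 0.12),
               nrow = 3, byrow = TRUE,
               dimnames = list(NULL, paste0("k=", kseq)))

path  <- colMeans(inst)           # averaged instability path over bootstrap samples
k_opt <- kseq[which.min(path)]    # k with the lowest average instability
k_opt                             # 4 for these made-up numbers
```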

References

Ben-Hur, A., Elisseeff, A., & Guyon, I. (2001). A stability based method for discovering structure in clustered data. Pacific symposium on biocomputing, 7, 6-17.

Tibshirani, R., & Walther, G. (2005). Cluster validation by prediction strength. Journal of Computational and Graphical Statistics, 14(3), 511-528.

Haslbeck, J., & Wulff, D. U. (2016). Estimating the Number of Clusters via Normalized Cluster Instability. arXiv preprint arXiv:1608.07494.

Examples

  # Generate Data from Gaussian Mixture
  s <- .1
  n <- 50
  data <- rbind(cbind(rnorm(n, 0, s), rnorm(n, 0, s)),
                cbind(rnorm(n, 1, s), rnorm(n, 1, s)),
                cbind(rnorm(n, 0, s), rnorm(n, 1, s)),
                cbind(rnorm(n, 1, s), rnorm(n, 0, s)))
  plot(data)

  # Selection of Number of Clusters using Instability-based Measures
  stab_obj <- cStability(data, kseq=2:10)
  print(stab_obj)
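
The element names of the returned list are not spelled out in the Value section, so a reasonable next step is to inspect the object's structure with `str()`. The sketch below also reruns the selection with hierarchical clustering, using only arguments that appear in the Usage section. It assumes the cstab package is installed and does nothing otherwise.

```r
if (requireNamespace("cstab", quietly = TRUE)) {
  set.seed(42)  # data generation and bootstrap sampling are random

  # Same four-component Gaussian mixture as in the example above
  s <- .1
  n <- 50
  data <- rbind(cbind(rnorm(n, 0, s), rnorm(n, 0, s)),
                cbind(rnorm(n, 1, s), rnorm(n, 1, s)),
                cbind(rnorm(n, 0, s), rnorm(n, 1, s)),
                cbind(rnorm(n, 1, s), rnorm(n, 0, s)))

  # k-means variant (the default method), progress bar switched off
  stab_obj <- cstab::cStability(data, kseq = 2:10, pbar = FALSE)
  str(stab_obj)  # lists the elements of the returned object

  # Hierarchical clustering with average linkage
  stab_hc <- cstab::cStability(data, kseq = 2:10, method = "hierarchical",
                               linkage = "average", pbar = FALSE)
  print(stab_hc)
}
```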
  
