fpc (version 2.2-13)

cgrestandard: Standardise cluster validation statistics by random clustering results


Standardises cluster validity statistics, as produced by clustatsum, relative to the results achieved by random clusterings of the same data, as generated by randomclustersim. The aim is to make differences between values comparable across indexes; see Hennig (2019) and Akhanli and Hennig (2020).

This is mainly for use within clusterbenchstats.


cgrestandard(clusum,clusim,G,percentage=FALSE,
                               useother=FALSE,
                             useallg=FALSE, othernc=list())


List of class "valstat", see valstat.object, with standardised results as explained above.



clusum: object of class "valstat", see clusterbenchstats.


clusim: list; output object of randomclustersim, see there.


G: vector of integers. Numbers of clusters to consider.


percentage: logical. If FALSE, results are standardised to mean zero and standard deviation 1 using the random clusterings. If TRUE, the output is the percentage of simulated values below the result (more precisely, this number plus one divided by the total plus one).
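The two modes can be sketched in base R for a single statistic. This is a minimal illustration and not the fpc implementation: `standardise_value` is a hypothetical helper, `observed` and `simulated` are made-up values, and the percentage branch follows one reading of the description above, namely (count below + 1) / (total + 1), expressed as a proportion.

```r
# Hypothetical helper, not part of fpc: standardise one observed statistic
# against values of the same statistic obtained from random clusterings.
standardise_value <- function(observed, simulated, percentage = FALSE) {
  if (percentage) {
    # Share of simulated values below the observed one, with the
    # "plus one" correction in both numerator and denominator (assumed reading).
    (sum(simulated < observed) + 1) / (length(simulated) + 1)
  } else {
    # Mean zero, standard deviation one relative to the simulated values.
    (observed - mean(simulated)) / sd(simulated)
  }
}

simulated <- c(0.40, 0.45, 0.50, 0.55, 0.60)  # made-up random-clustering values
observed <- 0.58                               # made-up value for a real clustering

z <- standardise_value(observed, simulated)
p <- standardise_value(observed, simulated, percentage = TRUE)
```

Here four of the five simulated values lie below `observed`, so `p` is (4 + 1) / (5 + 1), while `z` measures how many simulated standard deviations `observed` lies above the simulated mean.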


useother: logical. If FALSE, only random clustering results from clusim are used for standardisation. If TRUE, clustering results from the other methods given in clusum are used as well.


useallg: logical. If TRUE, standardisation uses results from all numbers of clusters in G. If FALSE, standardisation of results for a specific number of clusters only uses results from that number of clusters.
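The difference between the two settings lies in which reference values enter the mean and standard deviation. A rough base R sketch, assuming made-up simulated values stored per number of clusters (`sim_by_g` and `reference_values` are hypothetical names, not fpc objects):

```r
# Made-up simulated values of one statistic, keyed by number of clusters.
sim_by_g <- list("2" = c(0.30, 0.35, 0.40), "3" = c(0.50, 0.55, 0.60))
observed <- 0.58  # made-up result for a 3-cluster solution

# Hypothetical helper, not part of fpc: pick the reference values.
reference_values <- function(sim_by_g, g, useallg = FALSE) {
  if (useallg) {
    unlist(sim_by_g, use.names = FALSE)  # pool all numbers of clusters
  } else {
    sim_by_g[[as.character(g)]]          # only the matching number of clusters
  }
}

ref_own <- reference_values(sim_by_g, 3)
ref_all <- reference_values(sim_by_g, 3, useallg = TRUE)

z_own <- (observed - mean(ref_own)) / sd(ref_own)
z_all <- (observed - mean(ref_all)) / sd(ref_all)
```

With these numbers the same observed value gets a larger standardised score when all numbers of clusters are pooled, because the pooled reference set has a lower mean. Pooling therefore preserves systematic differences between numbers of clusters, whereas standardising within each number of clusters removes them.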


othernc: list of integer vectors of length 2. This allows the incorporation of methods that produce numbers of clusters other than those in G, for example because a method has automatically estimated the number of clusters. The first number is the index of the clustering method (the order is determined by argument clustermethod in clusterbenchstats); the second number is the number of clusters. Results specified here are only standardised if useallg=TRUE.
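As an illustration of the structure just described, the following sketch builds such a list for an assumed setup in which G is 2:3 and the second clustering method additionally produced a 5-cluster solution. The values are invented for the example; only the "method index, number of clusters" layout comes from the description above.

```r
# Assumed setup: two methods, G = 2:3, and method 2 also returned 5 clusters.
G <- 2:3
othernc <- list(c(2L, 5L))  # c(method index, number of clusters)

# Structural checks: every entry has length 2 and refers to a number of
# clusters that is not already covered by G.
entry_lengths <- vapply(othernc, length, integer(1))
outside_g <- vapply(othernc, function(x) !(x[2] %in% G), logical(1))
stopifnot(all(entry_lengths == 2), all(outside_g))
```

Such a list would then be passed together with useallg=TRUE, presumably with the corresponding result stored at the matching position of the "valstat" input.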


cgrestandard will add a statistic named dmode to the input set of validation statistics, which is defined as 0.75*dindex+0.25*highdgap, aggregating these two closely related statistics, see clustatsum.
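The aggregation is a fixed weighted sum, which can be checked by hand. The input values below are made up purely to show the arithmetic:

```r
dindex <- 0.80    # made-up value
highdgap <- 0.40  # made-up value

# Weighted aggregation as defined above.
dmode <- 0.75 * dindex + 0.25 * highdgap
```

With these inputs, dmode is 0.75 * 0.80 + 0.25 * 0.40 = 0.70, so dindex dominates the aggregate.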


Hennig, C. (2019) Cluster validation by measurement of clustering characteristics relevant to the user. In C. H. Skiadas (ed.) Data Analysis and Applications 1: Clustering and Regression, Modeling-estimating, Forecasting and Data Mining, Volume 2, Wiley, New York 1-24, https://arxiv.org/abs/1703.09282

Akhanli, S. and Hennig, C. (2020) Calibrating and aggregating cluster validity indexes for context-adapted comparison of clusterings. Statistics and Computing, 30, 1523-1544, https://link.springer.com/article/10.1007/s11222-020-09958-2, https://arxiv.org/abs/2002.01822

See Also

valstat.object, clusterbenchstats, stupidkcentroids, stupidknn, stupidkfn, stupidkaven, clustatsum


Examples
  library(fpc)  # provides rFace, kmeansCBI, claraCBI, clustatsum, randomclustersim
  face <- rFace(10,dMoNo=2,dNoEy=0,p=2)
  dif <- dist(face)
  clusum <- list()
  clusum[[2]] <- list()
  cl12 <- kmeansCBI(face,2)
  cl13 <- kmeansCBI(face,3)
  cl22 <- claraCBI(face,2)
  cl23 <- claraCBI(face,3)
  ccl12 <- clustatsum(dif,cl12$partition)
  ccl13 <- clustatsum(dif,cl13$partition)
  ccl22 <- clustatsum(dif,cl22$partition)
  ccl23 <- clustatsum(dif,cl23$partition)
  clusum[[1]] <- list()
  clusum[[1]][[2]] <- ccl12
  clusum[[1]][[3]] <- ccl13
  clusum[[2]][[2]] <- ccl22
  clusum[[2]][[3]] <- ccl23
  clusum$maxG <- 3
  clusum$minG <- 2
  clusum$method <- c("kmeansCBI","claraCBI")
  clusum$name <- c("kmeansCBI","claraCBI")
  clusim <- randomclustersim(dist(face),G=2:3,nnruns=1,kmruns=1,
    fnruns=1,avenruns=1,monitor=FALSE)
  cgr <- cgrestandard(clusum,clusim,2:3)
  cgr2 <- cgrestandard(clusum,clusim,2:3,useallg=TRUE)
  cgr3 <- cgrestandard(clusum,clusim,2:3,percentage=TRUE)
