randomclustersim: Simulation of validity indexes based on random clusterings

Description

For a given dataset this simulates random clusterings using stupidkcentroids and stupidknn. It then computes and stores a set of cluster validity indexes for every clustering.

Usage

randomclustersim(datadist,datanp=NULL,npstats=FALSE,
                      G,nnruns=100,kmruns=100,nnk=4,dnnk=2,
                      pamcrit=TRUE, 
                      multicore=FALSE,cores=detectCores()-1,monitor=TRUE)

Arguments

datadist

distances on which validation-measures are based, dist object or distance matrix.

datanp

optional data matrix with observations in rows and variables in columns, see npstats.

npstats

logical. If TRUE, distrsimilarity is called and the two statistics computed there are added to the output. These are based on datanp and require datanp to be specified.

G

vector of integers. Numbers of clusters to consider.

nnruns

integer. Number of runs of stupidknn.

kmruns

integer. Number of runs of stupidkcentroids.

nnk

nnk-argument to be passed on to cqcluster.stats.

dnnk

nnk-argument to be passed on to distrsimilarity.

pamcrit

pamcrit-argument to be passed on to cqcluster.stats.

multicore

logical. If TRUE, parallel computing is used through the function mclapply from package parallel; read the warnings there if you intend to use this. It does not work on Windows.

cores

integer. Number of cores for parallelisation.

monitor

logical. If TRUE, it will print some runtime information.
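
As a rough illustration of the input arguments described above, the following sketch prepares a dist object, the equivalent distance matrix, and a core count in the spirit of the default cores=detectCores()-1. The toy data and the names x, d, dm, ncores and use_multicore are assumptions made for illustration only; the sketch does not call randomclustersim itself.

```r
library(parallel)  # provides detectCores(); shipped with R

set.seed(1)
# Hypothetical toy data: 20 observations on 2 variables
x <- matrix(rnorm(40), ncol = 2)

# datadist may be given as a dist object or as a full distance matrix
d <- dist(x)
dm <- as.matrix(d)

# Mirror the default cores = detectCores() - 1, but never drop below one core
# (detectCores() can return NA on some platforms)
nc <- detectCores()
ncores <- if (is.na(nc)) 1L else max(1L, nc - 1L)

# mclapply relies on forking, so multicore only makes sense off Windows
use_multicore <- .Platform$OS.type != "windows" && ncores > 1
```

The extra guard on ncores is a design choice of this sketch, not part of the function: on a single-core machine detectCores()-1 would be 0.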

Value

List with components

km

list, indexed by number of clusters. Every entry is a data frame with kmruns observations, one for every simulation run of stupidkcentroids. The variables of the data frame are avewithin, mnnd, cvnnd, maxdiameter, widestgap, sindex, minsep, asw, dindex, denscut, highdgap, pearsongamma, withinss, entropy, if pamcrit=TRUE also pamc, if npstats=TRUE also kdnorm, kdunif. All of these are cluster validation indexes, documented as values of clustatsum.

nn

list, indexed by number of clusters. Every entry is a data frame with nnruns observations, one for every simulation run of stupidknn. The variables of the data frame are avewithin, mnnd, cvnnd, maxdiameter, widestgap, sindex, minsep, asw, dindex, denscut, highdgap, pearsongamma, withinss, entropy, if pamcrit=TRUE also pamc, if npstats=TRUE also kdnorm, kdunif. All of these are cluster validation indexes, documented as values of clustatsum.

nnruns

number of involved runs of stupidknn.

kmruns

number of involved runs of stupidkcentroids.

References

Hennig, C. (2017) Cluster validation by measurement of clustering characteristics relevant to the user. In C. H. Skiadas (ed.) Proceedings of ASMDA 2017, 501-520, https://arxiv.org/abs/1703.09282

Examples


# NOT RUN {
  set.seed(20000)
  options(digits=3)
  face <- rFace(10,dMoNo=2,dNoEy=0,p=2)
  randomclustersim(dist(face),datanp=face,npstats=TRUE,G=2:3,nnruns=3,kmruns=3)
# }
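
The returned list can be inspected with generic tools. The sketch below assumes that package fpc is installed and skips silently otherwise; it reuses the settings of the example above without the optional datanp and npstats arguments. The object name sim is an illustration only, and str is used so that no component names beyond the documented nnruns and kmruns are assumed.

```r
if (requireNamespace("fpc", quietly = TRUE)) {
  library(fpc)
  set.seed(20000)
  face <- rFace(10, dMoNo = 2, dNoEy = 0, p = 2)

  # Small run counts keep the simulation fast; monitor = FALSE silences output
  sim <- randomclustersim(dist(face), G = 2:3, nnruns = 3, kmruns = 3,
                          monitor = FALSE)

  # Show only the top-level components of the result
  str(sim, max.level = 1)

  # Run counts as stored in the result
  print(sim$nnruns)
  print(sim$kmruns)
}
```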
