For a given dataset this simulates random clusterings using stupidkcentroids, stupidknn, stupidkfn, and stupidkaven. It then computes and stores a set of cluster validity indexes for every clustering.
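To make "random clustering" and "validity index" concrete, here is a minimal base-R sketch. It assumes the random-centroid idea suggested by the name stupidkcentroids: pick k observations at random as centroids and assign every observation to its closest centroid. The helper names random_centroid_clustering and mean_within_distance are hypothetical and not part of fpc, and the within-cluster statistic is a simplified stand-in. fpc's own avewithin may be weighted differently (see clustatsum).

```r
# Hypothetical sketch (not fpc code): a random-centroid clustering.
random_centroid_clustering <- function(d, k) {
  dm <- as.matrix(d)                 # accepts a "dist" object or a distance matrix
  centroids <- sample(nrow(dm), k)   # k randomly chosen observations as centroids
  # assign every observation to its closest centroid (cluster labels 1..k)
  apply(dm[, centroids, drop = FALSE], 1, which.min)
}

# Simplified stand-in for a within-cluster index: the plain mean of all
# within-cluster pairwise distances (fpc's avewithin may weight differently).
mean_within_distance <- function(d, cl) {
  dm <- as.matrix(d)
  same <- outer(cl, cl, "==") & upper.tri(dm)  # each within-cluster pair once
  mean(dm[same])
}

set.seed(1)
x <- matrix(rnorm(40), ncol = 2)   # 20 observations in 2 dimensions
d <- dist(x)
cl <- random_centroid_clustering(d, k = 3)
mean_within_distance(d, cl)
```

Repeating this for many random draws and several values of k gives a distribution of index values for clusterings that carry no real structure, which is the kind of collection randomclustersim stores.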
randomclustersim(datadist,datanp=NULL,npstats=FALSE,useboot=FALSE,
bootmethod="nselectboot",
bootruns=25,
G,nnruns=100,kmruns=100,fnruns=100,avenruns=100,
nnk=4,dnnk=2,
pamcrit=TRUE,
multicore=FALSE,cores=detectCores()-1,monitor=TRUE)
List with components
nn: list, indexed by number of clusters. Every entry is a data frame with nnruns observations, one for every simulation run of stupidknn. The variables of the data frame are avewithin, mnnd, cvnnd, maxdiameter, widestgap, sindex, minsep, asw, dindex, denscut, highdgap, pearsongamma, withinss, entropy; if pamcrit=TRUE, also pamc; if npstats=TRUE, also kdnorm and kdunif. All these are cluster validation indexes, documented as values of clustatsum.
fn: list, indexed by number of clusters. Every entry is a data frame with fnruns observations, one for every simulation run of stupidkfn. The variables of the data frame are avewithin, mnnd, cvnnd, maxdiameter, widestgap, sindex, minsep, asw, dindex, denscut, highdgap, pearsongamma, withinss, entropy; if pamcrit=TRUE, also pamc; if npstats=TRUE, also kdnorm and kdunif. All these are cluster validation indexes, documented as values of clustatsum.
aven: list, indexed by number of clusters. Every entry is a data frame with avenruns observations, one for every simulation run of stupidkaven. The variables of the data frame are avewithin, mnnd, cvnnd, maxdiameter, widestgap, sindex, minsep, asw, dindex, denscut, highdgap, pearsongamma, withinss, entropy; if pamcrit=TRUE, also pamc; if npstats=TRUE, also kdnorm and kdunif. All these are cluster validation indexes, documented as values of clustatsum.
km: list, indexed by number of clusters. Every entry is a data frame with kmruns observations, one for every simulation run of stupidkcentroids. The variables of the data frame are avewithin, mnnd, cvnnd, maxdiameter, widestgap, sindex, minsep, asw, dindex, denscut, highdgap, pearsongamma, withinss, entropy; if pamcrit=TRUE, also pamc; if npstats=TRUE, also kdnorm and kdunif. All these are cluster validation indexes, documented as values of clustatsum.
Number of involved runs of stupidknn.
Number of involved runs of stupidkfn.
Number of involved runs of stupidkaven.
Number of involved runs of stupidkcentroids.
If useboot=TRUE, the stability values: stabk for method nselectboot, mean.pred for method prediction.strength.
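Because the results are split by method and by number of clusters, it can help to stack them into one long data frame before summarising. The sketch below does this in base R. It assumes, as described above, that entry k of nn, fn, aven and km holds the data frame for k clusters and that entries for numbers of clusters not in G are NULL. The helper stack_random_stats is hypothetical and not part of fpc.

```r
# Hypothetical helper (not part of fpc): stack the per-method, per-k data
# frames of a randomclustersim()-style result into one long data frame.
stack_random_stats <- function(sim, methods = c("nn", "fn", "aven", "km")) {
  pieces <- list()
  for (m in methods) {
    byk <- sim[[m]]
    for (k in seq_along(byk)) {
      df <- byk[[k]]
      if (is.null(df)) next   # number of clusters not included in G
      # prepend the method and number of clusters as identifying columns
      pieces[[length(pieces) + 1]] <- data.frame(method = m, k = k,
                                                 as.data.frame(df))
    }
  }
  do.call(rbind, pieces)
}
```

Applied to a result such as rmx from the example, stack_random_stats(rmx) followed by aggregate(asw ~ method + k, data = long, FUN = mean) would give the mean average silhouette width per method and number of clusters.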
datadist: distances on which the validation measures are based; a dist object or a distance matrix.
datanp: optional observations times variables data matrix; see npstats.
npstats: logical. If TRUE, distrsimilarity is called and the two statistics computed there are added to the output. These are based on datanp and require datanp to be specified.
useboot: logical. If TRUE, a stability index (either nselectboot or prediction.strength) will be involved.
bootmethod: either "nselectboot" or "prediction.strength"; the stability index to be used if useboot=TRUE.
bootruns: integer. Number of resampling runs. If useboot=TRUE, passed on as B to nselectboot or as M to prediction.strength.
G: vector of integers. Numbers of clusters to consider.
nnruns: integer. Number of runs of stupidknn.
kmruns: integer. Number of runs of stupidkcentroids.
fnruns: integer. Number of runs of stupidkfn.
avenruns: integer. Number of runs of stupidkaven.
nnk: nnk argument to be passed on to cqcluster.stats.
dnnk: nnk argument to be passed on to distrsimilarity.
pamcrit: pamcrit argument to be passed on to cqcluster.stats.
multicore: logical. If TRUE, parallel computing is used through the function mclapply from package parallel; read the warnings there if you intend to use this. It won't work on Windows.
cores: integer. Number of cores for parallelisation.
monitor: logical. If TRUE, some runtime information is printed.
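Two of the arguments above are forwarded or switch behaviour: bootruns becomes B for nselectboot or M for prediction.strength, and multicore chooses between sequential and forked parallel execution. The sketch below mirrors that logic in base R for illustration only. The helpers boot_arguments and apply_runs are hypothetical, and their return shapes are assumptions rather than fpc internals.

```r
# Hypothetical sketch: which resampling argument a stability method receives.
boot_arguments <- function(useboot = FALSE,
                           bootmethod = c("nselectboot", "prediction.strength"),
                           bootruns = 25) {
  if (!useboot) return(NULL)            # no stability index requested
  bootmethod <- match.arg(bootmethod)   # defaults to "nselectboot"
  switch(bootmethod,
         nselectboot         = list(fun = bootmethod, B = bootruns),
         prediction.strength = list(fun = bootmethod, M = bootruns))
}

# Hypothetical sketch: run simulation replicates sequentially or via forking.
apply_runs <- function(nruns, FUN, multicore = FALSE,
                       cores = max(1L, parallel::detectCores() - 1L,
                                   na.rm = TRUE)) {
  if (multicore) {
    # forked parallelism; mc.cores > 1 is not supported on Windows
    parallel::mclapply(seq_len(nruns), FUN, mc.cores = cores)
  } else {
    lapply(seq_len(nruns), FUN)
  }
}
```

For example, boot_arguments(TRUE, "prediction.strength", 10) returns a list whose M element is 10, and apply_runs(3, function(i) i^2) returns list(1, 4, 9) without using any extra cores.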
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en/
See also: stupidkcentroids, stupidknn, stupidkfn, stupidkaven, clustatsum.
library(fpc)  # provides rFace and randomclustersim
set.seed(20000)
options(digits=3)
face <- rFace(10,dMoNo=2,dNoEy=0,p=2)
rmx <- randomclustersim(dist(face),datanp=face,npstats=TRUE,G=2:3,
nnruns=2,kmruns=2, fnruns=1,avenruns=1,nnk=2)
if (FALSE) {
rmx$km # Produces slightly different but basically identical results on ATLAS
}
rmx$aven
rmx$fn
rmx$nn
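One possible use of the stored values, assumed here purely for illustration, is as a reference distribution for an index computed on a clustering of interest with the same number of clusters. The sketch standardises an observed value by the mean and standard deviation of the simulated values. The function standardise_index and the variable my_asw are hypothetical, and fpc's own calibration may proceed differently.

```r
# Hypothetical sketch: standardise an observed index value against the values
# obtained from random clusterings (non-finite simulated values are dropped).
standardise_index <- function(observed, simulated) {
  simulated <- simulated[is.finite(simulated)]
  (observed - mean(simulated)) / sd(simulated)
}

# Possible use with the example output, where my_asw is a hypothetical
# average silhouette width of your own 2-cluster solution:
# sim_asw <- c(rmx$nn[[2]]$asw, rmx$fn[[2]]$asw,
#              rmx$aven[[2]]$asw, rmx$km[[2]]$asw)
# standardise_index(observed = my_asw, simulated = sim_asw)
```

A large positive value indicates that the clustering of interest scores well above what random clusterings achieve on this index for this dataset and number of clusters.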