bootclustrange: Cluster Quality Indices estimation by subsampling

Description

bootclustrange estimates the quality of the clustering based on subsamples of the data to avoid computational overload.

Usage

bootclustrange(object, seqdata, seqdist.args = list(method = "LCS"),
               R = 100, sample.size = 1000, parallel = FALSE,
               progressbar = FALSE, sampling = "clustering",
               strata = NULL)
# S3 method for bootclustrange
plot(x, stat = "noCH", legendpos = "bottomright",
     norm = "none", withlegend = TRUE, lwd = 1,
     col = NULL, ylab = "Indicators",
     xlab = "N clusters", conf.int = 0.95,
     ci.method = "perc", ci.alpha = 0.3,
     line = "median", ...)
# S3 method for bootclustrange
print(x, digits = 2, bootstat = c("mean"), ...)

Value

A clustrange object (see as.clustrange) containing the bootstrapped values.

Arguments

object: A seqclararange object or a data.frame with the clustering to be evaluated.
seqdata: State sequence object of class stslist. The sequence data to use. Use seqdef to create such an object.
seqdist.args: List of arguments passed to seqdist for computing the distances.
R: Numeric. The number of subsamples to use.
sample.size: Numeric. The size of each subsample. Values between 1000 and 10 000 are recommended.
parallel: Logical. Whether to initialize the parallel processing of the future package using the default multisession strategy. If FALSE (default), the current plan is used. If TRUE, a multisession plan is initialized using default values.
progressbar: Logical. Whether to initialize a progress bar using the future package. If FALSE (default), the current progress bar handlers are used. If TRUE, a new global progress bar handler is initialized.
sampling: Character. The sampling procedure to be used. One of "clustering" (default; the sampling is stratified by the clustering with the maximum number of clusters), "medoids" (the medoids are added to each subsample), "strata" (the sampling is stratified by the strata argument), or "random" (random sampling).
strata: An optional stratification variable.
x: A bootclustrange object to be plotted or printed.
stat: Character. The list of statistics to plot or "noCH" to plot all statistics except "CH" and "CHsq" or "all" for all statistics. See as.clustrange for a list of possible values.
legendpos: Character. Legend position, see legend.
norm: Character. Normalization method of the statistics. One of "none" (no normalization), "range" (given as (value - min)/(max - min)), "zscore" (adjusted by mean and standard deviation) or "zscoremed" (adjusted by median and median of the difference to the median).
withlegend: Logical. If FALSE, the legend is not plotted.
lwd: Numeric. Line width, see par.
col: A vector of line colors, see par. If NULL, a default set of colors is used.
xlab: x axis label.
ylab: y axis label.
conf.int: Confidence level used to build the confidence interval (default: 0.95).
ci.method: Method used to build the confidence interval (only if bootstrap has been used, see R above). One of "none" (do not plot the confidence interval), "norm" (based on normal approximation), or "perc" (default, based on percentiles).
ci.alpha: Alpha (transparency) value of the color used to plot the interval.
line: Which value should be plotted by the line? One of "mean" (average over all bootstraps) or "median" (default, median over all bootstraps).
digits: Number of digits to be printed.
bootstat: The summary statistic to use: "mean" or "median".
...: Additional parameters passed to/from methods.
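
To make the conf.int, ci.method, line and norm arguments more concrete, the following base R sketch computes the corresponding quantities on made-up numbers. It is not the package's internal code: boot_vals and stat_vals are hypothetical inputs, and the "zscoremed" line assumes that the scale is the median absolute difference to the median.

```r
# Hypothetical bootstrap output: 100 subsample estimates of one quality
# index for a single number of clusters (simulated values only).
set.seed(1)
boot_vals <- rnorm(100, mean = 0.40, sd = 0.03)

conf.int <- 0.95
alpha <- 1 - conf.int

# Value drawn by the line: line = "mean" or line = "median"
line_mean <- mean(boot_vals)
line_median <- median(boot_vals)

# ci.method = "perc": percentile interval over the bootstrap values
ci_perc <- quantile(boot_vals, probs = c(alpha / 2, 1 - alpha / 2),
                    names = FALSE)

# ci.method = "norm": normal approximation around the mean
z <- qnorm(1 - alpha / 2)
ci_norm <- line_mean + c(-1, 1) * z * sd(boot_vals)

# Hypothetical values of one index for k = 2, ..., 6 clusters
stat_vals <- c(0.31, 0.42, 0.38, 0.35, 0.29)

# norm = "range": rescale to [0, 1]
norm_range <- (stat_vals - min(stat_vals)) / (max(stat_vals) - min(stat_vals))

# norm = "zscore": center by the mean, scale by the standard deviation
norm_zscore <- (stat_vals - mean(stat_vals)) / sd(stat_vals)

# norm = "zscoremed": center by the median, scale by the median
# absolute difference to the median (assumed reading of the description)
med <- median(stat_vals)
norm_zscoremed <- (stat_vals - med) / median(abs(stat_vals - med))
```

With well-behaved bootstrap values the two intervals are usually close. The percentile interval makes no symmetry assumption, which may explain why "perc" is the default.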

Details

bootclustrange estimates the quality of the clustering based on subsamples of the data to avoid computational overload. It draws R subsamples of sample.size sequences each from seqdata, using the sampling procedure defined by the sampling argument. For each subsample, a distance matrix is computed on the selected sequences with the seqdist.args arguments, and the cluster quality indices are then estimated using as.clustrange.
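
The procedure described above can be sketched in base R on toy numeric data. This is only an illustration of the subsampling idea, not the package's implementation: dist stands in for seqdist, the hypothetical helper pseudo_r2 stands in for the indices computed by as.clustrange, and the stratified draw is a simplified version of the "clustering" sampling.

```r
set.seed(42)

# Toy data: two groups of numeric points instead of state sequences
n <- 5000
x <- cbind(rnorm(n, mean = rep(c(0, 4), each = n / 2)), rnorm(n))
clustering <- rep(1:2, each = n / 2)

R <- 20            # number of subsamples
sample.size <- 200 # size of each subsample

# Hypothetical quality index: share of the squared-distance
# discrepancy explained by the clustering
pseudo_r2 <- function(d, cl) {
  d2 <- as.matrix(d)^2
  ss_total <- sum(d2) / (2 * nrow(d2))
  ss_within <- sum(vapply(unique(cl), function(k) {
    idx <- which(cl == k)
    sum(d2[idx, idx]) / (2 * length(idx))
  }, numeric(1)))
  1 - ss_within / ss_total
}

boot_r2 <- vapply(seq_len(R), function(r) {
  # Stratified draw: same sampling fraction within each cluster
  idx <- unlist(lapply(split(seq_len(n), clustering), function(ids) {
    sample(ids, size = round(sample.size * length(ids) / n))
  }))
  # Distances are computed on the subsample only, never on all n cases
  d <- dist(x[idx, , drop = FALSE])
  pseudo_r2(d, clustering[idx])
}, numeric(1))

summary(boot_r2)
```

Each iteration only builds a sample.size by sample.size distance matrix, which is what keeps the cost manageable when the full n by n matrix would be too large.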

The clustering can be specified either as a seqclararange object or a data.frame.
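
A possible call sequence is sketched below, assuming the TraMineR and WeightedCluster packages are installed and using the biofam data shipped with TraMineR. The subset size, the "HAM" distance and the small values of R and sample.size are illustrative choices made to keep the run short. They are not recommendations, and the seqclararange arguments shown are assumed from that function's documentation.

```r
library(TraMineR)
library(WeightedCluster)

data(biofam)
# Small subset of the family trajectories (columns 10 to 25 hold the states)
biofam.seq <- seqdef(biofam[1:200, 10:25])

# CLARA clustering for 2 to 4 groups; its result is passed as `object`
bfclara <- seqclararange(biofam.seq, R = 5, sample.size = 40, kvals = 2:4,
                         seqdist.args = list(method = "HAM"),
                         parallel = FALSE)

# Estimate the cluster quality indices on 10 subsamples of 50 sequences
bcq <- bootclustrange(bfclara, biofam.seq,
                      seqdist.args = list(method = "HAM"),
                      R = 10, sample.size = 50,
                      sampling = "clustering")

# Mean of each index over the subsamples
print(bcq, digits = 2, bootstat = "mean")

# Median line with percentile confidence bands
plot(bcq, stat = "noCH", ci.method = "perc", line = "median")
```

On real data, sample.size would normally follow the recommendation given in the Arguments section.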

References