bootclustrange estimates the quality of the clustering based on subsamples of the data to avoid computational overload.
bootclustrange(object, seqdata, seqdist.args = list(method = "LCS"),
R = 100, sample.size = 1000, parallel = FALSE,
progressbar = FALSE, sampling = "clustering",
strata = NULL)
# S3 method for bootclustrange
plot(x, stat = "noCH", legendpos = "bottomright",
norm = "none", withlegend = TRUE, lwd = 1,
col = NULL, ylab = "Indicators",
xlab = "N clusters", conf.int = 0.95,
ci.method = "perc", ci.alpha = 0.3,
line = "median", ...)
# S3 method for bootclustrange
print(x, digits = 2, bootstat = c("mean"), ...)
Returns a clustrange object (see as.clustrange) containing the bootstrapped values.
object: A seqclararange object or a data.frame with the clustering to be evaluated.
seqdata: State sequence object of class stslist. The sequence data to use. Use seqdef to create such an object.
seqdist.args: List of arguments passed to seqdist for computing the distances.
R: Numeric. The number of subsamples to use.
sample.size: Numeric. The size of the subsamples. Values between 1000 and 10000 are recommended.
parallel: Logical. Whether to initialize the parallel processing of the future package using the default multisession strategy. If FALSE (default), the current plan is used. If TRUE, a multisession plan is initialized using default values.
progressbar: Logical. Whether to initialize a progress bar using the future package. If FALSE (default), the current progress bar handlers are used. If TRUE, a new global progress bar handler is initialized.
sampling: Character. The sampling procedure to use: "clustering" (default) stratifies the sampling by the maximum number of clusters, "medoids" adds the medoids to each subsample, "strata" stratifies by the strata argument, and "random" uses random sampling.
strata: An optional stratification variable.
x: A bootclustrange object to be plotted or printed.
stat: Character. The list of statistics to plot, "noCH" to plot all statistics except "CH" and "CHsq", or "all" for all statistics. See as.clustrange for a list of possible values.
legendpos: Character. Legend position, see legend.
norm: Character. Normalization method of the statistics. One of "none" (no normalization), "range" (given as (value - min)/(max - min)), "zscore" (adjusted by mean and standard deviation), or "zscoremed" (adjusted by median and median of the difference to the median).
withlegend: Logical. If FALSE, the legend is not plotted.
lwd: Numeric. Line width, see par.
col: A vector of line colors, see par. If NULL, a default set of colors is used.
xlab: x axis label.
ylab: y axis label.
conf.int: Confidence level used to build the confidence interval (default: 0.95).
ci.method: Method used to build the confidence interval (only if bootstrap has been used, see R above). One of "none" (do not plot the confidence interval), "norm" (based on a normal approximation), or "perc" (default, based on percentiles).
ci.alpha: Alpha color value used to plot the interval.
line: Which value should be plotted by the line? One of "mean" (average over all bootstraps) or "median" (default, median over all bootstraps).
digits: Number of digits to be printed.
bootstat: The summary statistic to use: "mean" or "median".
...: Additional parameters passed to/from methods.
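The normalization options of the norm argument can be illustrated with a small base R sketch. The helper normalize_stat and the input values are hypothetical and not part of the package. For "zscoremed", the sketch assumes that the "median of the difference to the median" is taken over absolute differences, which the description above does not state explicitly.

```r
# Hypothetical helper, not part of WeightedCluster: rescales a vector of
# statistic values following the `norm` options described above.
normalize_stat <- function(x, norm = c("none", "range", "zscore", "zscoremed")) {
  norm <- match.arg(norm)
  switch(norm,
    none = x,
    range = (x - min(x)) / (max(x) - min(x)),
    zscore = (x - mean(x)) / sd(x),
    # Assumption: the differences to the median are taken in absolute value.
    zscoremed = (x - median(x)) / median(abs(x - median(x)))
  )
}

asw <- c(0.21, 0.35, 0.30, 0.28)  # made-up values for 2 to 5 clusters
normalize_stat(asw, "range")
normalize_stat(asw, "zscoremed")
```

Normalizing in this way puts statistics with different scales on a comparable footing when several of them are drawn on the same plot.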
bootclustrange estimates the quality of the clustering based on subsamples of the data to avoid computational overload. It draws R random subsamples of sample.size sequences each from seqdata, using the sampling procedure defined by the sampling argument. In each subsample, a distance matrix is computed using the selected sequences and the seqdist.args arguments, and the cluster quality indices are then estimated using as.clustrange.
The clustering can be specified either as a seqclararange object or a data.frame.
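The subsampling idea described above can be sketched in base R. This is a simplified illustration under stated assumptions, not the package's implementation: it uses made-up numeric data with Euclidean distances instead of sequences and seqdist, a plain stratification by cluster, and a single hypothetical quality index (quality) instead of the statistics computed by as.clustrange.

```r
set.seed(1)
# Made-up data: 2000 points in two groups, with a given clustering.
n <- 2000
cluster <- rep(1:2, each = n / 2)
x <- cbind(rnorm(n, mean = 3 * cluster), rnorm(n))

# Hypothetical quality index (not one of the as.clustrange statistics):
# mean between-cluster distance divided by mean within-cluster distance.
quality <- function(d, cl) {
  d <- as.matrix(d)
  same <- outer(cl, cl, "==")
  off_diag <- row(d) != col(d)
  mean(d[!same]) / mean(d[same & off_diag])
}

R <- 50             # number of subsamples
sample.size <- 200  # size of each subsample

boot <- replicate(R, {
  # Stratify by cluster: draw the same share of cases from each cluster.
  idx <- unlist(lapply(split(seq_len(n), cluster), function(i) {
    sample(i, size = round(sample.size * length(i) / n))
  }))
  # Distances are computed on the subsample only, never on all n cases.
  quality(dist(x[idx, ]), cluster[idx])
})

median(boot)                                     # central value (cf. line = "median")
quantile(boot, c(0.025, 0.975))                  # percentile interval (cf. ci.method = "perc")
mean(boot) + c(-1, 1) * qnorm(0.975) * sd(boot)  # normal approximation (cf. ci.method = "norm")
```

The point of the design is that each distance matrix has only sample.size rows and columns, so memory use depends on the subsample size rather than on the full data set.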
Studer, M., R. Sadeghi and L. Tochon (2024). Sequence Analysis for Large Databases. LIVES Working Papers 104. doi:10.12682/lives.2296-1658.2024.104
See also as.clustrange for the list of cluster quality indices that are computed, and seqclararange for an example of use.
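A minimal usage sketch, assuming the TraMineR and WeightedCluster packages are installed and using the biofam data shipped with TraMineR. The bootclustrange, print and plot calls follow the usage shown at the top of this page. The seqclararange arguments (R, sample.size, kvals, seqdist.args, parallel) are assumptions to be checked against ?seqclararange. The values of R and sample.size are deliberately tiny so that the code runs quickly and are far below the recommended settings.

```r
library(TraMineR)
library(WeightedCluster)

data(biofam)
# Small subset so that the example runs quickly.
biofam.seq <- seqdef(biofam[1:100, 10:25])

# CLARA clustering for 2 and 3 clusters (argument names assumed, see ?seqclararange).
bfclara <- seqclararange(biofam.seq, R = 3, sample.size = 10, kvals = 2:3,
                         seqdist.args = list(method = "HAM"), parallel = FALSE)

# Bootstrap the cluster quality indices on subsamples of the data.
bCQI <- bootclustrange(bfclara, biofam.seq,
                       seqdist.args = list(method = "HAM"),
                       R = 3, sample.size = 10, parallel = FALSE)

print(bCQI, digits = 2, bootstat = "mean")
plot(bCQI, stat = "noCH", conf.int = 0.95, ci.method = "perc", line = "median")
```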