This function provides bootstrapping for hierarchical clustering
('hclust' objects). Internally, it uses Hcl2mat(), which converts
'hclust' objects into a binary matrix of cluster memberships.
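
To illustrate what a binary matrix of cluster memberships looks like,
here is a minimal base-R sketch. The helper 'branch_membership()' is
hypothetical and is not the package's Hcl2mat(); the orientation (one row
per branch, one column per observation) is an assumption made for the
example.

```r
# Hypothetical helper (NOT the package's Hcl2mat()): one row per internal
# node (branch), one column per observation; 1 marks membership.
branch_membership <- function(hc) {
  n <- length(hc$order)
  memb <- matrix(0L, nrow = n - 1, ncol = n,
                 dimnames = list(NULL, hc$labels))
  for (i in seq_len(n - 1)) {
    for (j in hc$merge[i, ]) {
      if (j < 0) {
        memb[i, -j] <- 1L                        # negative: single observation
      } else {
        memb[i, ] <- pmax(memb[i, ], memb[j, ])  # positive: earlier branch
      }
    }
  }
  memb
}

hc <- hclust(dist(USArrests[1:6, ]), method = "ward.D")
memb <- branch_membership(hc)
memb
```

Each row can then be compared between clusterings as a 0/1 vector, which
is what makes branch matching straightforward.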
The default clustering method is the variance-minimizing "ward.D"
(which works better with Euclidean distances); to make it consistent
with the hclust() default, specify 'method.c="complete"'. Also, it
sometimes makes sense to transform non-Euclidean distances into
Euclidean ones with 'dist(_non_euclidean_dist_)'.
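
The following base-R sketch shows both points on a small subset of the
built-in 'USArrests' data (the data set and the object names are chosen
only for illustration): the hclust() default method, and re-applying
dist() to a non-Euclidean distance object before using "ward.D".

```r
# Manhattan (non-Euclidean) distances between the first 10 rows
d.man <- dist(USArrests[1:10, ], method = "manhattan")

# hclust() with no 'method' argument uses complete linkage
hc.complete <- hclust(d.man)
hc.complete$method

# dist() applied to a distance object treats its full square matrix as
# data and returns Euclidean distances between the rows of that matrix
d.euc <- dist(d.man)
attr(d.euc, "method")

# Ward clustering on the transformed (Euclidean) distances
hc.ward <- hclust(d.euc, method = "ward.D")
```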
Bclust() and its companion functions are based on functions from the
'bootstrap' package by Sebastian Gibb.
Option 'hclist' covers the special case when a list of 'hclust' objects
is pre-built. In that case, other arguments (except 'mc.cores' and
'monitor') are ignored, and the first component of 'hclist', that
is 'hclist[[1]]', is used as the "original" clustering against which
all other objects in 'hclist' are compared. The number of replicates is
the length of 'hclist' minus one.
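
A pre-built list of this shape could be assembled as in the sketch below.
It uses base R only; the resampling scheme (columns drawn with
replacement), the helper 'clust()' and the number of replicates are
assumptions made for the example, not a description of what Bclust()
does internally.

```r
set.seed(1)
dat <- as.matrix(USArrests[1:10, ])

# Hypothetical helper: the clustering applied to every replicate
clust <- function(x) hclust(dist(x), method = "ward.D")

n.rep <- 20
reps <- lapply(seq_len(n.rep), function(i) {
  # assumption: resample characters (columns) with replacement
  clust(dat[, sample(ncol(dat), replace = TRUE), drop = FALSE])
})

# First component is the "original" clustering, the rest are replicates
hclist <- c(list(clust(dat)), reps)
length(hclist) - 1  # number of replicates
```

A list built this way has the layout described above and could be
supplied through the 'hclist' option.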
Option 'relative' changes how branches of the reference clustering
("original") and the bootstrapped clustering ("current") are
compared. If 'relative=FALSE' (default), only absolute matches (present
or absent) are counted, and the vector of matches is binary (either 0
or 1). If 'relative=TRUE', branches of "original" which have no matches
in "current" are additionally checked for similarity with all
branches of "current", and the minimal (asymmetric) binary
dissimilarity value is used as a match. Therefore, the matching vector
in this case is numeric instead of binary. This typically results
in higher bootstrap values. The underlying methodology
is similar to what is defined in Lemoine et al. (2018) as a "transfer
bootstrap". As the asymmetric binary dissimilarity is the _proportion_
of items in which only one is "1" amongst those in which at least one
is "1", it is possible to rephrase Lemoine et al. (2018) and say that
this distance is equal to the _proportion_ of items that must be
_removed_ to make both branches identical. Please note that with
'relative=TRUE', the whole algorithm is several times slower than the
default.
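
The asymmetric binary dissimilarity itself can be checked by hand. The
sketch below uses made-up toy branches over six items ('a' to 'f') and a
hypothetical helper 'bin_dis()'; it only illustrates the distance and the
search for the closest "current" branch, not how Bclust() turns that
value into a support score.

```r
# Hypothetical helper: items where exactly one branch has "1", divided
# by items where at least one branch has "1"
bin_dis <- function(x, y) sum(xor(x, y)) / sum(x | y)

# Toy membership vectors (illustrative only)
original <- c(a = 1, b = 1, c = 1, d = 1, e = 0, f = 0)
current <- rbind(
  c1 = c(1, 1, 0, 0, 0, 0),
  c2 = c(1, 1, 1, 0, 1, 0),
  c3 = c(0, 0, 0, 0, 1, 1)
)

# Dissimilarity of the "original" branch to every "current" branch
dis <- apply(current, 1, bin_dis, y = original)
exact <- any(dis == 0)   # absolute match present?
min.dis <- min(dis)      # closest "current" branch

# The same values from stats::dist() with method = "binary"
check <- as.matrix(dist(rbind(original, current), method = "binary"))[1, -1]
```

For the closest branch, 'c2', the two branches differ in items 'd' and
'e' out of the five items that belong to at least one of them, so
removing those two items would make the branches identical.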
Please note that Bclust() frequently underestimates cluster
stability when the number of characters is relatively small. One
possible remedy is to use hyper-binding (like "cbind(data, data,
data)") to reach a reliable number of characters.
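
As a small base-R sketch of hyper-binding (the 'USArrests' subset and
object names are chosen only for illustration): tripling the columns
triples the number of characters available for resampling, while
Euclidean distances between rows are merely rescaled by sqrt(3).

```r
dat <- as.matrix(USArrests[1:10, ])

# Hyper-binding: repeat every character (column) three times
dat3 <- cbind(dat, dat, dat)
dim(dat3)

# Each squared difference is counted three times, so every Euclidean
# distance grows by the same factor and the ratios all equal sqrt(3)
ratio <- as.numeric(dist(dat3)) / as.numeric(dist(dat))
range(ratio)
```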
plot.Bclust() is designed for quick plotting and plots labels (bootstrap
support values) with the following defaults: 'percent=TRUE, pos=3,
offset=0.1'. To change how labels are plotted, use the separate
Bclabels() command.