
Implementation of the hdbscan algorithm.
Usage

hdbscan(edges, neighbors = NULL, minPts = 20, K = 5, threads = NULL,
        verbose = getOption("verbose", TRUE))
Arguments

edges: An edge matrix of the type returned by buildEdgeMatrix or, alternatively, a largeVis object.

neighbors: An adjacency matrix of the type returned by randomProjectionTreeSearch. Must be specified unless edges is a largeVis object.

minPts: The minimum number of points in a cluster.

K: The number of points in the core neighborhood. (See details.)

threads: Maximum number of threads. Determined automatically if NULL (the default). It is unlikely that this parameter should ever need to be adjusted; it is only available to make it possible to abide by the CRAN limitation that no package use more than two cores.

verbose: Verbosity.
Value

An object of type hdbscan with the following fields:

- A vector of the cluster membership for each vertex. Outliers are given NA.
- A vector of the degree of each vertex's membership. This is calculated by standardizing each vertex's lambda against the maximum lambda of its cluster.
- A vector of GLOSH outlier scores for each node assigned to a cluster; NA for nodes not in a cluster.
- The minimum spanning tree used to generate the clustering.
- A representation of the condensed cluster hierarchy.
- The call.
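As an illustration (a sketch, not part of the package's documented interface), these fields can be examined on a fitted object such as the clusters object created in the Examples below. The field names clusters and probabilities used here are assumptions; confirm the actual names with str().

# Sketch only: $clusters and $probabilities are assumed field names.
str(clusters, max.level = 1)                # list the fields described above
table(clusters$clusters, useNA = "ifany")   # cluster sizes; outliers appear as NA
summary(clusters$probabilities)             # per-vertex degree of membership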
Details

The hierarchy describes the complete post-condensation structure of the tree:

- The cluster ID of the vertex's immediate parent, after condensation.
- The cluster ID of each cluster's parent.
- The cluster's stability, taking into account child-node stabilities.
- Whether the cluster was selected.
- The core distance determined for each vertex.
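A brief sketch of inspecting the condensed hierarchy on a fitted object; the field name hierarchy is assumed here for illustration and should be verified with str().

# Sketch only: the "hierarchy" field name is an assumption, not confirmed by this page.
str(clusters$hierarchy)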
The hyperparameter K controls the size of core neighborhoods. The algorithm measures the density around a point as 1 / the distance between that point and its Kth nearest neighbor. A low value of K is similar to clustering nearest neighbors rather than based on density. A high value of K may cause the algorithm to miss some (usually contrived) clustering patterns, such as where clusters are made up of points arranged in lines to form shapes.
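The density measure described above can be sketched by brute force for a small data set, such as the spiral data used in the Examples below. This snippet is illustrative only and is not how the package computes core distances internally.

K <- 5                                    # the default core neighborhood size
d <- as.matrix(dist(dat))                 # pairwise distances for the spiral data
core_dist <- apply(d, 1, function(row) sort(row)[K + 1])  # distance to the Kth nearest neighbor, skipping self
density_est <- 1 / core_dist              # density around each point, as described above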
The function must be provided sufficient nearest-neighbor data for whatever value of K is specified. This is rarely an issue when the inputs are generated by largeVis, which is ordinarily run with a far higher K.
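For example (a sketch reusing the calls from the Examples below), if hdbscan is run with its default K = 5, the neighbor data should cover at least 5 neighbors per point; building it with randomProjectionTreeSearch at K = 10 is more than sufficient.

neighbors <- randomProjectionTreeSearch(t(dat), K = 10, tree_threshold = 100,
                                        max_iter = 5, threads = 1)
edges <- buildEdgeMatrix(t(dat), neighbors)
clusters <- hdbscan(edges, neighbors = neighbors, K = 5, minPts = 20,
                    verbose = FALSE, threads = 1)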
References

R. Campello, D. Moulavi, and J. Sander, "Density-Based Clustering Based on Hierarchical Density Estimates." In: Advances in Knowledge Discovery and Data Mining, Springer, pp. 160-172, 2013.

Examples
# NOT RUN {
library(largeVis)
library(clusteringdatasets) # See https://github.com/elbamos/clusteringdatasets
data(spiral)
dat <- as.matrix(spiral[, 1:2])
neighbors <- randomProjectionTreeSearch(t(dat), K = 10, tree_threshold = 100,
                                        max_iter = 5, threads = 1)
edges <- buildEdgeMatrix(t(dat), neighbors)
clusters <- hdbscan(edges, neighbors = neighbors, verbose = FALSE, threads = 1)
# Calling largeVis while setting sgd_batches to 1 is
# the simplest way to generate the data structures needed for hdbscan
spiralVis <- largeVis(t(dat), K = 10, tree_threshold = 100, max_iter = 5,
                      sgd_batches = 1, threads = 1)
clusters <- hdbscan(spiralVis, verbose = FALSE, threads = 1)
# The gplot function helps to visualize the clustering
largeHighDimensionalDataset <- matrix(rnorm(50000), ncol = 50)
vis <- largeVis(largeHighDimensionalDataset)
clustering <- hdbscan(vis)
gplot(clustering, t(vis$coords))
# }