Learn R Programming

⚠️There's a newer version (1.2-0) of this package.Take me there.

R package dbscan - Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Related Algorithms

This R package provides a fast C++ (re)implementation of several density-based algorithms with a focus on the DBSCAN family for clustering spatial data. The package includes:

Clustering

  • DBSCAN: Density-based spatial clustering of applications with noise (Ester et al, 1996).
  • HDBSCAN: Hierarchical DBSCAN with simplified hierarchy extraction (Campello et al, 2015).
  • OPTICS/OPTICSXi: Ordering points to identify the clustering structure clustering algorithms (Ankerst et al, 1999).
  • FOSC: Framework for Optimal Selection of Clusters for unsupervised and semisupervised clustering of hierarchical cluster tree (Campello et al, 2013).
  • Jarvis-Patrick clustering: Shared Nearest Neighbor Graph partitioning (Javis and Patrick, 1973).
  • SNN Clustering: Shared Nearest Neighbor Clustering (Erdoz et al, 2003).

Outlier Detection

  • LOF: Local outlier factor algorithm (Breunig et al, 2000).
  • GLOSH: Global-Local Outlier Score from Hierarchies algorithm (Campello et al, 2015).

Fast Nearest-Neighbor Search (using kd-trees)

  • kNN search
  • Fixed-radius NN search

The implementations use the kd-tree data structure (from library ANN) for faster k-nearest neighbor search, and are typically faster than the native R implementations (e.g., dbscan in package fpc), or the implementations in WEKA, ELKI and Python’s scikit-learn.

Installation

Stable CRAN version: Install from within R with

install.packages("dbscan")

Current development version: Install from r-universe.

install.packages("dbscan", repos = "https://mhahsler.r-universe.dev")

Usage

Load the package and use the numeric variables in the iris dataset

library("dbscan")

data("iris")
x <- as.matrix(iris[, 1:4])

DBSCAN

db <- dbscan(x, eps = 0.4, minPts = 4)
db
## DBSCAN clustering for 150 objects.
## Parameters: eps = 0.4, minPts = 4
## Using euclidean distances and borderpoints = TRUE
## The clustering contains 4 cluster(s) and 25 noise points.
## 
##  0  1  2  3  4 
## 25 47 38 36  4 
## 
## Available fields: cluster, eps, minPts, dist, borderPoints

Visualize the resulting clustering (noise points are shown in black).

pairs(x, col = db$cluster + 1L)

OPTICS

opt <- optics(x, eps = 1, minPts = 4)
opt
## OPTICS ordering/clustering for 150 objects.
## Parameters: minPts = 4, eps = 1, eps_cl = NA, xi = NA
## Available fields: order, reachdist, coredist, predecessor, minPts, eps,
##                   eps_cl, xi

Extract DBSCAN-like clustering from OPTICS and create a reachability plot (extracted DBSCAN clusters at eps_cl=.4 are colored)

opt <- extractDBSCAN(opt, eps_cl = 0.4)
plot(opt)

HDBSCAN

hdb <- hdbscan(x, minPts = 4)
hdb
## HDBSCAN clustering for 150 objects.
## Parameters: minPts = 4
## The clustering contains 2 cluster(s) and 0 noise points.
## 
##   1   2 
## 100  50 
## 
## Available fields: cluster, minPts, coredist, cluster_scores,
##                   membership_prob, outlier_scores, hc

Visualize the hierarchical clustering as a simplified tree. HDBSCAN finds 2 stable clusters.

plot(hdb, show_flat = TRUE)

License

The dbscan package is licensed under the GNU General Public License (GPL) Version 3. The OPTICSXi R implementation was directly ported from the ELKI framework’s Java implementation (GNU AGPLv3), with explicit permission granted by the original author, Erich Schubert.

Changes

References

  • Hahsler M, Piekenbrock M, Doran D (2019). dbscan: Fast Density-Based Clustering with R. Journal of Statistical Software, 91(1), 1-30. doi: 10.18637/jss.v091.i01.
  • Martin Ester, Hans-Peter Kriegel, Joerg Sander, Xiaowei Xu (1996). A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Institute for Computer Science, University of Munich. Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96), 226-231. https://dl.acm.org/doi/10.5555/3001460.3001507
  • Breunig, M., Kriegel, H., Ng, R., and Sander, J. (2000). LOF: identifying density-based local outliers. In ACM Int. Conf. on Management of Data, pages 93-104. doi: https://doi.org/10.1145/335191.335388
  • Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel, Joerg Sander (1999). OPTICS: Ordering Points To Identify the Clustering Structure. ACM SIGMOD international conference on Management of data. ACM Press. pp 49-60. doi: https://doi.org/10.1145/304181.304187
  • Campello, Ricardo JGB, Davoud Moulavi, Arthur Zimek, and Joerg Sander (2013). A framework for semi-supervised and unsupervised optimal extraction of clusters from hierarchies. Data Mining and Knowledge Discovery 27(3): 344-371. doi: https://doi.org/10.1007/s10618-013-0311-4
  • Campello RJGB, Moulavi D, Zimek A, Sander J (2015). Hierarchical density estimates for data clustering, visualization, and outlier detection. ACM Transactions on Knowledge Discovery from Data (TKDD), 10(5):1-51. doi: https://doi.org/10.1145/2733381
  • R. A. Jarvis and E. A. Patrick. 1973. Clustering Using a Similarity Measure Based on Shared Near Neighbors. IEEE Trans. Comput. 22, 11 (November 1973), 1025-1034. doi: https://doi.org/10.1109/T-C.1973.223640
  • Levent Ertoz, Michael Steinbach, Vipin Kumar, Finding Clusters of Different Sizes, Shapes, and Densities in Noisy, High Dimensional Data, SIAM International Conference on Data Mining, 2003, 47-59. doi: https://doi.org/10.1137/1.9781611972733.5

Copy Link

Version

Install

install.packages('dbscan')

Monthly Downloads

39,640

Version

1.1-11

License

GPL (>= 2)

Issues

Pull Requests

Stars

Forks

Maintainer

Last Published

October 27th, 2022

Functions in dbscan (1.1-11)

dendrogram

Coersions to Dendrogram
dbscan-package

dbscan: Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Related Algorithms
comps

Find Connected Components in a Nearest-neighbor Graph
glosh

Global-Local Outlier Score from Hierarchies
NN

NN --- Nearest Neighbors Superclass
DS3

DS3: Spatial data with arbitrary shapes
frNN

Find the Fixed Radius Nearest Neighbors
extractFOSC

Framework for the Optimal Extraction of Clusters from Hierarchies
dbscan

Density-based Spatial Clustering of Applications with Noise (DBSCAN)
hdbscan

Hierarchical DBSCAN (HDBSCAN)
kNN

Find the k Nearest Neighbors
sNN

Find Shared Nearest Neighbors
pointdensity

Calculate Local Density at Each Data Point
moons

Moons Data
reachability

Reachability Distances
hullplot

Plot Convex Hulls of Clusters
optics

Ordering Points to Identify the Clustering Structure (OPTICS)
lof

Local Outlier Factor Score
kNNdist

Calculate and Plot k-Nearest Neighbor Distances
sNNclust

Shared Nearest Neighbor Clustering
jpclust

Jarvis-Patrick Clustering