dbscan - Density Based Clustering of Applications with Noise (DBSCAN) and Related Algorithms - R package

This R package provides a fast C++ (re)implementation of several density-based algorithms with a focus on the DBSCAN family for clustering spatial data. The package includes:

Clustering

  • DBSCAN: Density-based spatial clustering of applications with noise.
  • HDBSCAN: Hierarchical DBSCAN with simplified hierarchy extraction.
  • OPTICS/OPTICSXi: Ordering points to identify the clustering structure (OPTICS), including cluster extraction with the Xi method.
  • FOSC: Framework for Optimal Selection of Clusters for unsupervised and semisupervised clustering of a hierarchical cluster tree.
  • Jarvis-Patrick clustering
  • SNN Clustering: Shared Nearest Neighbor Clustering.

Outlier Detection

  • LOF: Local outlier factor algorithm.
  • GLOSH: Global-Local Outlier Score from Hierarchies algorithm.

Fast Nearest-Neighbor Search (using kd-trees)

  • kNN search
  • Fixed-radius NN search

The implementations use the kd-tree data structure (from library ANN) for faster k-nearest neighbor search, and are typically faster than the native R implementations (e.g., dbscan in package fpc), or the implementations in WEKA, ELKI and Python's scikit-learn.
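
The nearest-neighbor search can also be called on its own. The following is a minimal sketch, assuming the built-in iris data; the values k = 5 and eps = 0.5 are arbitrary choices for illustration and are not recommendations from the package.

```r
library("dbscan")

# Numeric columns of iris as a matrix (illustrative data choice)
x <- as.matrix(iris[, 1:4])

# k-nearest-neighbor search; k = 5 is an arbitrary illustrative value
nn <- kNN(x, k = 5)
head(nn$id, 3)    # neighbor indices for the first 3 points
head(nn$dist, 3)  # corresponding distances

# Fixed-radius search; eps = 0.5 is likewise an arbitrary value
fr <- frNN(x, eps = 0.5)
lengths(fr$id)[1:5]  # number of neighbors within eps for the first 5 points
```

kNN returns one row of neighbors per point, while frNN returns a list because the number of neighbors within the radius differs from point to point.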

Installation

Stable CRAN version: install from within R with

install.packages("dbscan")

Current development version: download the package from AppVeyor or install it from GitHub (requires devtools).

library("devtools")
install_github("mhahsler/dbscan")

Usage

Load the package and use the numeric variables in the iris dataset

library("dbscan")

data("iris")
x <- as.matrix(iris[, 1:4])

Run DBSCAN

db <- dbscan(x, eps = .4, minPts = 4)
db
DBSCAN clustering for 150 objects.
Parameters: eps = 0.4, minPts = 4
The clustering contains 4 cluster(s) and 25 noise points.

 0  1  2  3  4 
25 47 38 36  4 

Available fields: cluster, eps, minPts

Visualize results (noise is shown in black)

pairs(x, col = db$cluster + 1L)
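
The eps value used for DBSCAN is given here without derivation. A common heuristic is to look for a knee in the sorted k-nearest-neighbor distances. The sketch below assumes k = minPts - 1 = 3; the dashed line at 0.4 only marks the eps value used in this example and is not computed from the data.

```r
library("dbscan")

x <- as.matrix(iris[, 1:4])

# Sorted distance of every point to its 3rd nearest neighbor
kNNdistplot(x, k = 3)
abline(h = 0.4, lty = 2)  # eps used in this README, drawn for reference only

# The same distances as a numeric vector, for inspection
d <- kNNdist(x, k = 3)
summary(d)
```

Points to the right of the knee have few close neighbors and are the ones DBSCAN tends to label as noise.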

Calculate LOF (local outlier factor) and visualize (larger bubbles in the visualization have a larger LOF)

lof <- lof(x, k = 4)
pairs(x, cex = lof)
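
To see which observations stand out, the LOF scores can be ranked. This is a minimal sketch that uses the package's default neighborhood size rather than the value above, and stores the scores under the hypothetical name lof_scores so that the function lof is not masked.

```r
library("dbscan")

x <- as.matrix(iris[, 1:4])

# LOF scores with the default neighborhood size
lof_scores <- lof(x)

# Row indices of the five highest-scoring points
top <- head(order(lof_scores, decreasing = TRUE), 5)
data.frame(row = top, lof = round(lof_scores[top], 2))
```

Scores close to 1 indicate a local density similar to that of the neighbors; clearly larger scores suggest outliers.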

Run OPTICS

opt <- optics(x, eps = 1, minPts = 4)
opt
OPTICS clustering for 150 objects.
Parameters: minPts = 4, eps = 1, eps_cl = NA, xi = NA
Available fields: order, reachdist, coredist, predecessor, minPts, eps, eps_cl, xi

Extract DBSCAN-like clustering from OPTICS and create a reachability plot (extracted DBSCAN clusters at eps_cl=.4 are colored)

opt <- extractDBSCAN(opt, eps_cl = .4)
plot(opt)
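
As a rough check, the clustering extracted from OPTICS can be compared with a direct DBSCAN run using the same parameters. This sketch repeats the calls from this README; the cross-tabulation is only an illustration and the two results need not agree on every point.

```r
library("dbscan")

x <- as.matrix(iris[, 1:4])

db  <- dbscan(x, eps = 0.4, minPts = 4)
opt <- optics(x, eps = 1, minPts = 4)
opt <- extractDBSCAN(opt, eps_cl = 0.4)

# Cross-tabulate the cluster labels (0 marks noise in both results)
table(dbscan = db$cluster, optics = opt$cluster)

# Reachability distances in OPTICS order, i.e. the values drawn in the plot
head(opt$reachdist[opt$order])
```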

Extract a hierarchical clustering using the Xi method (captures clusters of varying density)

opt <- extractXi(opt, xi = .05)
opt
plot(opt)
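
The Xi extraction can also be inspected without plotting. The sketch below assumes that the returned object carries a clusters_xi field describing the extracted clusters alongside the flat cluster labels; treat the field name as an assumption if your package version differs.

```r
library("dbscan")

x <- as.matrix(iris[, 1:4])

opt <- optics(x, eps = 1, minPts = 4)
opt <- extractXi(opt, xi = 0.05)

# Extracted Xi clusters (assumed field name: clusters_xi)
head(opt$clusters_xi)

# Flat cluster label per data point
table(opt$cluster)
```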

Run HDBSCAN (captures stable clusters)

hdb <- hdbscan(x, minPts = 4)
hdb
HDBSCAN clustering for 150 objects.
Parameters: minPts = 4
The clustering contains 2 cluster(s) and 0 noise points.

  1   2 
100  50 

Available fields: cluster, minPts, cluster_scores, membership_prob, outlier_scores, hc

Visualize the results as a simplified tree

plot(hdb, show_flat = TRUE)
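
The fields listed in the HDBSCAN output can be inspected directly. The following is a minimal sketch; the comparison with iris$Species is only an illustration, since the species labels are not used by the clustering.

```r
library("dbscan")

x <- as.matrix(iris[, 1:4])
hdb <- hdbscan(x, minPts = 4)

# Compare the flat clusters with the known species labels
table(cluster = hdb$cluster, species = iris$Species)

# Rows with the highest outlier scores
head(order(hdb$outlier_scores, decreasing = TRUE), 5)

# Range of the per-point membership probabilities
range(hdb$membership_prob)
```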

Visualize how strongly each point belongs to its assigned cluster (points are shaded by their membership probability)

colors <- mapply(function(col, i) adjustcolor(col, alpha.f = hdb$membership_prob[i]),
                 palette()[hdb$cluster + 1], seq_along(hdb$cluster))
plot(x, col = colors, pch = 20)

License

The dbscan package is licensed under the GNU General Public License (GPL) Version 3. The OPTICSXi R implementation was directly ported from the ELKI framework's Java implementation (GNU AGPLv3), with explicit permission granted by the original author, Erich Schubert.

Monthly Downloads

39,640

Version

1.1-8

License

GPL (>= 2)

Last Published

April 27th, 2021

Functions in dbscan (1.1-8)

  • glosh: Global-Local Outlier Score from Hierarchies
  • NN: Nearest Neighbors Auxiliary Functions
  • frNN: Find the Fixed Radius Nearest Neighbors
  • DS3: Spatial data with arbitrary shapes
  • hdbscan: HDBSCAN
  • extractFOSC: Framework for Optimal Selection of Clusters
  • hullplot: Plot Convex Hulls of Clusters
  • kNN: Find the k Nearest Neighbors
  • jpclust: Jarvis-Patrick Clustering
  • dbscan: DBSCAN
  • moons: Moons Data
  • optics: OPTICS
  • lof: Local Outlier Factor Score
  • kNNdist: Calculate and Plot k-Nearest Neighbor Distances
  • pointdensity: Calculate Local Density at Each Data Point
  • reachability: Density Reachability Structures
  • sNN: Shared Nearest Neighbors
  • sNNclust: Shared Nearest Neighbor Clustering