Learn R Programming

dbscan (version 0.9-8)

optics: OPTICS

Description

Implementation of the OPTICS (Ordering points to identify the clustering structure) clustering algorithm using a kd-tree.

Usage

optics(x, eps, minPts = 5, eps_cl, xi, search = "kdtree", bucketSize = 10, splitRule = "suggest", approx = 0)
optics_cut(x, eps_cl)
opticsXi(object, xi = 0.001, minimum = F, nocorrect = F)
"predict"(object, data, newdata = NULL, ...)

Arguments

x
a data matrix or a distance matrix.
eps
upper limit of the size of the epsilon neighborhood.
minPts
number of minimum points in the eps region (for core points). Default is 5 points.
eps_cl
Threshold to identify clusters (eps_cl
xi
Steepness threshold to identify clusters hierarchically using the Xi method.
search
nearest neighbor search strategy (one of "kdtree", "linear" search, or precomputed "dist") Using precomputed distances is better for high-dimensional, but smaller data sets.
bucketSize
max size of the kd-tree leafs.
splitRule
rule to split the kd-tree. One of "STD", "MIDPT", "FAIR", "SL_MIDPT", "SL_FAIR" or "SUGGEST" (SL stands for sliding). "SUGGEST" uses ANNs best guess.
approx
relative error bound for approximate nearest neighbor searching.
object
an OPTICS object that contains a clustering.
data
the data set used to create the OPTICS clustering object.
newdata
new data set for which cluster membership should be predicted.
minimum
boolean representing whether or not to extract the minimal clusters in the Xi clustering algorithm.
nocorrect
boolean representing whether or not to prune the steep up area for points that have predecessors not in the cluster--found by the ELKI framework, see details below.
...
additional arguments are currently unused.

Value

An object of class 'optics' with components:If eps_cl was specified or optics_cut was called, then in addition the following components are available:If xi was specified or opticsXi was called, then in addition the following components are available:

Details

This implementation of OPTICS implements the original algorithm as described by Ankers et al (1999). OPTICS is similar to DBSCAN, however, for OPTICS eps is only an upper limit for the neighborhood size used to reduce computational complexity. Similar to DBSCAN, minPts is often set to be dimensionality of the data plus one.

OPTICS linearly orders the data points such that points which are spatially closest become neighbors in the ordering. The closest analog to this ordering is dendrogram in single-link hierarchical clustering. The algorithm also calculates the reachability distance for each point. plot() produces a reachability-plot which shows each points reachability distance where the points are sorted by OPTICS. Valleys represent clusters (the deeper the valley, the more dense the cluster) and high points indicate points between clusters.

If eps_cl is specified, then an algorithm to extract clusters (see Ankers et al, 1999) is used. That is, it internally calls optics_cut to extract the clustering. The resulting clustering is similar to what DBSCAN would produce. The only difference is that OPTICS is not able to assign some border points and reports them instead as noise.

If xi is specified, then the algorithm to extract clusters hiearchically specified in Ankers et al (1999), with an added set of fixes to the algorithm originally contributed by the ELKI framework, is used. Internally, optics calls opticsXi to extract the resulting hierarchical representation of what several DBSCANs would produce at varying density thresholds. It has been noted that one interpretation of the xi parameter is that it classifies clusters by change in relative cluster density.

See kNN for more information on the other parameters related to nearest neighbor search.

References

Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel, Joerg Sander (1999). OPTICS: Ordering Points To Identify the Clustering Structure. ACM SIGMOD international conference on Management of data. ACM Press. pp. 49--60.

See Also

dbscan in fpc.

Examples

Run this code
set.seed(2)
n <- 400

x <- cbind(
  x = runif(4, 0, 1) + rnorm(n, sd=0.1),
  y = runif(4, 0, 1) + rnorm(n, sd=0.1)
  )

plot(x, col=rep(1:4, time = 100))

### run OPTICS
res <- optics(x, eps = 10,  minPts = 10)
res

### get order
res$order

### plot produces a reachability plot
plot(res)

### identify clusters by cutting the reachability plot (black is noise)
res <- optics_cut(res, eps_cl = .065)
res

plot(res)
plot(x, col = res$cluster+1L)

### re-cutting at a higher eps threshold
res <- optics_cut(res, eps_cl = .1)
res
plot(res)
plot(x, col = res$cluster+1L)

### identify clusters of varying density hierarchically using the Xi method
res <- opticsXi(res, xi = 0.05)
res

plot(res)
plot(x, col = res$cluster+1L)
# better visualization of the nested structure using convex hulls
hullplot(x, res)

# Xi cluster structure
res$clusters_xi

### use OPTICS on a precomputed distance matrix
d <- dist(x)
res <- optics(x, eps = 1, minPts = 10)
plot(res)

Run the code above in your browser using DataLab