streamMOA (version 1.3-0)

DSC_DenStream: DenStream Data Stream Clusterer

Description

Interface for the DenStream clustering algorithm for data streams implemented in MOA.

Usage

DSC_DenStream(
  epsilon,
  mu = 1,
  beta = 0.2,
  lambda = 0.001,
  initPoints = 100,
  offline = 2,
  processingSpeed = 1,
  recluster = TRUE,
  k = NULL
)

Value

An object of class DSC_DenStream (subclass of DSC, DSC_MOA, DSC_Micro) or, for recluster = TRUE, an object of class DSC_TwoStage.

Arguments

epsilon

defines the epsilon neighbourhood, i.e., the maximal radius of micro-clusters (r <= epsilon). Range: 0 to 1.

mu

minpoints, i.e., the weight w a core micro-cluster needs to be created (w >= mu). Range: 0 to max(int).

beta

multiplier for mu to detect outlier micro-clusters given their weight w (w < beta x mu). Range: 0 to 1.

lambda

decay constant.

initPoints

number of points to use for initialization via DBSCAN.

offline

offline multiplier for epsilon (range: between 2 and 20). Used for reachability reclustering.

processingSpeed

Number of incoming points per time unit (important for decay). Range: between 1 and 1000.

recluster

logical; should the offline DBSCAN-based reclustering (i.e., reachability at a distance of epsilon x offline) be performed?

k

integer; tries to automatically choose offline to find k macro-clusters.
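The parameters mu, beta, lambda, and processingSpeed interact through micro-cluster weights. The base R sketch below illustrates that interaction under stated assumptions: it uses the fading function f(t) = 2^(-lambda * t) from Cao et al. (2006), assumes one time unit passes per processingSpeed points, and defines hypothetical helpers (fading, mc_weight, classify_mc) that are not part of streamMOA or MOA.

```r
# Sketch only; assumes DenStream's fading function f(t) = 2^(-lambda * t).
fading <- function(t, lambda = 0.001) 2^(-lambda * t)

# Hypothetical helper: decayed weight of a micro-cluster whose points
# arrived at `arrival_times`, evaluated at time `now`.
mc_weight <- function(arrival_times, now, lambda = 0.001) {
  sum(fading(now - arrival_times, lambda))
}

# Hypothetical helper: classify a micro-cluster by its weight w.
# w >= mu -> core; beta * mu <= w < mu -> potential; w < beta * mu -> outlier.
classify_mc <- function(w, mu = 1, beta = 0.2) {
  if (w >= mu) {
    "core"
  } else if (w >= beta * mu) {
    "potential"
  } else {
    "outlier"
  }
}

# Assumption: point i (0-based) arrives at time i %/% processingSpeed.
processingSpeed <- 1
arrivals <- (0:9) %/% processingSpeed   # 10 points, one per time unit
w <- mc_weight(arrivals, now = 1000, lambda = 0.001)

classify_mc(w, mu = 1, beta = 0.2)    # weight of about 5 exceeds mu = 1
classify_mc(w, mu = 30, beta = 0.2)   # weight of about 5 is below beta * mu = 6
```

After 1000 time units with lambda = 0.001 each point contributes only about half its original weight, which is why a larger lambda (or a higher processingSpeed under the assumption above) makes old micro-clusters fade toward outlier status faster.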

Author

Michael Hahsler and John Forrest

Details

DenStream reclusters micro-clusters using reachability (from DBSCAN), with epsilon x offline (offline defaults to 2) as the reachability threshold.
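Reachability reclustering can be pictured as a graph problem: two micro-clusters are linked when their centers lie within epsilon x offline of each other, and each connected component becomes a macro-cluster. The base R sketch below shows that idea; reachability_recluster and the example centers are hypothetical, distances are assumed to be Euclidean between centers, and this is not the streamMOA or MOA implementation.

```r
# Sketch only: link micro-cluster centers within epsilon * offline and
# return the connected components (macro-cluster ids) via breadth-first search.
reachability_recluster <- function(centers, epsilon, offline = 2) {
  threshold <- epsilon * offline
  d <- as.matrix(dist(centers))        # pairwise Euclidean distances
  adj <- d <= threshold                # reachability graph (diagonal is TRUE)
  n <- nrow(d)
  assignment <- rep(NA_integer_, n)
  cluster_id <- 0L
  for (start in seq_len(n)) {
    if (!is.na(assignment[start])) next
    cluster_id <- cluster_id + 1L      # start a new macro-cluster
    assignment[start] <- cluster_id
    queue <- start
    while (length(queue) > 0) {
      current <- queue[1]
      queue <- queue[-1]
      # unassigned micro-clusters reachable from the current one
      neighbors <- which(adj[current, ] & is.na(assignment))
      assignment[neighbors] <- cluster_id
      queue <- c(queue, neighbors)
    }
  }
  assignment
}

# Hypothetical micro-cluster centers forming two chains
centers <- rbind(c(0.10, 0.10), c(0.18, 0.10), c(0.26, 0.10),
                 c(0.80, 0.80), c(0.88, 0.82))
reachability_recluster(centers, epsilon = 0.05, offline = 2)
```

With a threshold of 0.1, the first three centers are chained together even though the outer two are 0.16 apart, which is the transitive behaviour that distinguishes reachability from a simple radius check.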

If k is specified, the reachability threshold is chosen automatically to find k clusters. This is achieved using single-link hierarchical clustering.
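Single-link merge heights are exactly the reachability distances at which components join, so any threshold between two consecutive merge heights yields a fixed number of clusters. The base R sketch below picks the midpoint of the gap that leaves k clusters; choose_threshold and the example centers are hypothetical, and the midpoint rule is an illustrative assumption rather than streamMOA's documented behaviour.

```r
# Sketch only: choose a reachability threshold that yields k macro-clusters
# using single-link hierarchical clustering from base R's stats package.
choose_threshold <- function(centers, k) {
  n <- nrow(centers)
  stopifnot(k >= 2, k <= n - 1)        # need a gap between two merges
  hc <- hclust(dist(centers), method = "single")
  h <- sort(hc$height)                 # n - 1 merge heights, increasing
  # After the first n - k merges there are k clusters; any threshold in
  # [h[n - k], h[n - k + 1]) keeps them apart. Use the midpoint of that gap.
  threshold <- (h[n - k] + h[n - k + 1]) / 2
  list(threshold = threshold, assignment = cutree(hc, k = k))
}

# Hypothetical micro-cluster centers forming two chains
centers <- rbind(c(0.10, 0.10), c(0.18, 0.10), c(0.26, 0.10),
                 c(0.80, 0.80), c(0.88, 0.82))
choose_threshold(centers, k = 2)
```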

References

Cao F, Ester M, Qian W, Zhou A (2006). Density-Based Clustering over an Evolving Data Stream with Noise. In Proceedings of the 2006 SIAM International Conference on Data Mining, pp 326-337. SIAM.

Bifet A, Holmes G, Pfahringer B, Kranen P, Kremer H, Jansen T, Seidl T (2010). MOA: Massive Online Analysis, a Framework for Stream Classification and Clustering. Journal of Machine Learning Research (JMLR).

See Also

Other DSC_MOA: DSC_BICO_MOA(), DSC_CluStream(), DSC_ClusTree(), DSC_DStream_MOA(), DSC_MCOD(), DSC_MOA(), DSC_StreamKM()

Examples

# load the packages (stream provides DSD_Gaussians, update and plot)
library(stream)
library(streamMOA)

# data with 3 clusters and 5% noise
set.seed(1000)
stream <- DSD_Gaussians(k = 3, d = 2, noise = 0.05)

# use DenStream with reachability reclustering
denstream <- DSC_DenStream(epsilon = .05)
update(denstream, stream, 500)
denstream

# plot macro-clusters
plot(denstream, stream, type = "macro")

# plot micro-clusters
plot(denstream, stream, type = "micro")

# show micro and macro-clusters
plot(denstream, stream, type = "both")

# reclustering: Choose reclustering reachability threshold automatically to find 4 clusters
denstream2 <- DSC_DenStream(epsilon = .05, k = 4)
update(denstream2, stream, 500)
plot(denstream2, stream, type = "both")

Run the code above in your browser using DataLab