ms which, for a given
bandwidth, detects the local modes and performs the clustering. These functions implement the techniques presented in Einbeck (2011).
meanshift(X, x, h)
ms.rep(X, x, h, plotms=1, thresh= 0.00000001, iter=100)
ms(X, h, subset, thr=0.001, scaled= TRUE, plotms=2, or.labels=NULL, ...)
ms.self.coverage(X, taumin=0.02, taumax=0.5, gridsize=25,
thr=0.001, scaled=TRUE, cluster=FALSE, plot.type="o",
or.labels=NULL, print=FALSE, ...)
1:n. This allows running the iterative mean shift procedure only
from a subset of points (if unspecified, 1:n is used here,
i.e. each data point serves as a starting point).
x) falls below
thresh, or after iter iterations (whichever event
happens first).
TRUE, distances are always measured to the
cluster to which an observation is assigned, rather than to the
nearest cluster.
gridsize is large.
ms:
scaled=TRUE).
names().
Chen (1995) showed that, if the mean shift is computed iteratively, the resulting sequence of local means converges to a mode of the estimated density function. Assigning each data point to the mode to which it has converged turns this into a clustering technique.
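As a minimal sketch of the iteration described above, the following base R code repeatedly moves a starting point to the kernel-weighted local mean until the step becomes shorter than a tolerance, and then groups the end points into clusters. It is not the package's implementation: the helper names mean_shift_step and mean_shift_iterate, the Gaussian kernel, the absolute (rather than relative) stopping rule, the toy data and the merge tolerance of 0.01 are all assumptions made for illustration.

```r
# Illustrative sketch only; helper names are hypothetical, not package functions.

# One mean shift step: Gaussian-kernel-weighted mean of the rows of X around x.
mean_shift_step <- function(X, x, h) {
  d2 <- colSums(((t(X) - x) / h)^2)  # squared scaled distance of each row to x
  w <- exp(-0.5 * d2)                # Gaussian kernel weights (assumed kernel)
  colSums(X * w) / sum(w)
}

# Iterate until the shift is shorter than thresh or iter steps have been taken.
mean_shift_iterate <- function(X, x, h, thresh = 1e-8, iter = 100) {
  for (j in seq_len(iter)) {
    x_new <- mean_shift_step(X, x, h)
    step <- sqrt(sum((x_new - x)^2))
    x <- x_new
    if (step < thresh) break
  }
  x
}

# Toy data (assumed): two tight groups around (0, 0) and (5, 5).
X <- rbind(
  cbind(c(0, 0.1, -0.1, 0, 0), c(0, 0, 0, 0.1, -0.1)),
  cbind(c(5, 5.1, 4.9, 5, 5), c(5, 5, 5, 5.1, 4.9))
)

# Run the iteration from every data point and collect the end points.
modes <- t(apply(X, 1, function(x0) mean_shift_iterate(X, x0, h = 0.5)))

# Points whose end points (nearly) coincide receive the same cluster label.
labels <- cutree(hclust(dist(modes), method = "single"), h = 0.01)
```

Single-linkage merging of the end points is only one simple way to decide that two starting points reached the same mode; any rule that treats sufficiently close end points as identical would serve the same purpose.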
The concepts of coverage and self-coverage, which were originally introduced in the principal curve context, adapt straightforwardly to this setting.
The goodness-of-fit measure Rc can also be applied in this context. For
instance, a value of $R_C=0.8$ means that,
after the clustering, the mean absolute residual length has been
reduced by $80\%$ (compared to the distances to the overall mean).
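The interpretation above can be illustrated with a small base R calculation. This is a sketch of the stated interpretation only, under the assumption that the measure compares the mean distance to the assigned cluster centers with the mean distance to the overall mean; it is not the package's Rc function, and the helper name rc_points and the toy data are hypothetical.

```r
# Illustrative sketch only; rc_points is a hypothetical helper, not the package's Rc.
rc_points <- function(X, centers, labels) {
  # Residual length of each point relative to its assigned cluster center
  res_cluster <- sqrt(rowSums((X - centers[labels, , drop = FALSE])^2))
  # Distance of each point to the overall mean of the data
  res_overall <- sqrt(rowSums(sweep(X, 2, colMeans(X))^2))
  # Proportional reduction in mean absolute residual length
  1 - mean(res_cluster) / mean(res_overall)
}

# Toy data (assumed): two pairs of points, each pair centered on its cluster center.
X <- rbind(c(0, 0), c(2, 0), c(10, 0), c(12, 0))
centers <- rbind(c(1, 0), c(11, 0))
labels <- c(1, 1, 2, 2)

rc <- rc_points(X, centers, labels)
```

Here every point lies at distance 1 from its center, while the distances to the overall mean (6, 0) average 5, so the mean residual length drops from 5 to 1, a reduction of 80%.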
Einbeck, J. (2011). Bandwidth selection for mean-shift based unsupervised learning techniques: a unified approach via self-coverage. Journal of Pattern Recognition Research, to appear.
Rc, lpc.self.coverage
library(LPCM) # package assumed to provide the functions used below
data(faithful)
# Self-coverage over a grid of bandwidths
foo <- ms.self.coverage(faithful, gridsize=10, taumin=0.1, taumax=0.5,
plot.type="o") # need higher gridsizes in practice!
# Bandwidth(s) selected from the self-coverage curve
h <- select.self.coverage(foo)$select
# Mean shift clustering with the first selected bandwidth
fit <- ms(faithful, h=h[1])
coverage(fit$data, fit$cluster.center)
Rc(fit$data, fit$cluster.center[fit$closest.label,], type="points")