ms
which, for a given bandwidth, detects the local modes and performs the clustering. These functions implement the techniques presented in Einbeck (2011).
meanshift(X, x, h)
ms.rep(X, x, h, plotms=1, thresh=0.00000001, iter=100)
ms(X, h, subset, thr=0.001, scaled=TRUE, plotms=2, or.labels=NULL, ...)
ms.self.coverage(X, taumin=0.02, taumax=0.5, gridsize=25,
       thr=0.001, scaled=TRUE, cluster=FALSE, plot.type="o",
       or.labels=NULL, print=FALSE, ...)
1:n. This allows the iterative mean shift procedure to be run from only a subset of points (if unspecified, 1:n is used, i.e. each data point serves as a starting point).
x) falls below thresh, or after iter iterations (whichever event happens first).
TRUE, distances are always measured to the cluster to which an observation is assigned, rather than to the nearest cluster.
gridsize is large.
ms:
scaled=TRUE).
names()
Cheng (1995) showed that, if the mean shift is computed iteratively, the resulting sequence of local means converges to a mode of the estimated density function. Assigning each data point to the mode to which it has converged turns this into a clustering technique.
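The iteration described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's source code: the helper names mean_shift_step and mean_shift_iterate are hypothetical, a Gaussian kernel is assumed, the stopping rule (shift length relative to the length of the current point) is one plausible reading of the thresh argument, and converged points are merged into modes by single-linkage clustering with an arbitrary cut height.

```r
# Illustrative mean shift sketch in base R (assumptions: Gaussian kernel,
# relative stopping rule; helper names are hypothetical, not LPCM functions).

# One mean shift step: kernel-weighted local mean of the rows of X around x.
mean_shift_step <- function(X, x, h) {
  d2 <- rowSums(sweep(X, 2, x)^2) / h^2   # squared scaled distances to x
  w  <- exp(-0.5 * d2)                    # Gaussian kernel weights
  colSums(X * w) / sum(w)                 # weighted mean of the data points
}

# Iterate the step until the relative shift is below thresh,
# or until iter iterations have been carried out.
mean_shift_iterate <- function(X, x, h, thresh = 1e-8, iter = 100) {
  for (j in seq_len(iter)) {
    x_new <- mean_shift_step(X, x, h)
    x_len <- max(sqrt(sum(x^2)), .Machine$double.eps)
    rel_shift <- sqrt(sum((x_new - x)^2)) / x_len
    x <- x_new
    if (rel_shift < thresh) break
  }
  x
}

# Toy data: two well-separated groups of 50 points each.
set.seed(1)
X <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2))

# Start the iteration from every data point and record where it ends up.
modes <- t(apply(X, 1, function(x0) mean_shift_iterate(X, x0, h = 0.5)))

# Points that converged to (numerically) the same location share a label.
labels <- cutree(hclust(dist(modes), method = "single"), h = 0.1)
table(labels)
```

Starting the iteration from only some rows of X, rather than from all of them, corresponds to the idea behind the subset argument.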
The concepts of coverage and self-coverage, which were originally introduced in the principal curve context, adapt straightforwardly to this setting.
The goodness-of-fit measure Rc can also be applied in this context. For instance, a value of $R_C=0.8$ means that, after the clustering, the mean absolute residual length has been reduced by $80\%$ (compared to the distances to the overall mean).
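To make this interpretation concrete, the following base R sketch computes the relative reduction in mean residual length for a small hand-made data set. It follows only the verbal description given here and is not the package's implementation of Rc; the function name residual_reduction and the toy data are hypothetical.

```r
# Illustration of the "reduction in mean residual length" reading of R_C.
# Not the LPCM implementation of Rc(); the function name is hypothetical.
residual_reduction <- function(X, centers, labels) {
  X <- as.matrix(X)
  centers <- as.matrix(centers)
  # distance of each point to the center of its assigned cluster
  res_cluster <- sqrt(rowSums((X - centers[labels, , drop = FALSE])^2))
  # distance of each point to the overall mean
  res_mean <- sqrt(rowSums(sweep(X, 2, colMeans(X))^2))
  1 - mean(res_cluster) / mean(res_mean)
}

# Four points forming two tight pairs that lie far apart.
X <- rbind(c(0, 0), c(0, 2), c(10, 0), c(10, 2))
centers <- rbind(c(0, 1), c(10, 1))
labels <- c(1, 1, 2, 2)

r <- residual_reduction(X, centers, labels)
r   # equals 1 - 1/sqrt(26), roughly 0.80
```

Here every point lies at distance 1 from its cluster center and at distance sqrt(26) from the overall mean, so the mean residual length drops by about 80 percent.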
Einbeck, J. (2011). Bandwidth selection for mean-shift based unsupervised learning techniques: a unified approach via self-coverage. Journal of Pattern Recognition Research, to appear.
Rc, lpc.self.coverage
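library(LPCM)
data(faithful)
# Self-coverage curve over a grid of bandwidths (need higher gridsizes in practice!)
foo <- ms.self.coverage(faithful, gridsize=10, taumin=0.1, taumax=0.5,
                        plot.type="o")
# Bandwidths selected from the self-coverage curve
h <- select.self.coverage(foo)$select
# Mean shift clustering with the first selected bandwidth
fit <- ms(faithful, h=h[1])
coverage(fit$data, fit$cluster.center)
Rc(fit$data, fit$cluster.center[fit$closest.label,], type="points")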