kms: Kernel mean shift clustering.

Description

Kernel mean shift clustering for 2- to 6-dimensional data.

Usage

kms(x, y, H, max.iter=400, tol.iter, tol.clust, min.clust.size, merge=TRUE,
    keep.path=FALSE, verbose=FALSE)
# S3 method for kms
summary(object, ...)

Arguments

matrix of data values

matrix of candidate data values for which the mean shift will estimate their cluster labels. If missing, y=x.

bandwidth matrix/scalar bandwidth. If missing, Hpi(x,deriv,order=1) is called by default.

max.iter

maximum number of iterations. Default is 400.

tol.iter

distance under which two successive iterations are considered convergent. Default is 0.001*min marginal IQR of x.

tol.clust

distance under which two cluster modes are considered to form one cluster. Default is 0.01*max marginal IQR of x.

min.clust.size

minimum cluster size (cardinality). Default is 0.01*nrow(y).

merge

flag to merge clusters which are smaller than min.clust.size. Default is TRUE.

keep.path

flag to store the density gradient ascent paths. Default is FALSE.

verbose

flag to print out progress information. Default is FALSE.

object

object of class kms

...

other parameters

Value

A kernel mean shift clusters set is an object of class kms which is a list with fields:

x,y

data points - same as input

end.points

matrix of final iterates starting from y

bandwidth matrix

label

vector of cluster labels

nclust

number of clusters

nclust.table

frequency table of cluster labels

mode

matrix of cluster modes

names

variable names

tol.iter,tol.clust,min.clust.size

tuning parameter values - same as input

path

list of density gradient ascent paths where path[[i]] is the path of y[i,] (only if keep.path=TRUE)

Details

Mean shift clustering belongs to the class of modal or density-based clustering methods. The mean shift recurrence of the candidate point \({\bold x}\) is \({\bold x}_{j+1} = {\bold x}_j + \bold{{\rm H}} {\sf D} \hat{f}({\bold x}_j)/\hat{f}({\bold x}_j)\) where \(j\geq 0\) and \({\bold x}_0 = {\bold x}\). The sequence \(\{{\bold x}_0, {\bold x}_1, \dots \}\) follows the density gradient ascent paths to converge to a local mode of the density estimate \(\hat{f}\). Hence \({\bold x}\) is iterated until it converges to its local mode, and this determines its cluster label.

The mean shift recurrence is terminated if successive iterations are less than tol.iter or the maximum number of iterations max.iter is reached. Final iterates which are less than tol.clust distance apart are considered to form a single cluster. If merge=TRUE then the clusters whose cardinality is less than min.clust.size are iteratively merged with their nearest cluster.

If the bandwidth H is missing, then the default bandwidth is the plug-in selector for the density gradient Hpi(x,deriv.order=1). Any bandwidth that is suitable for the density gradient is also suitable for the mean shift.

References

Chacon, J.E. & Duong, T. (2013) Data-driven density estimation, with applications to nonparametric clustering and bump hunting. Electronic Journal of Statistics , 7, 499-532.

Comaniciu, D. & Meer, P. (2002). Mean shift: a robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 603-619.

Examples

Run this code

# NOT RUN {
library(MASS)
data(crabs)
kms.crabs <- kms(x=crabs[,c("FL", "CW")])
plot(kms.crabs$x, col=kms.crabs$label)
summary(kms.crabs)
# }

Run the code above in your browser using DataLab