meanShift: Mean shift classification

Description

meanShift performs classification of a set of query points using steepest ascent to local maxima in a kernel density estimate.
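The steepest ascent idea can be illustrated in base R. The following is a minimal sketch, not the package's implementation: `meanShiftStep` is a hypothetical helper, and it assumes a normal kernel with one bandwidth per dimension. Each update moves the query point to the kernel-weighted mean of the training points, which carries it toward a local maximum of the density estimate.

```r
# Hypothetical helper, not part of the package: one mean shift update
# for a single query point q, using a normal kernel with one bandwidth
# per dimension.
meanShiftStep <- function(q, trainData, bandwidth) {
  trainData <- as.matrix(trainData)
  # Differences between each training point and q, scaled by bandwidth
  z <- sweep(trainData, 2, q, "-")
  z <- sweep(z, 2, bandwidth, "/")
  # Normal kernel weights; normalizing constants cancel in the ratio
  w <- exp(-0.5 * rowSums(z^2))
  # The update is the kernel-weighted mean of the training points
  colSums(trainData * w) / sum(w)
}

# Two well-separated groups of training points (illustrative data)
set.seed(1)
train <- rbind(matrix(rnorm(40, mean = 0, sd = 0.1), ncol = 2),
               matrix(rnorm(40, mean = 3, sd = 0.1), ncol = 2))

# Repeated updates carry the query point toward the nearer group
q <- c(0.5, 0.5)
for (i in 1:10) q <- meanShiftStep(q, train, bandwidth = c(0.5, 0.5))
q
```

In this sketch the second group contributes almost nothing to the update because its kernel weights are negligible at this bandwidth.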

Usage

meanShift(queryData, trainData, nNeighbors = NROW(trainData),
  algorithm = "LINEAR", bandwidth, alpha = 0, iterations = 10,
  epsilon = 1e-08, epsilonCluster = 1e-04, parameters = NULL)

Arguments

queryData

A matrix or vector of points to be classified by the mean shift algorithm. Values must be finite and non-missing.

trainData

A matrix or vector of points used to form a kernel density estimate. The local maxima from this kernel density estimate will be used for steepest ascent classification.

nNeighbors

A scalar indicating the number of neighbors to consider for the kernel density estimate. Restricting the estimate to the nearest neighbors approximates it and speeds up computation. The default is all of trainData.

algorithm

A string indicating the algorithm to use for nearest neighbor searches. Currently, only "LINEAR" and "KDTREE" methods are supported.

bandwidth

A vector of length equal to the number of columns in the queryData matrix, or length one when queryData is a vector. This value will be used in the kernel density estimate for steepest ascent classification. The default is one for each dimension.

alpha

A scalar tuning parameter for normal kernels. When this parameter is set to zero, the mean shift algorithm will operate as usual. When this parameter is set to one, the mean shift algorithm will be approximated through Newton's Method. When set to a value between zero and one, a generalization of Newton's Method and mean shift will be used instead, providing a means to balance convergence speed with stability. The default is zero (standard mean shift).

iterations

The maximum number of mean shift iterations to perform. The default is 10.

epsilon

A scalar used to determine when to terminate the iteration of an individual query point. If the distance between the query point at iteration i and i+1 is less than epsilon, then iteration ceases on this point.

epsilonCluster

A scalar used to determine the minimum distance between distinct clusters. This distance is applied after all iterations have finished and in order of the rows of queryData.

parameters

A scalar or vector of parameters used by the specific algorithm. There are no optional parameters for the "LINEAR" method. The "KDTREE" method supports optional parameters for the maximum number of points to store in a leaf node and the maximum value for the quadratic form in the normal kernel, ignoring the constant value -0.5.
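The way nNeighbors, bandwidth, iterations, and epsilon interact can be sketched in base R. This is an illustration under stated assumptions, not the package's code: `ascendPoint` is a hypothetical helper, the neighbor search is a brute-force scan using unscaled Euclidean distance (an assumption), and the kernel is normal.

```r
# Hypothetical helper, not the package's implementation: ascend one query
# point, using only its nNeighbors nearest training points at each step.
ascendPoint <- function(q, trainData, bandwidth,
                        nNeighbors = NROW(trainData),
                        iterations = 10, epsilon = 1e-08) {
  trainData <- as.matrix(trainData)
  used <- 0L
  for (i in seq_len(iterations)) {
    used <- i
    # Brute-force nearest neighbor search (assumed Euclidean distance)
    d2 <- rowSums(sweep(trainData, 2, q, "-")^2)
    nn <- trainData[order(d2)[seq_len(nNeighbors)], , drop = FALSE]
    # Normal kernel weights on the bandwidth-scaled differences
    z <- sweep(sweep(nn, 2, q, "-"), 2, bandwidth, "/")
    w <- exp(-0.5 * rowSums(z^2))
    qNew <- colSums(nn * w) / sum(w)
    # Stop early once the point moves less than epsilon
    moved <- sqrt(sum((qNew - q)^2))
    q <- qNew
    if (moved < epsilon) break
  }
  list(mode = q, iterationsUsed = used)
}

set.seed(2)
train <- matrix(rnorm(200, sd = 0.2), ncol = 2)
res <- ascendPoint(c(0.3, -0.3), train, bandwidth = c(0.5, 0.5),
                   nNeighbors = 25, iterations = 50)
res$mode
res$iterationsUsed
```

Using fewer neighbors reduces the work per step at the cost of approximating the density estimate, which is the trade-off described for nNeighbors above.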

Value

A list is returned containing two items:

assignment

A vector of classifications.

value

A vector or matrix containing the locations of the classified local maxima in the support. Each row is associated with the classified index in assignment.
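The relationship between assignment, value, and epsilonCluster can be sketched as follows. This is a hedged illustration rather than the package's code: `groupModes` is a hypothetical helper, and it assumes that points are labeled in row order, with a new cluster started whenever a converged point lies at least epsilonCluster away from every maximum found so far.

```r
# Hypothetical helper, not the package's implementation: label converged
# points in row order and collect the distinct local maxima.
groupModes <- function(converged, epsilonCluster = 1e-04) {
  converged <- as.matrix(converged)
  value <- converged[0, , drop = FALSE]   # distinct maxima found so far
  assignment <- integer(nrow(converged))
  for (i in seq_len(nrow(converged))) {
    p <- converged[i, ]
    # Distance from this point to each maximum found so far
    d <- if (nrow(value) > 0) {
      sqrt(rowSums((value - rep(p, each = nrow(value)))^2))
    } else {
      numeric(0)
    }
    hit <- which(d < epsilonCluster)
    if (length(hit) > 0) {
      assignment[i] <- hit[1]             # reuse an existing cluster
    } else {
      value <- rbind(value, p)            # start a new cluster
      assignment[i] <- nrow(value)
    }
  }
  list(assignment = assignment, value = value)
}

modes <- rbind(c(0, 0), c(1, 1), c(0, 0.00001), c(1, 1))
g <- groupModes(modes)
g$assignment
g$value[g$assignment[3], ]  # maximum associated with the third point
```

Row k of value holds the location for every point whose assignment equals k, which mirrors the association described above.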

References

Cheng, Y. (1995). Mean shift, mode seeking, and clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(8), 790-799.

Fukunaga, K., & Hostetler, L. (1975). The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Transactions on Information Theory, 21(1), 32-40.

Lisic, J. (2015). Parcel Level Agricultural Land Cover Prediction (Doctoral dissertation, George Mason University).

Examples


# NOT RUN {
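# Classify ten random two-dimensional points against themselves
x <- matrix(runif(20), 10, 2)
classification <- meanShift(x, x)

# Same setup, using a k-d tree search restricted to the 8 nearest neighbors;
# parameters gives the maximum leaf size and the maximum quadratic form value
x <- matrix(runif(20), 10, 2)
classification <- meanShift(x, x,
  algorithm = "KDTREE",
  nNeighbors = 8,
  parameters = c(5, 7.1))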

# }
