mean_shift: Mean Shift Clustering

Description

A fast implementation of mean shift clustering using dual-tree range search. Given a dataset, this function uses the mean shift algorithm to produce and return a clustering of the data.

Usage

mean_shift(
  input,
  force_convergence = FALSE,
  in_place = FALSE,
  labels_only = FALSE,
  max_iterations = NA,
  radius = NA,
  verbose = getOption("mlpack.verbose", FALSE)
)

Value

A list with several components:

centroid: The centroids of each cluster (numeric matrix).
output: The output labels or labeled data (numeric matrix).

Arguments

input: Input dataset to perform clustering on (numeric matrix).
force_convergence: If specified, the mean shift algorithm will continue running regardless of max_iterations until the clusters converge. Default value "FALSE" (logical).
in_place: If specified, a column containing the learned cluster assignments will be added to the input dataset. In this case, the "output" parameter is overridden. (Do not use with Python.) Default value "FALSE" (logical).
labels_only: If specified, only the output labels will be written to the "output" parameter. Default value "FALSE" (logical).
max_iterations: Maximum number of iterations before mean shift terminates. Default value "1000" (integer).
radius: If the distance between two centroids is less than the given radius, one will be removed. A radius of 0 or less means an estimate will be calculated and used for the radius. Default value "0" (numeric).
verbose: Display informational messages and the full list of parameters and timers at the end of execution. Default value "getOption("mlpack.verbose", FALSE)" (logical).

Author

mlpack developers

Details

This program performs mean shift clustering on the given dataset, storing the learned cluster assignments either as a column of labels in the input dataset or separately.

The input dataset should be specified with the "input" parameter, and the radius used for search can be specified with the "radius" parameter. The maximum number of iterations before algorithm termination is controlled with the "max_iterations" parameter.

The output labels may be saved with the "output" output parameter and the centroids of each cluster may be saved with the "centroid" output parameter.
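
To illustrate what "radius" and "max_iterations" control, the following is a minimal base R sketch of mean shift with a flat kernel. It is not mlpack's dual-tree implementation; the function name mean_shift_sketch, the tol argument, and the 1-based labels are assumptions made for this illustration only. Unlike mean_shift(), the sketch requires a positive radius and does not estimate one.

```r
# Illustrative flat-kernel mean shift in base R. This is NOT mlpack's
# dual-tree implementation; every name here is hypothetical.
mean_shift_sketch <- function(x, radius, max_iterations = 1000, tol = 1e-6) {
  x <- as.matrix(x)
  stopifnot(radius > 0)
  modes <- x  # every point starts as its own candidate centroid
  for (i in seq_len(nrow(x))) {
    for (iter in seq_len(max_iterations)) {
      # Squared distances from the current candidate to every data point.
      d2 <- rowSums(sweep(x, 2, modes[i, ])^2)
      # Flat kernel: move to the mean of all points within `radius`.
      shifted <- colMeans(x[d2 <= radius^2, , drop = FALSE])
      moved <- sqrt(sum((shifted - modes[i, ])^2))
      modes[i, ] <- shifted
      if (moved < tol) break  # converged before max_iterations
    }
  }
  # Merge candidates: a candidate closer than `radius` to an already kept
  # centroid is dropped and its point takes that centroid's label.
  centroids <- modes[1, , drop = FALSE]
  labels <- integer(nrow(x))
  for (i in seq_len(nrow(x))) {
    d <- sqrt(rowSums(sweep(centroids, 2, modes[i, ])^2))
    if (min(d) < radius) {
      labels[i] <- which.min(d)
    } else {
      centroids <- rbind(centroids, modes[i, , drop = FALSE])
      labels[i] <- nrow(centroids)
    }
  }
  list(centroid = centroids, labels = labels)
}

# Two small, well-separated groups of four points each (rows are points).
x <- rbind(cbind(c(0, 0.1, 0, 0.1), c(0, 0, 0.1, 0.1)),
           cbind(c(5, 5.1, 5, 5.1), c(5, 5, 5.1, 5.1)))
fit <- mean_shift_sketch(x, radius = 1)
```

In this sketch a larger radius merges more candidates and so yields fewer clusters, while max_iterations only caps how long each candidate is allowed to keep shifting.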

Examples

# For example, to run mean shift clustering on the dataset "data" and store
# the centroids to "centroids", the following command may be used: 

if (FALSE) {
  output <- mean_shift(input = data)
  centroids <- output$centroid
}
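
A slightly fuller sketch that sets the "radius", "max_iterations", and "labels_only" parameters is shown below. It assumes the mlpack R package is installed and that rows of the input matrix are observations; the synthetic data and the chosen parameter values are illustrative assumptions, not recommendations.

```r
# Sketch only: runs the clustering when the mlpack package is available.
if (requireNamespace("mlpack", quietly = TRUE)) {
  set.seed(42)
  # Synthetic data (an assumption for this example): two 2-D blobs,
  # 50 points each, with rows treated as observations.
  data <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
                matrix(rnorm(100, mean = 5, sd = 0.3), ncol = 2))

  # Use an explicit radius, cap the iterations, and request labels only.
  result <- mlpack::mean_shift(input = data,
                               radius = 2,
                               max_iterations = 500,
                               labels_only = TRUE)

  centroids <- result$centroid  # centroids of each cluster
  labels <- result$output       # cluster assignments only

  print(dim(centroids))
  print(table(labels))
}
```

Leaving "radius" at its default lets mean_shift() estimate a radius from the data instead.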
