largeVis (version 0.2.1.1)

randomProjectionTreeSearch: Find approximate k-Nearest Neighbors using random projection tree search.

Description

A fast and accurate algorithm for finding approximate k-nearest neighbors.

Usage

randomProjectionTreeSearch(x, K = 150, n_trees = 50,
  tree_threshold = max(10, nrow(x)), max_iter = 1,
  distance_method = "Euclidean", seed = NULL, threads = NULL,
  verbose = getOption("verbose", TRUE))

# S3 method for matrix
randomProjectionTreeSearch(x, K = 150, n_trees = 50,
  tree_threshold = max(10, nrow(x)), max_iter = 1,
  distance_method = "Euclidean", seed = NULL, threads = NULL,
  verbose = getOption("verbose", TRUE))

# S3 method for CsparseMatrix
randomProjectionTreeSearch(x, K = 150, n_trees = 50,
  tree_threshold = max(10, nrow(x)), max_iter = 1,
  distance_method = "Euclidean", seed = NULL, threads = NULL,
  verbose = getOption("verbose", TRUE))

# S3 method for TsparseMatrix
randomProjectionTreeSearch(x, K = 150, n_trees = 50,
  tree_threshold = max(10, nrow(x)), max_iter = 1,
  distance_method = "Euclidean", seed = NULL, threads = NULL,
  verbose = getOption("verbose", TRUE))

Arguments

x

A (potentially sparse) matrix, where examples are columns and features are rows.
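
Most R data sets store one observation per row, so they usually need to be transposed before being passed as x. A minimal sketch of that step, using the built-in iris data purely as an illustration:

```r
# iris has 150 observations (rows) and 4 numeric features (columns).
obs_by_feature <- as.matrix(iris[, 1:4])

# Transpose so that each column is one example and each row is one feature,
# which is the layout the x argument expects.
x <- t(obs_by_feature)

dim(x)  # 4 features by 150 examples
```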

K

How many nearest neighbors to seek for each node.

n_trees

The number of trees to build.

tree_threshold

The threshold for creating a new branch. The paper authors suggest using a value equivalent to the number of features in the input set.

max_iter

Number of iterations in the neighborhood exploration phase.

distance_method

The distance measure to use: either "Euclidean" or "Cosine".
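
To make the difference between the two options concrete, the sketch below computes both distances for a pair of vectors in base R. It uses the textbook definitions, with cosine distance taken as one minus cosine similarity; that convention is an assumption and is not confirmed by this page as the exact formula the package uses.

```r
# Two example vectors; b is a scaled copy of a (columns of x would play this role).
a <- c(1, 0, 2)
b <- c(2, 0, 4)

# Euclidean distance: length of the difference vector.
euclidean <- sqrt(sum((a - b)^2))

# Cosine distance, taken here as 1 - cosine similarity (assumed convention).
cosine <- 1 - sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

euclidean  # non-zero: the vectors differ in magnitude
cosine     # effectively zero: the vectors point in the same direction
```

Because cosine distance ignores magnitude, it tends to suit data where only the direction of a feature vector matters, such as term-frequency vectors.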

seed

Random seed passed to the C++ functions. The default is NULL. If a seed is supplied, the maximum number of threads is set to 1 in phases that would otherwise be non-deterministic.

threads

The maximum number of threads to spawn. Determined automatically if NULL (the default).

verbose

Whether to print verbose logging using the progress package.

Value

A [K, N] matrix of the approximate K nearest neighbors for each vertex, where N is the number of examples (columns of x).
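
For small inputs, an exact brute-force search gives a result with the same [K, N] shape and can serve as a baseline when judging the approximation. The sketch below is base R only; exact_knn is a hypothetical helper, not part of largeVis. It returns 1-based R indices, and this page does not say which index base the package's result uses, so check that before comparing the two.

```r
# Exact K nearest neighbors by brute force (hypothetical helper, base R only).
# x: features in rows, examples in columns, matching the layout of the x argument.
exact_knn <- function(x, K) {
  d <- as.matrix(dist(t(x)))   # N x N Euclidean distances between columns of x
  diag(d) <- Inf               # exclude each point from its own neighbor list
  # For each example, keep the indices of the K smallest distances.
  apply(d, 2, function(col) order(col)[seq_len(K)])
}

set.seed(1)
x <- matrix(rnorm(5 * 40), nrow = 5)   # 5 features, 40 examples
nn <- exact_knn(x, K = 3)
dim(nn)   # K rows by N columns
```

The full distance matrix needs memory quadratic in N, so this baseline is only practical for a few thousand examples at most.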

Details

Note that the algorithm does not guarantee that it will find K neighbors for each node. A warning will be issued if it finds fewer neighbors than requested. If the input data contains distinct partitionable clusters, try increasing the tree_threshold to increase the number of returned neighbors.
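
A call that follows the advice above might look like the sketch below. It uses only the arguments listed in Usage; the data set, the parameter values, and the removal of duplicate rows are illustrative assumptions rather than recommendations from the package authors. The call is guarded so that the script still runs when largeVis is not installed.

```r
# Build a small input: features in rows, examples in columns.
dat <- scale(as.matrix(iris[, 1:4]))
dat <- dat[!duplicated(dat), ]   # drop duplicate rows as a precaution (assumption)
x <- t(dat)

neighbors <- NULL
if (requireNamespace("largeVis", quietly = TRUE)) {
  neighbors <- largeVis::randomProjectionTreeSearch(
    x,
    K = 10,
    n_trees = 20,
    tree_threshold = 50,   # raised above the default max(10, nrow(x)), which is 10 here
    max_iter = 1,
    distance_method = "Euclidean",
    seed = 1974,           # a fixed seed limits some phases to a single thread
    verbose = FALSE
  )
  dim(neighbors)           # per the Value section, K rows by ncol(x) columns
}
```

If a warning about too few neighbors still appears, raising tree_threshold further is one option to try, as suggested above.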