findNeighbors: Nearest neighbor search in a multidimensional space

Description

Finds a user specified number of nearest neighbors of a multivariate space defined by the coordinates of the input matrix. Alternatively, the user can specify a maximum distance over which to search for nearest neighbors.

Usage

findNeighbors(x, n.neighbor=NULL, metric=1, max.distance = 0.,
    olag=0, sort.distances=TRUE)

Arguments

an embedding matrix. Each column of the matrix respresents a single coordinate of the embedding and each row denotes the coordinates of a single point in the embedding.

max.distance

used an alternative to n.neighbor, use this parameter to specify the maximum distance to search relative to the current point in the phase space. The metric for the distance is specified separately by the optional metric input argument. This arguments must be positive and will only be used if n.neighbor is NULL, equal to zero, or less than zero. Default: 0.0.

metric

the metric used to define the distance between points in the embedding. Choices are limited to 1, 2, or Inf which represent an \(L_1\), \(L_2\), and \(L_\infty\) norm, respectively. Default: 1.

n.neighbor

the number of neighbors to find for each point in the embedding. If not NULL, this argument overrides the max.distance argument. Default: 2.

olag

an integer scalar representing the orbital lag, which defines the number of points along a trajectory (both forwards and backwards) that are to be excluded as nearest neighbor candidates. This argument helps to prevent temporally correlated data samples from being considered as neighbors to a given point in the embedding. This sitatuation can arise, for example, when a smooth trajectory has been highly oversampled in time. An orbital lag of 0 implies that the reference point itself may be considered a neighbor candidate. To exclude self-neighbors, set olag greater than zero. Default: 0.

sort.distances

a logical flag. If TRUE, the neighbors for a given point are sorted by distance from closest to farthest. Default: TRUE.

Value

a list containing the indices of the original points (corresponding to rows of the embedding matrix), the indices of the neighbors found, and the distance between them. The distance metric is based on that specified by the optional metric input argument.

Details

An efficient recursive algorithm is used to find all nearest neighbors. First a quadtree is developed to form a recursive partitioning of the embedding matrix, returning row and column index vectors and a list of medians which may be used to sort the embedding matrix. The quadtree is then traversed as an efficient means to find nearest neighbors.

References

Friedman, J., Bentley, J. L., and Finkel, R. A., ``An algorithm for finding best matches in logarithmic expected time", ACM Transactions on Mathematical Software 3, 209--226, 1977.

Examples

Run this code

# NOT RUN {
## Calculate the 10 nearest neighbors for each 
## point of 3-dimensional delayed coordinate 
## embedding of the beamchaos data. Exclude 
## self-neighbors from the output. 
embedding <- embedSeries( beamchaos, dim = 3, tlag = 10 )
nn <- findNeighbors( embedding, n.neighbor=10, olag=1 )

## Using the same data, find only those neighbors 
## within a distance 0.1 of the original points 
## based on an L-infinity metric 
nn.dist <- findNeighbors( embedding, max.distance=0.1,
metric=Inf, olag=1 )
# }

Run the code above in your browser using DataLab