rchemo (version 0.1-3)

getknn: KNN selection

Description

getknn selects, for each row (observation) of a new data set (the query), its \(k\) nearest neighbors within a training data set, based on a dissimilarity measure.

getknn relies on the function get.knnx of the package FNN (Beygelzimer et al.), available on CRAN.
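To make the selection concrete, here is a minimal base-R sketch of a brute-force Euclidean search. The helper knn_brute is hypothetical (it is not part of rchemo or FNN) and only illustrates the idea of ranking training rows by distance to each query row; the actual search is delegated to get.knnx.

```r
# Brute-force k-nearest-neighbor search with Euclidean distance (base R only).
# knn_brute is a hypothetical helper, not part of rchemo or FNN.
knn_brute <- function(Xtrain, X, k) {
  Xtrain <- as.matrix(Xtrain)
  X <- as.matrix(X)
  m <- nrow(X)
  nn <- matrix(NA_integer_, nrow = m, ncol = k)  # neighbor row indices
  d <- matrix(NA_real_, nrow = m, ncol = k)      # matching distances
  for (i in seq_len(m)) {
    # Euclidean distance from query row i to every training row
    diff <- sweep(Xtrain, 2, X[i, ])
    dist_i <- sqrt(rowSums(diff^2))
    # Keep the k training rows with the smallest distances
    idx <- order(dist_i)[seq_len(k)]
    nn[i, ] <- idx
    d[i, ] <- dist_i[idx]
  }
  list(nn = nn, d = d)
}

set.seed(1)
Xtrain <- matrix(rnorm(10 * 4), ncol = 4)
Xtest <- Xtrain[c(1, 3), ]
res <- knn_brute(Xtrain, Xtest, k = 3)
res$nn  # rows 1 and 3 of Xtrain are their own nearest neighbors
res$d   # so the first column of distances is 0
```

Because the query rows here are copied from Xtrain, each one finds itself first, at distance zero; with genuinely new observations the first neighbor would generally be at a positive distance.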

Usage

getknn(Xtrain, X, k = NULL, diss = c("eucl", "mahal"), 
  algorithm = "brute", list = TRUE)

Value

A list of outputs, including:

nn

A data frame (\(m \times k\)) with the indices of the neighbors.

d

A data frame (\(m \times k\)) with the dissimilarities between the neighbors and the new observations.

listnn

Same as $nn but in a list format.

listd

Same as $d but in a list format.
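As a rough illustration of how the two formats relate, the sketch below converts a small table of neighbor indices into a list with one element per new observation. The index values are made up for illustration, and the conversion shown is an assumption about the layout, not a copy of the package's internal code.

```r
# Hypothetical neighbor indices for m = 2 new observations and k = 3 neighbors
nn <- data.frame(matrix(c(1L, 4L, 7L,
                          3L, 9L, 2L), nrow = 2, byrow = TRUE))

# One list element per new observation, holding that row's k indices
listnn <- lapply(seq_len(nrow(nn)),
                 function(i) unlist(nn[i, ], use.names = FALSE))

listnn[[2]]  # indices of the neighbors of the second new observation: 3 9 2
```

The list form is convenient when the neighbors of each observation are processed one at a time, for example to fit a local model per query row.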

Arguments

Xtrain

Training X-data (\(n, p\)).

X

New X-data (\(m, p\)) to consider.

k

The number of nearest neighbors to select in Xtrain for each observation of X.

diss

The type of dissimilarity used. Possible values are "eucl" (default; Euclidean distance) or "mahal" (Mahalanobis distance).

algorithm

Search algorithm used for Euclidean and Mahalanobis distances. Defaults to "brute". See get.knnx in package FNN.

list

If TRUE (default), a list format is also returned for the outputs.
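The two choices for diss are closely related: a Mahalanobis distance equals a Euclidean distance computed after a whitening transform of the data. The base-R sketch below checks this identity numerically. It assumes the covariance matrix is estimated from Xtrain, and it illustrates the relationship only; it is not necessarily how getknn computes the "mahal" option internally.

```r
# Mahalanobis distance as Euclidean distance after a whitening transform.
# Assumption: the covariance matrix is estimated from the training data.
set.seed(1)
Xtrain <- matrix(rnorm(50 * 3), ncol = 3)
x <- Xtrain[1, ]
y <- Xtrain[2, ]

S <- cov(Xtrain)
U <- chol(S)       # upper triangular factor, S = t(U) %*% U
Uinv <- solve(U)

# Whitened coordinates: each observation (as a row vector) times U^{-1}
zx <- x %*% Uinv
zy <- y %*% Uinv
d_whitened <- sqrt(sum((zx - zy)^2))

# Reference value: stats::mahalanobis() returns the SQUARED distance
d_ref <- sqrt(mahalanobis(x, center = y, cov = S))

all.equal(d_whitened, d_ref)  # the two agree up to floating-point tolerance
```

One practical consequence is that a Euclidean nearest-neighbor search on whitened data returns the Mahalanobis neighbors of the original data.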

Examples

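library(rchemo)

# Simulated training data: n observations, p variables
n <- 10
p <- 4
X <- matrix(rnorm(n * p), ncol = p)
Xtrain <- X
# New observations (query): rows 1 and 3 of the training data
Xtest <- X[c(1, 3), ]
m <- nrow(Xtest)

# k nearest neighbors with the default Euclidean distance
k <- 3
getknn(Xtrain, Xtest, k = k)

# Mahalanobis distance computed on PCA scores (2 latent variables)
fm <- pcasvd(Xtrain, nlv = 2)
Ttrain <- fm$T
Ttest <- transform(fm, Xtest)
getknn(Ttrain, Ttest, k = k, diss = "mahal")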
