knn.index.dist: indices and distances of k-nearest-neighbors

Description

This function returns the indices and distances of the k nearest neighbors of each observation.

Usage

knn.index.dist(
  data,
  TEST_data = NULL,
  k = 5,
  method = "euclidean",
  transf_categ_cols = F,
  threads = 1,
  p = k
)

Value

A list of length 2. The first element contains the indices and the second the distances of the k nearest neighbors of each observation. If TEST_data is NULL, each element has as many rows as the train data; if TEST_data is not NULL, each element has as many rows as TEST_data.

Arguments

data: a data.frame or matrix
TEST_data: a data.frame or matrix (it can also be NULL)
k: an integer specifying the number of nearest neighbors
method: a string specifying the distance method. Valid methods are 'euclidean', 'manhattan', 'chebyshev', 'canberra', 'braycurtis', 'pearson_correlation', 'simple_matching_coefficient', 'minkowski' (by default the Minkowski order 'p' equals k), 'hamming', 'mahalanobis', 'jaccard_coefficient', 'Rao_coefficient'
transf_categ_cols: a boolean (TRUE, FALSE) specifying whether the categorical columns should be converted to numeric or to dummy variables
threads: the number of cores to be used in parallel (OpenMP is used)
p: a numeric value specifying the order of the 'minkowski' distance; it is used only if 'method' is set to 'minkowski'. This parameter defaults to 'k'
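To illustrate what the order 'p' controls, here is a minimal base-R sketch of the Minkowski distance between two numeric vectors. The helper name minkowski_dist is hypothetical and not part of this package; it is checked against stats::dist, which also supports method = "minkowski". Note that because p defaults to k, calling knn.index.dist with method = 'minkowski' and the default k = 5 uses order 5 unless p is set explicitly.

```r
# Hypothetical helper (not part of this package): Minkowski distance of order p,
# i.e. (sum_i |x_i - y_i|^p)^(1/p)
minkowski_dist <- function(x, y, p) {
  stopifnot(length(x) == length(y))
  sum(abs(x - y)^p)^(1 / p)
}

x <- c(0, 0)
y <- c(3, 4)
minkowski_dist(x, y, p = 1)  # 7, same as the manhattan distance
minkowski_dist(x, y, p = 2)  # 5, same as the euclidean distance

# Cross-check against base R for another order
minkowski_dist(x, y, p = 3)
as.numeric(dist(rbind(x, y), method = "minkowski", p = 3))
```

p = 1 and p = 2 recover the 'manhattan' and 'euclidean' methods, which is why the choice of 'p' (and hence of k, when p is left at its default) changes the neighbors returned.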

Author

Lampros Mouselimis

Details

This function returns the indices and distances of the k nearest neighbors of each observation. If TEST_data is NULL, the indices and distances are returned for the rows of the train data; if TEST_data is not NULL, they are returned for the rows of TEST_data.
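The behaviour described above can be sketched in base R as a brute-force search with the Euclidean distance. This is a minimal illustration under stated assumptions, not the package's implementation: the function name knn_index_dist_sketch is hypothetical, the input is assumed to be all-numeric, and excluding each observation from its own neighbor list when TEST_data is NULL is an assumption made here for illustration.

```r
# Hypothetical sketch (not the package's code): brute-force k-nearest-neighbors
# with the Euclidean distance, returning list(indices, distances).
knn_index_dist_sketch <- function(data, TEST_data = NULL, k = 5) {
  data <- as.matrix(data)                      # assumes all-numeric columns
  query <- if (is.null(TEST_data)) data else as.matrix(TEST_data)
  if (is.null(TEST_data)) stopifnot(k < nrow(data)) else stopifnot(k <= nrow(data))

  idx <- matrix(0L, nrow(query), k)            # one row per query observation
  dst <- matrix(0, nrow(query), k)
  for (i in seq_len(nrow(query))) {
    # Euclidean distance from query row i to every row of the train data
    di <- sqrt(colSums((t(data) - query[i, ])^2))
    # Assumption: on the train data, an observation is not its own neighbor
    if (is.null(TEST_data)) di[i] <- Inf
    o <- order(di)[seq_len(k)]                 # indices of the k smallest distances
    idx[i, ] <- o
    dst[i, ] <- di[o]
  }
  list(idx, dst)
}

# Three points on a line: (0,0), (1,0), (5,0)
X <- matrix(c(0, 1, 5, 0, 0, 0), ncol = 2)
knn_index_dist_sketch(X, k = 1)
knn_index_dist_sketch(X, TEST_data = matrix(c(4, 0), ncol = 2), k = 2)
```

In both calls the number of rows of each returned matrix follows the query set: the train data when TEST_data is NULL, and TEST_data otherwise. The indices always refer to rows of the train data.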

Examples



library(KernelKnn)

data(Boston)

# drop the last column (the response variable), keeping only the predictors
X = Boston[, -ncol(Boston)]

out = knn.index.dist(X, TEST_data = NULL, k = 4, method = 'euclidean', threads = 1)
