VIM (version 3.0.2)

kNN: k-Nearest Neighbour Imputation

Description

k-Nearest Neighbour Imputation based on a variation of the Gower Distance for numerical, categorical, ordered and semi-continuous variables.
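
As background on the distance named above, the following is a minimal base R sketch of the textbook Gower distance for a mix of numerical and categorical columns. It is an illustration only: `gower_sketch` and the `toy` data are hypothetical, and the code does not reproduce the exact variation implemented in VIM (ordered and semi-continuous variables are not handled here, and a constant numerical column would give a zero range).

```r
# Textbook Gower distance between rows i and j of a data frame.
# Assumption: classic definition, NOT the exact variation used by VIM.
gower_sketch <- function(df, i, j, weights = rep(1, ncol(df))) {
  d <- vapply(seq_along(df), function(v) {
    col <- df[[v]]
    if (is.numeric(col)) {
      # Numerical: absolute difference scaled by the variable's range
      abs(col[i] - col[j]) / diff(range(col, na.rm = TRUE))
    } else {
      # Categorical: 0 if the levels match, 1 otherwise
      as.numeric(as.character(col[i]) != as.character(col[j]))
    }
  }, numeric(1))
  # Weighted average of the per-variable contributions
  sum(weights * d) / sum(weights)
}

toy <- data.frame(
  height = c(150, 160, 170),
  colour = factor(c("red", "blue", "red"))
)
gower_sketch(toy, 1, 2)  # (10/20 + 1) / 2 = 0.75
gower_sketch(toy, 1, 3)  # (20/20 + 0) / 2 = 0.5
```

Each variable contributes a value between 0 and 1, so no single variable dominates the distance purely because of its scale.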

Usage

kNN(data, variable = colnames(data), metric = NULL, k = 5,
    dist_var = colnames(data), weights = NULL, numFun = median,
    catFun = maxCat, makeNA = NULL, NAcond = NULL, impNA = TRUE,
    donorcond = NULL, mixed = vector(), trace = FALSE, imp_var = TRUE,
    imp_suffix = "imp", addRandom = FALSE)
sampleCat(x)
maxCat(x)
gowerD(data.x, data.y = data.x, weights = NULL, numerical, factors,
    orders, mixed, levOrders)
which.minN(x, n)

Arguments

data
data.frame or matrix
variable
variables where missing values should be imputed
metric
metric to be used for calculating the distances between observations
k
number of Nearest Neighbours used
dist_var
names of variables to be used for distance calculation
weights
weights for the variables for distance calculation
numFun
function for aggregating the k Nearest Neighbours in the case of a numerical variable
catFun
function for aggregating the k Nearest Neighbours in the case of a categorical variable
makeNA
vector of values that should be converted to NA
NAcond
a condition for imputing an NA
impNA
TRUE/FALSE whether NA should be imputed
donorcond
condition for the donors e.g. ">5"
trace
TRUE/FALSE if additional information about the imputation process should be printed
imp_var
TRUE/FALSE whether a TRUE/FALSE variable should be created for each imputed variable to show its imputation status
imp_suffix
suffix for the TRUE/FALSE variables showing the imputation status
addRandom
TRUE/FALSE if an additional random variable should be added for distance calculation
x
factor or character vector / numerical vector for which.minN
data.x
data frame or matrix
data.y
data frame or matrix
numerical
names of numerical variables
factors
names of factors
orders
names of ordered variables
mixed
names of mixed variables
levOrders
list of the ordered levels for each factor
n
number of ordered smallest values

Value

  • the imputed data set.

Details

The function sampleCat samples a level with probabilities corresponding to the occurrence of each level among the nearest neighbours. The function maxCat chooses the level with the most occurrences, picking one at random if the maximum is not unique. The function gowerD is used by kNN to compute the distances for numerical, factor, ordered and semi-continuous variables. The function which.minN is used by kNN.
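
The two categorical aggregation rules described above can be sketched in base R as follows. The functions `max_cat_sketch` and `sample_cat_sketch` are hypothetical stand-ins written for illustration; they follow the behaviour described in this section and are not VIM's source code.

```r
# Hypothetical stand-ins for illustration; NOT VIM's implementation.

# Most frequent level among the neighbours, ties broken at random
max_cat_sketch <- function(x) {
  counts <- table(x)
  top <- names(counts)[counts == max(counts)]
  if (length(top) > 1) sample(top, 1) else top
}

# Draw one level with probability proportional to its frequency
sample_cat_sketch <- function(x) {
  counts <- table(x)
  sample(names(counts), 1, prob = as.numeric(counts) / sum(counts))
}

# Values of a categorical variable observed in k = 5 neighbours
neighbours <- c("a", "b", "a", "c", "a")
max_cat_sketch(neighbours)     # "a" is the unique most frequent level
sample_cat_sketch(neighbours)  # "a", "b" or "c"; "a" with probability 0.6
```

The first rule gives the same result on every run unless there is a tie, while the second keeps some of the variability of the neighbours' values in the imputed data.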

Examples

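library(VIM)   # provides kNN() and the sleep data set
data(sleep)
kNN(sleep)     # impute all variables using the default k = 5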
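
A slightly more targeted call, as a sketch that assumes the VIM package is installed: it restricts imputation to two columns of `sleep` through `variable`, lowers `k`, and then inspects the indicator columns that `imp_var = TRUE` adds. The object name `imputed` is chosen here for illustration.

```r
library(VIM)
data(sleep, package = "VIM")

# Impute only NonD and Dream, using 3 neighbours instead of the default 5
imputed <- kNN(sleep, variable = c("NonD", "Dream"), k = 3)

# With imp_var = TRUE (the default), logical indicator columns named
# <variable>_<imp_suffix> are appended to the returned data set
head(imputed[, c("NonD", "Dream", "NonD_imp", "Dream_imp")])

# Count the values that were imputed in each of the two columns
colSums(imputed[, c("NonD_imp", "Dream_imp")])
```

The indicator columns make it possible to tell observed values from imputed ones in later analysis; setting `imp_var = FALSE` returns the data set without them.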
