kNN: k-Nearest Neighbour Imputation

Description

k-Nearest Neighbour Imputation based on a variation of the Gower Distance for numerical, categorical, ordered and semi-continuous variables.

Usage

kNN(data, variable = colnames(data), metric = NULL, k = 5,
    dist_var = colnames(data), weights = NULL, numFun = median,
    catFun = maxCat, makeNA = NULL, NAcond = NULL, impNA = TRUE,
    donorcond = NULL, mixed = vector(), trace = FALSE, imp_var = TRUE,
    imp_suffix = "imp", addRandom = FALSE)
sampleCat(x)
maxCat(x)
gowerD(data.x, data.y = data.x, weights = NULL, numerical, factors,
    orders, mixed, levOrders)
which.minN(x, n)

Arguments

data

data.frame or matrix

variable

variables where missing values should be imputed

metric

metric to be used for calculating the distances between

k

number of Nearest Neighbours used

dist_var

names of the variables to be used for distance calculation

weights

weights for the variables for distance calculation

numFun

function for aggregating the k Nearest Neighbours in the case of a numerical variable

catFun

function for aggregating the k Nearest Neighbours in the case of a categorical variable

makeNA

vector of values that should be converted to NA

NAcond

a condition for imputing a NA

impNA

TRUE/FALSE whether NA should be imputed

donorcond

condition for the donors e.g. ">5"

trace

TRUE/FALSE if additional information about the imputation process should be printed

imp_var

TRUE/FALSE if a TRUE/FALSE variable should be created for each imputed variable to show the imputation status

imp_suffix

suffix for the TRUE/FALSE variables showing the imputation status

addRandom

TRUE/FALSE if an additional random variable should be added for distance calculation

x

factor or character vector / numerical vector for which.minN

data.x

data frame or matrix

data.y

data frame or matrix

numerical

names of numerical variables

factors

names of factors

orders

names of ordered variables

mixed

names of mixed variables

levOrders

list of the ordered levels for each factor

n

number of ordered smallest values

Value

the imputed data set.

Details

The function sampleCat samples with probabilities corresponding to the occurrence of each level among the nearest neighbours. The function maxCat chooses the level with the most occurrences and picks one at random if the maximum is not unique. The function gowerD is used by kNN to compute the distances for numerical, factor, ordered and semi-continuous variables. The function which.minN is used by kNN.
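The behaviour described above can be illustrated with a minimal base-R sketch. The functions below (gower_sketch, max_cat_sketch, sample_cat_sketch, which_min_n_sketch) are hypothetical stand-ins written for illustration only; they are not the VIM implementations. The distance shown is the classic Gower form for numerical and factor columns, not necessarily the variation used by gowerD, and returning indices from which_min_n_sketch is an assumption, since the return value of which.minN is not described here.

```r
# Hypothetical stand-ins for illustration; NOT the code used inside VIM.

# Classic Gower-style distance between one row and every row of a data frame:
# numeric columns use |difference| / range, other columns use a 0/1 mismatch.
gower_sketch <- function(row, data, weights = rep(1, ncol(data))) {
  d <- sapply(seq_along(data), function(j) {
    col <- data[[j]]
    if (is.numeric(col)) {
      abs(col - row[[j]]) / diff(range(col))
    } else {
      as.numeric(as.character(col) != as.character(row[[j]]))
    }
  })
  as.numeric(d %*% weights / sum(weights))
}

# Most frequent level; ties are broken at random (as described for maxCat).
max_cat_sketch <- function(x) {
  counts <- table(as.character(x))
  top <- names(counts)[counts == max(counts)]
  if (length(top) > 1) sample(top, 1) else top
}

# One level drawn with probability proportional to its frequency
# among the neighbours (as described for sampleCat).
sample_cat_sketch <- function(x) {
  counts <- table(as.character(x))
  sample(names(counts), 1, prob = as.numeric(counts) / sum(counts))
}

# Indices of the n smallest values (assumed output format).
which_min_n_sketch <- function(x, n) {
  order(x)[seq_len(min(n, length(x)))]
}

toy <- data.frame(x = c(0, 5, 10), g = factor(c("a", "a", "b")))
dists <- gower_sketch(toy[1, ], toy)      # distances from row 1 to all rows
nearest <- which_min_n_sketch(dists, 2)   # the two closest rows
max_cat_sketch(c("a", "b", "b", "c"))     # "b" is the only most frequent level
sample_cat_sketch(c("a", "b", "b", "c"))  # random draw weighted by frequency
```

In this sketch the aggregation step is kept separate from the distance step, which mirrors how numFun and catFun are passed to kNN as interchangeable functions.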

Examples

# kNN and this sleep data set are provided by the VIM package
library(VIM)
data(sleep, package = "VIM")
kNN(sleep)
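A slightly more targeted call may help show how the arguments fit together. This is a sketch that assumes the VIM package is installed; the choice of columns (NonD and Dream as targets, BodyWgt and BrainWgt as distance variables) and the name `imputed` are arbitrary and used only for illustration.

```r
library(VIM)
data(sleep, package = "VIM")

# Impute only NonD and Dream; distances are computed from two
# fully observed columns, and 3 neighbours are aggregated.
imputed <- kNN(sleep,
               variable = c("NonD", "Dream"),
               dist_var = c("BodyWgt", "BrainWgt"),
               k = 3)

# With imp_var = TRUE (the default), one logical indicator column per imputed
# variable is appended, named <variable>_<imp_suffix>, e.g. NonD_imp.
names(imputed)
table(imputed$NonD_imp)
```

Setting imp_var = FALSE would return only the original columns with the missing values filled in.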