SL.kernelKnn: SL wrapper for KernelKNN

Description

Wrapper for a configurable implementation of k-nearest neighbors. Supports both binomial and gaussian outcome distributions.

Usage

SL.kernelKnn(Y, X, newX, family, k = 10, method = "euclidean",
  weights_function = NULL, extrema = F, h = 1, ...)

Value

List with predictions and the original training data & hyperparameters.

Arguments

Y: Outcome variable
X: Training dataframe
newX: Test dataframe
family: Gaussian or binomial
k: Number of nearest neighbors to use
method: Distance method, can be 'euclidean' (default), 'manhattan', 'chebyshev', 'canberra', 'braycurtis', 'pearson_correlation', 'simple_matching_coefficient', 'minkowski' (by default the order 'p' of the minkowski parameter equals k), 'hamming', 'mahalanobis', 'jaccard_coefficient', 'Rao_coefficient'
weights_function: Weighting method for combining the nearest neighbors. Can be 'uniform' (default), 'triangular', 'epanechnikov', 'biweight', 'triweight', 'tricube', 'gaussian', 'cosine', 'logistic', 'gaussianSimple', 'silverman', 'inverse', 'exponential'.
extrema: if TRUE then the minimum and maximum values from the k-nearest-neighbors will be removed (can be thought as outlier removal).
h: the bandwidth, applicable if the weights_function is not NULL. Defaults to 1.0.
...: Any additional parameters, not currently passed through.

Examples

Run this code


# Load a test dataset.
data(PimaIndiansDiabetes2, package = "mlbench")

data = PimaIndiansDiabetes2

# Omit observations with missing data.
data = na.omit(data)

Y_bin = as.numeric(data$diabetes)
X = subset(data, select = -diabetes)

set.seed(1)

sl = SuperLearner(Y_bin, X, family = binomial(),
                 SL.library = c("SL.mean", "SL.kernelKnn"))
sl

Run the code above in your browser using DataLab