
classiFunc (version 0.1.1)

classiKnn: Create a knn estimator for functional data classification.

Description

Creates an efficient k-nearest-neighbor estimator for functional data classification. Currently supported distance measures are all metrics implemented in dist and all semimetrics suggested in Fuchs et al. (2015). Additionally, every (semi-)metric can be applied to derivatives of the data of arbitrary order.
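
To make the idea concrete, here is a minimal base R sketch of k-nearest-neighbor classification for curves stored as matrix rows, using the L2 distance. The helper knn_classify and the toy data are hypothetical and only illustrate the principle; they are not the package's implementation, which is more general and more efficient.

```r
# knn_classify is a hypothetical helper, not part of classiFunc.
knn_classify = function(train, classes, new, k = 1L) {
  apply(new, 1, function(x) {
    # L2 distance from the new curve x to every training curve
    xmat = matrix(x, nrow(train), ncol(train), byrow = TRUE)
    d = sqrt(rowSums((train - xmat)^2))
    # majority vote among the k closest training curves
    votes = table(classes[order(d)[seq_len(k)]])
    names(votes)[which.max(votes)]
  })
}

# Toy data: two curves near 0 (class "a") and two near 1 (class "b")
train = rbind(c(0, 0, 0), c(0.1, 0, 0.1), c(1, 1, 1), c(0.9, 1, 1))
classes = factor(c("a", "a", "b", "b"))
new = rbind(c(0.05, 0, 0), c(1, 0.9, 1))

pred = knn_classify(train, classes, new, k = 3L)
print(pred)
```

With k = 3L each new curve is assigned the class held by at least two of its three closest training curves.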

Usage

classiKnn(classes, fdata, grid = 1:ncol(fdata), knn = 1L, metric = "L2",
  nderiv = 0L, derived = FALSE, deriv.method = "base.diff",
  custom.metric = function(x, y, ...) {     return(sqrt(sum((x - y)^2))) },
  ...)

Arguments

classes

[factor(nrow(fdata))] factor of length nrow(fdata) containing the classes of the observations.

fdata

[matrix] matrix containing the functional observations as rows.

grid

[numeric(ncol(fdata))] numeric vector of length ncol(fdata) containing the grid on which the functional observations were evaluated.

knn

[integer(1)] number of nearest neighbors to use in the k nearest neighbor algorithm.

metric

[character(1)] character string specifying the (semi-)metric to be used. For an overview of what is available, see the method argument in computeDistMat. For a full list, execute metricChoices().

nderiv

[integer(1)] order of derivation on which the metric shall be computed. The default is 0L.

derived

[logical(1)] Is the data given in fdata already derived? Default is set to FALSE, which will lead to numerical derivation if nderiv >= 1L by applying deriv.fd on a Data2fd representation of fdata.

deriv.method

[character(1)] character string indicating which method should be used for derivation. Currently implemented are "base.diff", the default, and "fda.deriv.fd". "base.diff" uses base::diff for equidistant measurements without missing values, which is faster than transforming the data into the class fd and deriving it with fda::deriv.fd. The second variant implies smoothing, which can be preferable when calculating high-order derivatives.
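
As a rough illustration of the "base.diff" idea, the following base R sketch approximates first derivatives of curves on an equidistant grid by finite differences. The toy data are made up, and dividing by the grid spacing h is an assumption made here for readability; the source does not state whether the package rescales the differences.

```r
# Three curves observed on an equidistant grid (rows = observations)
grid = seq(0, 1, by = 0.25)
fdata = rbind(grid^2, 2 * grid, rep(1, length(grid)))

# Grid spacing (constant because the grid is equidistant)
h = diff(grid)[1]

# First-order differences along each curve, scaled by the spacing.
# apply() returns one column per curve, so transpose back to rows.
d1 = t(apply(fdata, 1, diff)) / h
print(d1)
```

Each differencing step shortens the curves by one grid point, so d1 has one column fewer than fdata.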

custom.metric

[function(x, y, ...)] only used if metric = "custom.metric". A function of two functional observations x and y that returns their distance. The default is the L2 distance. See dist for how to implement your own distance function.
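
A user-defined distance only needs to follow the function(x, y, ...) signature shown in the Usage section and return a single number. The sketch below defines an L1 (Manhattan) distance; the name l1_metric is hypothetical and chosen for illustration only.

```r
# Hypothetical custom distance: L1 (Manhattan) distance between two curves.
# The signature mirrors the documented default, function(x, y, ...).
l1_metric = function(x, y, ...) {
  return(sum(abs(x - y)))
}

# Two toy curves evaluated on the same grid
x = c(0, 1, 2)
y = c(1, 1, 0)
d = l1_metric(x, y)
print(d)
```

A function of this shape could then be supplied through the custom.metric argument.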

...

further arguments to and from other methods. Hand over additional arguments to computeDistMat, usually additional arguments for the specified (semi-)metric. Also, if deriv.method == "fda.deriv.fd" or fdata is not observed on a regular grid, additional arguments to fdataTransform can be specified which will be passed on to Data2fd.

Value

classiKnn returns an object of class "classiKnn". This is a list containing at least the following components:

call

the original function call.

classes

a factor of length nrow(fdata) coding the response of the training data set.

fdata

the raw functional data as a matrix with the individual observations as rows.

grid

numeric vector containing the grid on which fdata is observed.

proc.fdata

the preprocessed data (missing values interpolated, derived and evenly spaced). This data is this.fdataTransform(fdata). See this.fdataTransform for more details.

knn

integer coding the number of nearest neighbors used in the k nearest neighbor classification algorithm.

metric

character string coding the distance metric to be used in computeDistMat.

nderiv

integer giving the order of derivation that is applied to fdata before computing the distances between the observations.

this.fdataTransform

preprocessing function taking new data as a matrix. It is used to transform fdata into proc.fdata and is required to preprocess new data before predicting on it. This function ensures that preprocessing (derivation, respacing and interpolation of missing values) is done in exactly the same way for the original training data set and for future (test) data sets.
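
The reason for storing a preprocessing function in the fitted object can be sketched with a closure: the settings are fixed once and then reused for any new matrix. The factory make_transform below is hypothetical and covers only differencing; it is not the package's implementation, which also handles respacing and missing values.

```r
# make_transform is a hypothetical factory, not part of classiFunc.
make_transform = function(nderiv = 0L) {
  force(nderiv)  # fix the setting at creation time
  function(fdata) {
    out = fdata
    for (i in seq_len(nderiv)) {
      # row-wise first differences; drop = FALSE keeps the matrix shape
      out = out[, -1, drop = FALSE] - out[, -ncol(out), drop = FALSE]
    }
    out
  }
}

# The same function object is applied to training and test data
transform = make_transform(nderiv = 1L)
train = rbind(c(0, 1, 4, 9), c(0, 2, 4, 6))
test = rbind(c(1, 2, 3, 4))
proc_train = transform(train)
proc_test = transform(test)
```

Because transform carries its own nderiv, the test data cannot accidentally be processed with different settings than the training data.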

References

Fuchs, K., J. Gertheiss, and G. Tutz (2015): Nearest neighbor ensembles for functional data with interpretable feature selection. Chemometrics and Intelligent Laboratory Systems 146, 186-197.

See Also

predict.classiKnn

Examples

# NOT RUN {
# Classification of the Phoneme data
library(classiFunc)
data(Phoneme)
classes = Phoneme[,"target"]

set.seed(123)
# Use 80% of data as training set and 20% as test set
train_inds = sample(1:nrow(Phoneme), size = 0.8 * nrow(Phoneme), replace = FALSE)
test_inds = (1:nrow(Phoneme))[!(1:nrow(Phoneme)) %in% train_inds]

# create functional data as matrix with observations as rows
fdata = Phoneme[,!colnames(Phoneme) == "target"]

# create k = 3 nearest neighbors classifier with L2 distance (default) of the
# first order derivative of the data
mod = classiKnn(classes = classes[train_inds], fdata = fdata[train_inds,],
                 nderiv = 1L, knn = 3L)

# predict the model for the test set
pred = predict(mod, newdata =  fdata[test_inds,], predict.type = "prob")

# }
# NOT RUN {
# Parallelize across 2 CPUs
library(parallelMap)
parallelStartSocket(cpus = 2L) # parallelStartMulticore(cpus = 2L) for Linux
predict(mod, newdata =  fdata[test_inds,], predict.type = "prob", parallel = TRUE, batches = 2L)
parallelStop()
# }