classif.DD: DD-Classifier Based on DD-plot

Description

Fits a nonparametric classification procedure based on the DD-plot (depth-versus-depth plot) in G dimensions (\(G=g\times p\), with g group levels and p data depths).

Usage

classif.DD(
  group,
  fdataobj,
  depth = "FM",
  classif = "glm",
  w,
  par.classif = list(),
  par.depth = list(),
  control = list(verbose = FALSE, draw = TRUE, col = NULL, alpha = 0.25)
)

Value

group.est Vector of groups estimated by the selected classification method.
misclassification Probability of misclassification.
prob.classification Probability of correct classification by group level.
dep Data frame with the depth of the curves for functional data (or points for multivariate data) in fdataobj w.r.t. each group level.
depth Character vector specifying the type of depth functions used.
par.depth List of parameters for depth function.
classif Type of classifier used.
par.classif List of parameters for classif procedure.
w Optional case weights.
fit Fitted object by classif method using the depth as covariate.

Arguments

group

Factor of length n with g levels.

fdataobj

data.frame, fdata or list with the multivariate, functional or both covariates respectively.

depth

Character vector specifying the type of depth functions to use, see Details.

classif

Character vector specifying the type of classifier method to use, see Details.

w

Optional case weights, weights for each value of depth argument, see Details.

par.classif

List of parameters for classif procedure.

par.depth

List of parameters for depth function.

control

List of parameters for controlling the process.

If verbose=TRUE, report extra information on progress.

If draw=TRUE, plot the DD-plot of the two samples based on data depth.

col, the colors of the points in the DD-plot.

alpha, the alpha transparency used in the background of the DD-plot, a number in [0,1].

Author

This version was created by Manuel Oviedo de la Fuente and Manuel Febrero Bande and includes the original version for polynomial classifier created by Jun Li, Juan A. Cuesta-Albertos and Regina Y. Liu.

Details

Performs the group classification of a training dataset with the DD-classifier in the following steps.

The function first computes the selected depth measure of the points in fdataobj w.r.t. the subsample of each of the g group levels, for each of the p data sets (\(G=g \times p\)). The user can specify the parameters of the depth function in par.depth.
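The first step can be illustrated with a minimal base-R sketch (not the package's code). The helpers dd_matrix and mhd are hypothetical; mhd is the Mahalanobis depth 1/(1 + squared Mahalanobis distance), used here only as an example depth for multivariate data. Each data set yields g columns, and binding the results for p data sets gives the G = g × p DD-plot coordinates.

```r
# Hypothetical helper: depth of every row of X w.r.t. the subsample of each group
dd_matrix <- function(X, group, depth_fun, prefix = "depth") {
  group <- factor(group)
  dep <- sapply(levels(group), function(lev)
    depth_fun(X, X[group == lev, , drop = FALSE]))
  colnames(dep) <- paste(prefix, levels(group), sep = ".")
  dep  # n x g matrix: DD-plot coordinates for one data set
}

# Example depth (Mahalanobis depth): 1 / (1 + squared Mahalanobis distance)
mhd <- function(X, ref) 1 / (1 + mahalanobis(X, colMeans(ref), cov(ref)))

set.seed(1)
X <- rbind(matrix(rnorm(40), 20), matrix(rnorm(40, mean = 3), 20))
g <- rep(c("a", "b"), each = 20)

D1 <- dd_matrix(X, g, mhd, prefix = "x")                      # g = 2 columns
D2 <- dd_matrix(X[, 1, drop = FALSE], g, mhd, prefix = "x1")  # a second "data set"
DD <- cbind(D1, D2)                                           # G = g * p = 4 columns
```

Passing a different depth_fun changes only the coordinates; the classifiers of the second step operate on the resulting matrix in the same way.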

(i) Type of depth function for functional data, see Depth:
- "FM": Fraiman and Muniz depth.
- "mode": h-modal depth.
- "RT": random Tukey depth.
- "RP": random projection depth.
- "RPD": double random projection depth.
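A minimal base-R sketch of the Fraiman and Muniz idea, assuming curves stored as rows of a matrix and observed on a common, equally spaced grid: at each grid point t the univariate depth 1 - |1/2 - F_n(x(t))| is computed, where F_n is the empirical distribution of the reference curves at t, and these values are averaged over the grid as an approximation of the integral. The name fm_depth is hypothetical, and Depth's own implementation may scale or integrate differently.

```r
# Hypothetical helper: Fraiman-Muniz style depth on a common equally spaced grid
fm_depth <- function(X, ref) {
  # X: m x T matrix of curves to evaluate; ref: n x T reference curves (same grid)
  d_t <- sapply(seq_len(ncol(X)), function(j) {
    Fn <- ecdf(ref[, j])           # empirical cdf of the reference values at t_j
    1 - abs(0.5 - Fn(X[, j]))      # univariate depth at t_j
  })
  rowMeans(matrix(d_t, nrow = nrow(X)))  # grid average ~ integral / length of domain
}

# Five constant curves at levels 0, 1, 2, 3, 4 on a grid of 10 points
ref <- matrix(rep(0:4, times = 10), nrow = 5)
fm_depth(ref, ref)  # 0.7 0.9 0.9 0.7 0.5
```

The highest curve gets a lower depth than the lowest one because F_n is right-continuous (F_n(4) = 1 but F_n(0) = 0.2).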
(ii) Type of depth function for multivariate functional data, see depth.mfdata:
- "FMp": Fraiman and Muniz depth with common support. Assumes that all p fdata objects have the same support (same rangevals), see depth.FMp.
- "modep": h-modal depth using a p-dimensional metric, see depth.modep.
- "RPp": random projection depth using a p-variate depth on the projections, see depth.RPp.
If the procedure requires computing a distance, as in the "knn" or "np" classifiers or the "mode" depth, the user must supply a proper distance function: metric.lp for functional data and metric.dist for multivariate data.
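As a rough sketch of what an L^p distance between discretized curves involves, the base-R code below integrates |x(t) - y(t)|^p with the trapezoidal rule. lp_dist and lp_dist_matrix are hypothetical helpers; metric.lp may use a different quadrature or weighting.

```r
# Hypothetical helper: L^p distance between two curves sampled on grid tt,
# integrating |x(t) - y(t)|^p with the trapezoidal rule
lp_dist <- function(x, y, tt, p = 2) {
  f <- abs(x - y)^p
  (sum(diff(tt) * (head(f, -1) + tail(f, -1)) / 2))^(1 / p)
}

# Distance matrix between the rows of A and the rows of B (curves on the same grid)
lp_dist_matrix <- function(A, B, tt, p = 2) {
  D <- matrix(0, nrow(A), nrow(B))
  for (i in seq_len(nrow(A)))
    for (j in seq_len(nrow(B)))
      D[i, j] <- lp_dist(A[i, ], B[j, ], tt, p)
  D
}

tt <- seq(0, 1, length.out = 101)
A <- rbind(sin(2 * pi * tt), cos(2 * pi * tt))
D <- lp_dist_matrix(A, A, tt)
```

For sin(2πt) and cos(2πt) on [0, 1] the L2 distance is 1, since the integral of (sin - cos)² equals the integral of 1 - sin(4πt), which is 1.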

(iii) Type of depth function for multivariate data, see Depth.Multivariate:
- "SD": Simplicial depth (for bivariate data).
- "HS": Half-space depth.
- "MhD": Mahalanobis depth.
- "RD": random projections depth.
- "LD": Likelihood depth.
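For bivariate data, Liu's simplicial depth of a point is the proportion of triangles with vertices in the reference sample that contain it. The brute-force base-R sketch below loops over all combn(n, 3) triangles, so it costs O(n^3) per point and suits only small samples; the helper names are hypothetical and the package's "SD" implementation may normalize or count boundary cases differently.

```r
# Twice the signed area of (a, b, p): positive if p lies to the left of a -> b
cross2 <- function(a, b, p) (b[1] - a[1]) * (p[2] - a[2]) - (b[2] - a[2]) * (p[1] - a[1])

# Closed triangle test: p is inside (or on the boundary) if all signs agree
in_triangle <- function(p, v1, v2, v3) {
  s <- c(cross2(v1, v2, p), cross2(v2, v3, p), cross2(v3, v1, p))
  all(s >= 0) || all(s <= 0)
}

# Hypothetical helper: simplicial depth of point p w.r.t. the rows of ref (n x 2)
simplicial_depth <- function(p, ref) {
  tri <- combn(nrow(ref), 3)  # each column indexes one triangle
  inside <- apply(tri, 2, function(ix)
    in_triangle(p, ref[ix[1], ], ref[ix[2], ], ref[ix[3], ]))
  mean(inside)
}

sq <- rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1))
simplicial_depth(c(0.5, 0.5), sq)  # 1: the center lies in all 4 triangles
simplicial_depth(c(0.8, 0.1), sq)  # 0.5
simplicial_depth(c(2, 2), sq)      # 0
```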
The function then calculates the misclassification rate based on the data depths computed in the first step, using one of the following classifiers.
- "MaxD": Maximum depth.
- "DD1": Searches for the best separating polynomial of degree 1.
- "DD2": Searches for the best separating polynomial of degree 2.
- "DD3": Searches for the best separating polynomial of degree 3.
- "glm": Logistic regression is computed using Generalized Linear Models classif.glm.
- "gam": Logistic regression is computed using Generalized Additive Models classif.gsam.
- "lda": Linear Discriminant Analysis is computed using lda.
- "qda": Quadratic Discriminant Analysis is computed using qda.
- "knn": k-Nearest Neighbour classification is computed using classif.knn.
- "np": Non-parametric Kernel classifier is computed using classif.np.
The user can specify the parameters of the classifier function in par.classif, such as the smoothing parameter par.classif[["h"]] if classif="np", or the number of nearest neighbours par.classif[["knn"]] if classif="knn".
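To make the role of the depths as covariates concrete, the following base-R sketch mimics the logistic ("glm") option for two groups: it builds the DD-plot coordinates with a Mahalanobis depth (an assumption, standing in for any depth of the first step) and fits a logistic regression on them with glm. It does not call classif.glm; the variable names only mirror the misclassification and prob.classification components listed under Value.

```r
set.seed(2)
n <- 50
X <- rbind(matrix(rnorm(2 * n), n), matrix(rnorm(2 * n, mean = 2), n))
y <- factor(rep(c("g1", "g2"), each = n))

# Example depth (Mahalanobis depth), used only to build the DD-plot coordinates
mhd <- function(X, ref) 1 / (1 + mahalanobis(X, colMeans(ref), cov(ref)))
dd <- data.frame(d1 = mhd(X, X[y == "g1", ]),  # depth w.r.t. group g1
                 d2 = mhd(X, X[y == "g2", ]),  # depth w.r.t. group g2
                 y  = y)

# Logistic regression with the depths as covariates; models P(y = "g2")
fit <- glm(y ~ d1 + d2, family = binomial, data = dd)
pred <- factor(ifelse(fitted(fit) > 0.5, "g2", "g1"), levels = levels(y))
misclassification <- mean(pred != dd$y)
prob.classification <- tapply(pred == dd$y, dd$y, mean)  # correct rate per group
```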

The polynomial classifiers ("DD1", "DD2" and "DD3") use the original procedure proposed by Li et al. (2012), by default also rotating the DD-plot (exchanging abscissa and ordinate), as set by the argument rotate=TRUE in par.classif. Note that the maximum depth classifier can be considered a particular case of DD1 with the slope fixed to 1 (par.classif=list(pol=1)).
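The equivalence between the maximum depth rule and DD1 with unit slope can be checked with a short base-R sketch (maxd_classify and dd1_fixed_slope are hypothetical names): assigning a point to the group of maximum depth is the same as assigning it to group 2 when d2 > 1 × d1. Ties go to the first group in both rules here; how the package breaks ties is not specified above.

```r
# dep: n x 2 matrix of depths (columns = groups), i.e. DD-plot coordinates
maxd_classify <- function(dep) colnames(dep)[max.col(dep, ties.method = "first")]

# DD1-type rule with the slope fixed: group 2 if d2 > slope * d1
dd1_fixed_slope <- function(dep, slope = 1)
  ifelse(dep[, 2] > slope * dep[, 1], colnames(dep)[2], colnames(dep)[1])

dep <- cbind(g1 = c(0.9, 0.2, 0.5), g2 = c(0.1, 0.7, 0.5))
maxd_classify(dep)       # "g1" "g2" "g1"
dd1_fixed_slope(dep, 1)  # "g1" "g2" "g1"
```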

The number of possible different polynomials depends on the sample size n and grows polynomially with the order \(k\) (of the order of \(n^k\)). With \(g\) groups, the procedure therefore applies a multiple-start optimization scheme to save time:
- generate all combinations of the n sample points taken k at a time, giving \(g \times\) combn(n,k) candidate solutions; when this number is larger than nmax=10000, a random sample of 10000 combinations is used.
- smooth the empirical loss with the logistic function \(1/(1+e^{-tx})\). The classification rule is constructed by optimizing the smoothed loss starting from the best noptim combinations in this random sample (by default noptim=1 and tt=50/range(depth values)). Note that Li et al. found that the optimization results become stable for \(t \in [50, 200]\) when the depth is standardized with upper bound 1.
The original procedure (Li et al. (2012)) does not need to try many initial polynomials (nmax=1000) and optimizes only the best one (noptim=1), but we recommend repeating the last step for several starting solutions, for example nmax=250 and noptim=25. The user can change the parameters pol, rotate, nmax, noptim and tt in the argument par.classif.
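The multiple-start scheme for the polynomial classifiers can be sketched in base R as follows, for two groups and without the rotated DD-plot. This is an illustration under stated assumptions, not the package's code: all function names are hypothetical, the polynomial is assumed to pass through the origin (r(0) = 0), each candidate is the polynomial through k sample points, and tt defaults to 50 divided by the width of the range of the depth values. The best noptim candidates by empirical loss are refined by minimizing the logistic-smoothed loss with optim.

```r
# Polynomial through the origin: r(x) = a[1] * x + ... + a[k] * x^k
poly_origin <- function(a, x) drop(outer(x, seq_along(a), `^`) %*% a)

# Coefficients of the degree-k polynomial through k DD-plot points (NULL if singular)
candidate_from_points <- function(d1, d2, idx) {
  X <- outer(d1[idx], seq_along(idx), `^`)
  tryCatch(solve(X, d2[idx]), error = function(e) NULL)
}

# y = 1 marks the second group, assigned when d2 > r(d1)
empirical_loss <- function(a, d1, d2, y) mean(as.integer(d2 - poly_origin(a, d1) > 0) != y)

# Logistic smoothing of the 0-1 loss: 1{s > 0} is replaced by 1 / (1 + exp(-tt * s))
smoothed_loss <- function(a, d1, d2, y, tt) {
  p2 <- 1 / (1 + exp(-tt * (d2 - poly_origin(a, d1))))
  mean(ifelse(y == 1, 1 - p2, p2))
}

dd_poly_fit <- function(d1, d2, y, k = 1, nmax = 10000, noptim = 1, tt = NULL) {
  if (is.null(tt)) tt <- 50 / diff(range(c(d1, d2)))  # assumed reading of 50/range(depth)
  combs <- combn(length(d1), k)                        # all k-subsets of the n points
  if (ncol(combs) > nmax)                              # random subsample of nmax candidates
    combs <- combs[, sample(ncol(combs), nmax), drop = FALSE]
  cands <- lapply(seq_len(ncol(combs)),
                  function(j) candidate_from_points(d1, d2, combs[, j]))
  cands <- Filter(Negate(is.null), cands)              # drop singular systems
  losses <- vapply(cands, empirical_loss, numeric(1), d1 = d1, d2 = d2, y = y)
  starts <- cands[order(losses)[seq_len(min(noptim, length(cands)))]]
  fits <- lapply(starts, function(a0) {
    a1 <- optim(a0, smoothed_loss, d1 = d1, d2 = d2, y = y, tt = tt,
                method = "BFGS")$par
    l0 <- empirical_loss(a0, d1, d2, y)
    l1 <- empirical_loss(a1, d1, d2, y)
    # keep the smoothed optimum only if it does not increase the empirical loss
    if (l1 <= l0) list(coef = a1, loss = l1) else list(coef = a0, loss = l0)
  })
  fits[[which.min(vapply(fits, `[[`, numeric(1), "loss"))]]
}

d1 <- c(0.75, 0.5, 1, 0.25, 0.25, 0.125)  # depth w.r.t. group 1
d2 <- c(0.25, 0.25, 0.125, 0.75, 0.5, 1)  # depth w.r.t. group 2
y  <- c(0, 0, 0, 1, 1, 1)
fit <- dd_poly_fit(d1, d2, y, k = 1)
fit$loss  # 0: this toy DD-plot is separated by a line through the origin
```

With the settings recommended above the call becomes dd_poly_fit(d1, d2, y, k, nmax = 250, noptim = 25); set a seed first, since the candidate subsample is random when it is drawn.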

classif.DD extends to multi-class problems by using majority voting for the polynomial classifiers and the one-versus-rest method for the logistic classifiers ("glm" and "gam").
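Both combination rules can be sketched in a few lines of base R. The sketch assumes that majority voting is applied over pairwise (one-versus-one) classifiers and that each one-versus-rest model yields a score per class; majority_vote and one_vs_rest are hypothetical names, and the tie-breaking shown (alphabetical for votes, first column for scores) is a choice of this sketch, not necessarily the package's.

```r
# Majority voting: votes is an n x m matrix of labels, one column per pairwise
# classifier; ties go to the alphabetically first label (table() sorts names)
majority_vote <- function(votes)
  apply(votes, 1, function(v) names(which.max(table(v))))

# One-vs-rest: scores is an n x g matrix, column j = score of "class j vs the rest"
one_vs_rest <- function(scores)
  colnames(scores)[max.col(scores, ties.method = "first")]

votes <- rbind(c("a", "a", "b"), c("b", "c", "c"))
majority_vote(votes)  # "a" "c"
scores <- cbind(a = c(0.7, 0.1), b = c(0.2, 0.3), c = c(0.1, 0.6))
one_vs_rest(scores)   # "a" "c"
```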

References

Cuesta-Albertos, J.A., Febrero-Bande, M. and Oviedo de la Fuente, M. (2017). The DDG-classifier in the functional setting. Test, 26(1), 119-142. DOI: 10.1007/s11749-016-0502-6.