FADA: Factor Adjusted Discriminant Analysis 3-4: Supervised classification on decorrelated data

Description

This function performs supervised classification on factor-adjusted data.
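"Factor-adjusted" means that the dependence shared by many variables, modelled by a few latent factors, has been removed before classification. The base-R sketch below shows the idea under simplifying assumptions. The factors are estimated by a rank-q truncated SVD (principal components). This PCA step is a swapped-in stand-in: the package's own decorrelation (decorrelate.train) presumably relies on the factor-model approach of Friguet et al. (2009). The helper factor_adjust is hypothetical and is not part of FADA.

```r
# Hypothetical helper (not part of FADA): remove q latent factors from X.
# Assumed model: X = 1 mu' + Z B' + E, with Z (n x q) factor scores and
# B (p x q) loadings. Z B' is estimated here by a rank-q truncated SVD of
# the centered data, a PCA-based stand-in for the package's factor model.
factor_adjust <- function(X, q) {
  X <- as.matrix(X)
  if (q == 0) return(list(adjusted = X, scores = NULL))  # q = 0: no adjustment
  mu <- colMeans(X)
  Xc <- sweep(X, 2, mu)                         # center each variable
  s <- svd(Xc, nu = q, nv = q)
  scores <- s$u %*% diag(s$d[1:q], nrow = q)    # estimated factor scores (n x q)
  loadings <- s$v                               # estimated loadings (p x q)
  adjusted <- Xc - scores %*% t(loadings) +     # remove factor part, restore means
    matrix(mu, nrow(X), ncol(X), byrow = TRUE)
  list(adjusted = adjusted, scores = scores)
}

# Toy data: one shared latent factor induces strong correlation between variables
set.seed(1)
n <- 50; p <- 20
z <- rnorm(n)
X <- outer(z, runif(p, 1, 2)) + matrix(rnorm(n * p, sd = 0.3), n, p)
res <- factor_adjust(X, q = 1)

# Mean absolute off-diagonal correlation before and after adjustment
offdiag <- function(M) { C <- cor(M); mean(abs(C[upper.tri(C)])) }
offdiag(X)
offdiag(res$adjusted)
```

Setting q = 0 returns the data unchanged. This mirrors the nbf.cv = 0 option described under Arguments, where the data are not factor-adjusted.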

Usage

FADA(faobject, K = 10, B = 20, nbf.cv = NULL,
     method = c("glmnet", "sda", "sparseLDA"),
     sda.method = c("lfdr", "HC"), alpha = 0.1, ...)

Arguments

faobject

An object returned by decorrelate.train or decorrelate.test.

K

Number of folds used to estimate the classification error rate by cross-validation; used only when no testing data are provided. Default is K=10.

B

Number of replications of the cross-validation. Default is B=20.

nbf.cv

Number of factors used in the cross-validation to compute the error rate; used only when no testing data are provided. By default, nbf.cv = NULL and the number of factors is estimated within each fold of the cross-validation. nbf.cv can also be set to a positive integer. If nbf.cv = 0, the data are not factor-adjusted.

method

The supervised classification method. Three options are available. If method = "glmnet", a Lasso-penalized logistic regression is fitted with the glmnet R package. If method = "sda", an LDA or a DDA (see the diagonal argument of sda) is performed by Shrinkage Discriminant Analysis with the sda R package. If method = "sparseLDA", a Lasso-penalized LDA is fitted with the sparseLDA R package.

sda.method

The variable selection method, used only if method = "sda". If sda.method = "lfdr", variables are selected through CAT scores and false non-discovery rate control. If sda.method = "HC", variables are selected by Higher Criticism thresholding.

alpha

The proportion of the HC objective to be observed; used only if method = "sda" and sda.method = "HC". Default is alpha = 0.1.

...

Further arguments used to tune the classification method. See the documentation of the chosen method (glmnet, sda or sparseLDA) for more information about these parameters.
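With sda.method = "HC", variables are selected by higher criticism (HC) thresholding. The base-R sketch below illustrates what alpha controls. It computes the usual HC objective on the sorted p-values and maximizes it only over the smallest alpha fraction of them. Several details are assumptions for illustration. The helper hc_threshold is hypothetical, and the p-values are toy values rather than output of sda. The clipping of p-values near 0 and 1 is an illustrative safeguard. The sda package's internal computation may differ in detail.

```r
# Hypothetical helper (not part of FADA or sda): higher criticism
# thresholding of one p-value per variable. Only the smallest
# ceiling(alpha * d) ordered p-values are searched for the HC maximum.
hc_threshold <- function(pval, alpha = 0.1) {
  d <- length(pval)
  po <- sort(pval)
  i <- seq_len(d)
  pc <- pmin(pmax(po, 1e-12), 1 - 1e-12)            # avoid division by zero
  hc <- sqrt(d) * (i / d - po) / sqrt(pc * (1 - pc)) # HC objective
  m <- max(1, ceiling(alpha * d))                    # search range set by alpha
  i_star <- which.max(hc[seq_len(m)])
  thr <- po[i_star]
  list(threshold = thr, selected = which(pval <= thr))
}

# Toy p-values: 20 very small ones followed by 980 evenly spread ones
pval <- c(rep(1e-6, 20), seq(0.001, 1, length.out = 980))
sel <- hc_threshold(pval, alpha = 0.1)
sel$threshold
sel$selected
```

A smaller alpha restricts the search to fewer top-ranked variables, which tends to give a more conservative threshold.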

Value

Returns a list with the following elements:

method

Reminder of the classification method that was used.

selected

A vector containing the indices of the selected variables.

proba.train

A matrix containing the predicted group probabilities for the training data.

proba.test

A matrix containing the predicted group probabilities for the testing data, if a testing data set has been provided.

predict.test

A matrix containing the predicted classes of the testing data, if a testing data set has been provided.

cv.error

The average classification error rate estimated by cross-validation, if no testing data set has been provided.

cv.error.se

The standard error of the classification error rate estimated by cross-validation, if no testing data set has been provided.

mod

The fitted classification model. Its class is that of the model object returned by the chosen method; see the documentation of that method for more details.
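When no testing data are provided, cv.error and cv.error.se summarize B replications of K-fold cross-validation. The base-R sketch below shows that procedure under explicit assumptions. A toy nearest-centroid rule stands in for glmnet, sda or sparseLDA. The number of factors is not re-estimated within each fold, so the nbf.cv step is omitted. The standard error is taken as the standard deviation of the B replicate error rates divided by sqrt(B), which is an assumption rather than FADA's documented definition. The functions nearest_centroid and repeated_cv_error are hypothetical.

```r
# Hypothetical classifier (stand-in for glmnet / sda / sparseLDA):
# assign each test row to the class with the closest mean vector.
nearest_centroid <- function(Xtr, ytr, Xte) {
  classes <- levels(ytr)
  centroids <- do.call(rbind, lapply(classes, function(k)
    colMeans(Xtr[ytr == k, , drop = FALSE])))
  # squared Euclidean distance from each test row to each class centroid
  d2 <- sapply(seq_along(classes), function(j)
    rowSums(sweep(Xte, 2, centroids[j, ])^2))
  d2 <- matrix(d2, nrow = nrow(Xte))   # keep matrix shape for a single test row
  factor(classes[apply(d2, 1, which.min)], levels = classes)
}

# Hypothetical sketch of B replications of K-fold cross-validation
repeated_cv_error <- function(X, y, K = 10, B = 20) {
  n <- nrow(X)
  errs <- numeric(B)
  for (b in seq_len(B)) {
    folds <- sample(rep_len(seq_len(K), n))   # random fold labels 1..K
    wrong <- 0
    for (k in seq_len(K)) {
      test <- folds == k
      pred <- nearest_centroid(X[!test, , drop = FALSE], y[!test],
                               X[test, , drop = FALSE])
      wrong <- wrong + sum(pred != y[test])
    }
    errs[b] <- wrong / n                      # error rate of replication b
  }
  # assumption: SE = sd over the B replications / sqrt(B)
  list(cv.error = mean(errs), cv.error.se = sd(errs) / sqrt(B), errors = errs)
}

# Toy two-class data: class "b" is shifted by 3 in every variable
set.seed(3)
n <- 60; p <- 5
y <- factor(rep(c("a", "b"), each = n / 2))
X <- matrix(rnorm(n * p), n, p) + 3 * (y == "b")
cv <- repeated_cv_error(X, y, K = 10, B = 20)
cv$cv.error
cv$cv.error.se
```

Replicating the fold assignment B times averages out the randomness of any single split, which is why both a mean error and its standard error are reported.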

References

Ahdesmaki, M. and Strimmer, K. (2010), Feature selection in omics prediction problems using cat scores and false non-discovery rate control. Annals of Applied Statistics, 4, 503-519.

Clemmensen, L., Hastie, T., Witten, D. and Ersboll, B. (2011), Sparse discriminant analysis. Technometrics, 53(4), 406-413.

Friedman, J., Hastie, T. and Tibshirani, R. (2010), Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33, 1-22.

Friguet, C., Kloareg, M. and Causeur, D. (2009), A factor model approach to multiple testing under dependence. Journal of the American Statistical Association, 104(488), 1406-1415.

Perthame, E., Friguet, C. and Causeur, D. (2015), Stability of feature selection in classification issues for high-dimensional correlated data, Statistics and Computing.

Examples


# NOT RUN {
data(data.train)
data(data.test)

# When a testing data set is provided
res = decorrelate.train(data.train)
res2 = decorrelate.test(res, data.test)
classif = FADA(res2, method = "sda", sda.method = "lfdr")

# When no testing data set is provided, the classification error rate
# is estimated by K-fold cross-validation (not run)
# res = decorrelate.train(data.train)
# classif = FADA(res, method = "sda", sda.method = "lfdr")
# }
