This function performs supervised classification on factor-adjusted data.
FADA(faobject, K = 10, B = 20, nbf.cv = NULL,
     method = c("glmnet", "sda", "sparseLDA"),
     sda.method = c("lfdr", "HC"), alpha = 0.1, ...)
An object returned by function decorrelate.train or decorrelate.test.
Number of folds used to estimate the classification error rate, only when no testing data is provided. Default is K=10.
Number of replications of the cross-validation. Default is B=20.
Number of factors used in the cross-validation that computes the error rate, only when no testing data is provided. By default, nbf.cv = NULL and the number of factors is estimated for each fold of the cross-validation. nbf.cv can also be set to a positive integer value. If nbf.cv = 0, the data are not factor-adjusted.
The method used to fit the supervised classification model. Three options are available. If method = "glmnet", a Lasso-penalized logistic regression is performed using the glmnet R package. If method = "sda", an LDA or DDA (see the diagonal argument) is performed by Shrinkage Discriminant Analysis using the sda R package. If method = "sparseLDA", a Lasso-penalized LDA is performed using the sparseLDA R package.
The method used for variable selection, only if method="sda". If sda.method="lfdr", variables are selected through CAT scores and false non-discovery rate control. If sda.method="HC", the variable selection method is Higher Criticism Thresholding.
The proportion of the HC objective to be observed, used only if method="sda" and sda.method="HC". Default is alpha=0.1.
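The usage line lists several admissible values as the default for method and sda.method. In R this pattern is conventionally resolved with match.arg, which picks the first value when the argument is not supplied. The sketch below illustrates that convention with a hypothetical helper, choose_method; it is not FADA's source code, and whether FADA resolves its arguments exactly this way is an assumption.

```r
# Hypothetical helper (not part of the FADA package) showing how vector
# defaults such as method = c("glmnet", "sda", "sparseLDA") are usually
# resolved in R.
choose_method <- function(method = c("glmnet", "sda", "sparseLDA"),
                          sda.method = c("lfdr", "HC")) {
  method <- match.arg(method)          # first value when not supplied
  sda.method <- match.arg(sda.method)
  # sda.method only matters when method = "sda"
  if (method == "sda") paste(method, sda.method, sep = "/") else method
}

default_choice <- choose_method()             # "glmnet"
sda_choice     <- choose_method("sda")        # "sda/lfdr"
hc_choice      <- choose_method("sda", "HC")  # "sda/HC"
```

Under this convention, an unrecognized value such as method = "svm" makes match.arg stop with an error rather than silently fall back to a default.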
Returns a list with the following elements:
A reminder of the classification method used
A vector containing the indices of the selected variables
A matrix containing predicted group frequencies of training data.
A matrix containing predicted group frequencies of testing data, if a testing data set has been provided
A matrix containing predicted classes of testing data, if a testing data set has been provided
A numeric value containing the average classification error, computed by cross validation, if no testing data set has been provided
A numeric value containing the standard error of the classification error, computed by cross validation, if no testing data set has been provided
The classification model performed. The class of this element is the class of a model returned by the chosen method. See the documentation of the chosen method for more details.
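To make the roles of K and B concrete, here is a minimal base-R sketch of a K-fold cross-validation repeated B times, returning an average error and a standard error. Everything in it is illustrative: the simulated data, the nearest-centroid classifier and the names rep_error, cv_error and cv_error_se are assumptions, and the standard error is taken here as the standard deviation across replications divided by sqrt(B), which may differ from what FADA computes internally.

```r
set.seed(1)

# Simulated two-class data (illustrative only)
n <- 60; p <- 5
y <- rep(c(0, 1), each = n / 2)
x <- matrix(rnorm(n * p), n, p)
x[y == 1, 1] <- x[y == 1, 1] + 2   # class 1 is shifted on the first variable

# Stand-in classifier: assign each test row to the nearest class centroid
nearest_centroid <- function(xtr, ytr, xte) {
  c0 <- colMeans(xtr[ytr == 0, , drop = FALSE])
  c1 <- colMeans(xtr[ytr == 1, , drop = FALSE])
  d0 <- rowSums(sweep(xte, 2, c0)^2)
  d1 <- rowSums(sweep(xte, 2, c1)^2)
  as.numeric(d1 < d0)
}

K <- 10   # number of folds
B <- 20   # number of replications of the cross-validation
rep_error <- numeric(B)

for (b in seq_len(B)) {
  fold <- sample(rep(seq_len(K), length.out = n))  # random fold assignment
  wrong <- 0
  for (k in seq_len(K)) {
    test <- fold == k
    pred <- nearest_centroid(x[!test, , drop = FALSE], y[!test],
                             x[test, , drop = FALSE])
    wrong <- wrong + sum(pred != y[test])
  }
  rep_error[b] <- wrong / n   # error rate of replication b
}

cv_error    <- mean(rep_error)          # average classification error
cv_error_se <- sd(rep_error) / sqrt(B)  # standard error across replications
```

Repeating the cross-validation B times averages out the randomness of the fold assignment, which is why a single error estimate is reported together with a standard error.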
Ahdesmaki, M. and Strimmer, K. (2010), Feature selection in omics prediction problems using cat scores and false non-discovery rate control. Annals of Applied Statistics, 4, 503-519.
Clemmensen, L., Hastie, T., Witten, D. and Ersboll, B. (2011), Sparse discriminant analysis. Technometrics, 53(4), 406-413.
Friedman, J., Hastie, T. and Tibshirani, R. (2010), Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33, 1-22.
Friguet, C., Kloareg, M. and Causeur, D. (2009), A factor model approach to multiple testing under dependence. Journal of the American Statistical Association, 104:488, 1406-1415.
Perthame, E., Friguet, C. and Causeur, D. (2015), Stability of feature selection in classification issues for high-dimensional correlated data, Statistics and Computing.
FADA, decorrelate.train, decorrelate.test, sda, sda-package, glmnet-package
# NOT RUN {
library(FADA)

data(data.train)
data(data.test)

# When a testing data set is provided
res <- decorrelate.train(data.train)
res2 <- decorrelate.test(res, data.test)
classif <- FADA(res2, method = "sda", sda.method = "lfdr")

# When no testing data set is provided (not run):
# the classification error rate is computed by K-fold cross-validation.
# res <- decorrelate.train(data.train)
# classif <- FADA(res, method = "sda", sda.method = "lfdr")
# }