Description

For different learningsets, this method ranks the genes from the most relevant to the least relevant using one of various 'filter' criteria, or provides a sparse collection of variables (Lasso, ElasticNet, Boosting). The results are typically used for variable selection for the classification procedure that follows.

For S4 class information, see GeneSelection-methods.
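As orientation, the sketch below shows where GeneSelection typically sits between learning-set generation and classification. It is a minimal sketch, not a fixed recipe; the genesel, nbgene and classifier arguments are assumed from the interface of the classification function (see its help page):

data(golub)
golubY <- golub[,1]
golubX <- as.matrix(golub[,-1])
### generate learning sets, rank genes per learning set, then classify
### using only the top-ranked genes in each iteration
lsets <- GenerateLearningsets(y = golubY, method = "CV", fold = 5, strat = TRUE)
gsel  <- GeneSelection(golubX, golubY, learningsets = lsets, method = "t.test")
cl    <- classification(golubX, golubY, learningsets = lsets,
                        genesel = gsel, nbgene = 100, classifier = dldaCMA)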
Usage

GeneSelection(X, y, f, learningsets,
              method = c("t.test", "welch.test", "wilcox.test", "f.test",
                         "kruskal.test", "limma", "rfe", "rf", "lasso",
                         "elasticnet", "boosting", "golub", "shrinkcat"),
              scheme, trace = TRUE, ...)
Arguments

X: Gene expression data. Can be one of the following:
   - A matrix. Rows correspond to observations, columns to variables.
   - A data.frame, when f is not missing (see below).
   - An object of class ExpressionSet.

y: Class labels. Can be one of the following:
   - A numeric vector.
   - A factor.
   - A character, if X is an ExpressionSet.
   - missing, if X is a data.frame and a proper formula f is provided.

f: A two-sided formula, if X is a data.frame. The left part corresponds to the class labels, the right part to the variables.

learningsets: An object of class learningsets. May be missing; in that case, the complete dataset is used as the learning set.

method: A character specifying the gene selection criterion. Must be one of the following:
   - "t.test": ordinary two-sample t-test.
   - "welch.test": Welch modification of the t-test, allowing unequal class variances.
   - "wilcox.test": Wilcoxon rank sum test.
   - "f.test": F-test for equality of the class means; equivalent to method = "t.test" in the two-class case.
   - "kruskal.test": Kruskal-Wallis rank sum test, the multi-class counterpart of the Wilcoxon test.
   - "limma": 'moderated t' statistic. Requires the package limma.
   - "rfe": one-step Recursive Feature Elimination based on the Support Vector Machine (Guyon et al., 2002). Requires the package e1071. Take care that appropriate hyperparameters are passed by the ... argument.
   - "rf": random forest variable importance. Requires the package randomForest.
   - "lasso": L1-penalized logistic regression, which leads to sparsity with respect to the variables used. Calls the function LassoCMA, which requires the package glmpath. Warning: take care that appropriate hyperparameters are passed by the ... argument.
   - "elasticnet": penalized logistic regression with both L1 and L2 penalty, claimed by Zou and Hastie (2005) to select 'variable groups'. Calls the function ElasticNetCMA, which requires the package glmpath. Warning: take care that appropriate hyperparameters are passed by the ... argument.
   - "boosting": componentwise boosting (Buehlmann and Yu, 2003). Calls the function compBoostCMA. Take care that appropriate hyperparameters are passed by the ... argument.
   - "golub": the variable selection criterion used by Golub et al.; see the function golub.
   - "shrinkcat": the correlation-adjusted ('shrinkage') t-score.

scheme: The scheme to be used in the case of a non-binary response. Must be one of "pairwise", "one-vs-all" or "multiclass". The last option only makes sense if method is one of "f.test", "limma", "rf" or "boosting", which can be applied directly to the multiclass case; see the sketch after this section.

trace: Should the progress be traced? Default is TRUE.

...: Further arguments passed to the function performing variable selection; see method.
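For instance, a non-binary response can be handled directly with one of the multiclass-capable criteria. A minimal sketch, assuming the khan dataset shipped with CMA stores its class labels in the first column (analogous to golub in the Examples below):

data(khan)
khanY <- khan[,1]
khanX <- as.matrix(khan[,-1])
### learningsets is omitted, so the complete dataset is the learning set
selmulti <- GeneSelection(khanX, khanY, method = "f.test", scheme = "multiclass")
toplist(selmulti, k = 10)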
References

Guyon, I., Weston, J., Barnhill, S., Vapnik, V. (2002). Gene selection for cancer classification using support vector machines. Machine Learning, 46, 389-422.

Zou, H., Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society B, 67(2), 301-320.

Buehlmann, P., Yu, B. (2003). Boosting with the L2 loss: regression and classification. Journal of the American Statistical Association, 98, 324-339.

Efron, B., Hastie, T., Johnstone, I., Tibshirani, R. (2004). Least angle regression. Annals of Statistics, 32, 407-499.

Buehlmann, P., Yu, B. (2006). Sparse boosting. Journal of Machine Learning Research, 7, 1001-1024.

Slawski, M., Daumer, M., Boulesteix, A.-L. (2008). CMA - a comprehensive Bioconductor package for supervised classification with high dimensional data. BMC Bioinformatics, 9: 439.
See Also

filter, GenerateLearningsets, tune, classification
Examples

# load Golub AML/ALL data
data(golub)
### extract class labels
golubY <- golub[,1]
### extract gene expression data
golubX <- as.matrix(golub[,-1])
### Generate five different learningsets
set.seed(111)
five <- GenerateLearningsets(y=golubY, method = "CV", fold = 5, strat = TRUE)
### simple t-test:
selttest <- GeneSelection(golubX, golubY, learningsets = five, method = "t.test")
### show result:
show(selttest)
toplist(selttest, k = 10, iter = 1)
plot(selttest, iter = 1)
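### Sparse selection with the lasso (requires the package glmpath).
### Hyperparameters must be passed via '...'; 'norm.fraction' is assumed
### here to be the hyperparameter of LassoCMA (see its help page), so this
### is a sketch rather than a recommended setting.
sellasso <- GeneSelection(golubX, golubY, learningsets = five,
                          method = "lasso", norm.fraction = 0.1)
toplist(sellasso, k = 10, iter = 1)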