learningset (for fixed selected
variables). Note that learningsets usually do not contain the
complete dataset, so tuning involves a second level of splitting the dataset.
Increasing the number of folds leads to larger training sets within each learning set (and possibly to higher accuracy), but also to higher computing times.
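To make the two levels of splitting concrete, here is a minimal sketch (not part of the original page) using the golub example data and the functions shown in the Examples below; GenerateLearningsets produces the first-level split, while the fold argument of tune controls the inner cross-validation within each learning set:

library(CMA)
data(golub)
golubY <- golub[, 1]                # class labels (first column)
golubX <- as.matrix(golub[, -1])    # gene expression matrix

## first level of splitting: 5 learning sets from 5-fold cross-validation
lset <- GenerateLearningsets(y = golubY, method = "CV", fold = 5, strat = TRUE)

## second level: within each learning set, tune() runs an inner 3-fold CV
## (fold = 3 is the default) over the grid of candidate values for k
tuneres_knn <- tune(X = golubX, y = golubY, learningsets = lset,
                    classifier = knnCMA, grids = list(k = 1:10), fold = 3)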
For S4 method information, see tune-methods.

Usage

tune(X, y, f, learningsets, genesel, genesellist = list(), nbgene,
     classifier, fold = 3, strat = FALSE, grids = list(), trace = TRUE, ...)

Arguments

X
Gene expression data. Can be one of the following:
- A matrix. Rows correspond to observations, columns to variables.
- A data.frame, when f is not missing (see below).
- An object of class ExpressionSet.

y
Class labels. Can be one of the following:
- A numeric vector.
- A factor.
- A character (the name of the phenotype variable), if X is an ExpressionSet.
- missing, if X is a data.frame and a proper formula f is provided.

f
A two-sided formula, if X is a data.frame. The left part corresponds to the class labels, the right part to the variables.

learningsets
An object of class learningsets. May be missing; in that case, the complete dataset is used as the learning set.

genesel
Optional (but usually recommended) object of class genesel containing variable importance information for the argument learningsets.

genesellist
In the case that the argument genesel is missing, this is an argument list passed to GeneSelection. If both genesel and genesellist are missing, no variable selection is performed.

nbgene
Number of best genes to be kept for classification, based on either genesel or the call to GeneSelection using genesellist. In the case that both are missing, this argument is not necessary.
Note:
- If the gene selection method has been one of "lasso", "elasticnet" or "boosting", nbgene will be reset to min(s, nbgene), where s is the number of nonzero coefficients.
- If the gene selection scheme has been "one-vs-all" or "pairwise" for the multiclass case, there exist several rankings. The top nbgene genes of each ranking will be kept, so the number of genes effectively used will sometimes be much larger.
A sketch illustrating genesel and nbgene follows below.
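As a sketch of how a precomputed genesel object and nbgene fit together (reusing golubX, golubY and lset from the sketch above; object names are illustrative):

## variable importance is computed once per learning set ...
gsel <- GeneSelection(X = golubX, y = golubY, learningsets = lset,
                      method = "t.test")

## ... and reused for tuning: only the top nbgene = 100 genes of each
## ranking enter the inner cross-validation
tuneres_sel <- tune(X = golubX, y = golubY, learningsets = lset,
                    genesel = gsel, nbgene = 100,
                    classifier = knnCMA, grids = list(k = 1:10))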
classifier
Name of the function (ending in CMA) indicating the classifier to be used.

fold
The number of cross-validation folds used within each learningset. Default is 3. Increasing fold will lead to higher computing times.

strat
Should stratified cross-validation according to the class proportions in the complete dataset be used? Default is FALSE.

grids
A named list. The names correspond to the arguments to be tuned, e.g. k (the number of nearest neighbours) for knnCMA, or cost for svmCMA. Each element is a numeric vector defining the grid of candidate values. Of course, several hyperparameters can be tuned simultaneously (though this requires much more time). By default, grids is an empty list; in that case, a pre-defined list will be used, see details.

trace
Should progress be traced? Default is TRUE.

...
Further arguments to be passed to classifier, of course not one of the arguments to be tuned (!).

Details

The following default settings are used if the argument grids is an empty list:
gbmCMA: n.trees = c(50, 100, 200, 500, 1000)
compBoostCMA: mstop = c(50, 100, 200, 500, 1000)
LassoCMA: norm.fraction = seq(from=0.1, to=0.9, length=9)
ElasticNetCMA: norm.fraction = seq(from=0.1, to=0.9, length=5), alpha = 2^{-(5:1)}
plrCMA: lambda = 2^{-4:4}
pls_ldaCMA: comp = 1:10
pls_lrCMA: comp = 1:10
pls_rfCMA: comp = 1:10
rfCMA: mtry = ceiling(c(0.1, 0.25, 0.5, 1, 2)*sqrt(ncol(X))), nodesize = c(1,2,3)
knnCMA: k = 1:10
pknnCMA: k = 1:10
scdaCMA: delta = c(0.1, 0.25, 0.5, 1, 2, 5)
pnnCMA: sigma = c(2^{-2:2})
nnetCMA: size = 1:5, decay = c(0, 2^{-(4:1)})
svmCMA, kernel = "linear": cost = c(0.1, 1, 5, 10, 50, 100, 500)
svmCMA, kernel = "radial": cost = c(0.1, 1, 5, 10, 50, 100, 500), gamma = 2^{-2:2}
svmCMA, kernel = "polynomial": cost = c(0.1, 1, 5, 10, 50, 100, 500), degree = 2:4
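These defaults can be overridden by supplying grids explicitly. A minimal sketch, continuing with the objects from the sketches above; the candidate values are illustrative rather than the pre-defined ones, and kernel and probability are assumed to be handed on to svmCMA via the ... argument:

## tune cost and gamma jointly for a radial-kernel SVM
## (grid values chosen for illustration only; kernel and probability
##  are passed through '...' to svmCMA)
tuneres_svm <- tune(X = golubX, y = golubY, learningsets = lset,
                    classifier = svmCMA, kernel = "radial", probability = TRUE,
                    grids = list(cost = c(1, 10, 100), gamma = 2^(-1:1)))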
See Also

tuningresult, GeneSelection, classification

Examples

## Not run:
# ### simple example for a one-dimensional grid, using compBoostCMA.
# ### dataset
# data(golub)
# golubY <- golub[,1]
# golubX <- as.matrix(golub[,-1])
# ### learningsets
# set.seed(111)
# lset <- GenerateLearningsets(y = golubY, method = "CV", fold = 5, strat = TRUE)
# ### tuning after gene selection with the t.test
# tuneres <- tune(X = golubX, y = golubY, learningsets = lset,
# genesellist = list(method = "t.test"),
# classifier=compBoostCMA, nbgene = 100,
# grids = list(mstop = c(50, 100, 250, 500, 1000)))
# ### inspect results
# show(tuneres)
# best(tuneres)
# plot(tuneres, iter = 3)
# ## End(Not run)
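The tuning result is typically fed back into the cross-validation workflow. The following sketch continues the example above and assumes that classification() accepts the tuning result via its tuneres argument and that evaluation() summarizes the resulting predictions (see their respective help pages):

## hedged sketch: reuse the tuned mstop values inside the CV loop;
## the gene selection settings must match those used during tuning
clres <- classification(X = golubX, y = golubY, learningsets = lset,
                        genesellist = list(method = "t.test"), nbgene = 100,
                        classifier = compBoostCMA, tuneres = tuneres)
ev <- evaluation(clres, measure = "misclassification")
show(ev)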