Hyperparameters are tuned by cross-validation within each learningset (for fixed selected variables). Note that learningsets usually do not contain the complete dataset, so tuning involves a second level of splitting the dataset. Increasing the number of folds leads to larger datasets used for tuning (and possibly to higher accuracy), but also to higher computing times.

For S4 method information, see tune-methods.
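To make the two levels of data splitting concrete, here is a minimal sketch using the golub data that ships with the package (the same data preparation as in the Examples section below); knnCMA, t-test gene selection and nbgene = 100 are illustrative choices only. Because grids is left empty, the pre-defined grid for knnCMA (k = 1:10, see Details) is used.

library(CMA)
data(golub)
golubY <- golub[, 1]                         # class labels
golubX <- as.matrix(golub[, -1])             # expression matrix

## First level of splitting: learning sets from 5-fold cross-validation
set.seed(111)
lset <- GenerateLearningsets(y = golubY, method = "CV", fold = 5, strat = TRUE)

## Second level of splitting: within each learning set, tune() compares the
## candidate values of k for knnCMA by an inner 3-fold cross-validation
## (fold = 3, the default); grids is left empty, so the pre-defined grid is used.
tuneres_knn <- tune(X = golubX, y = golubY, learningsets = lset,
                    genesellist = list(method = "t.test"), nbgene = 100,
                    classifier = knnCMA)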
Usage:

tune(X, y, f, learningsets, genesel, genesellist = list(), nbgene,
     classifier, fold = 3, strat = FALSE, grids = list(), trace = TRUE, ...)
Arguments:

X: The data. Can be one of the following:
   - a matrix; rows correspond to observations, columns to variables,
   - a data.frame, when f is not missing (see below),
   - an object of class ExpressionSet.
y: The class labels. Can be one of the following:
   - a numeric vector,
   - a factor,
   - a character string specifying the phenotype variable, if X is an ExpressionSet,
   - missing, if X is a data.frame and a proper formula f is provided.
f: A formula, to be used if X is a data.frame. The left-hand part corresponds to the class labels, the right-hand part to the variables.

learningsets: An object of class learningsets, e.g. as produced by GenerateLearningsets. May be missing; then the complete dataset is used as learning set.

genesel: Optional object of class genesel containing variable importance information for the argument learningsets.
genesellist: In the case that genesel is missing, this is an argument list passed to GeneSelection. If both genesel and genesellist are missing, no variable selection is performed.

nbgene: The number of best genes to be kept for classification, based on either genesel or the call to GeneSelection using genesellist. If both are missing, this argument is not necessary.
Note:
   - If the gene selection method was one of "lasso", "elasticnet" or "boosting", nbgene will be reset to min(s, nbgene), where s is the number of nonzero coefficients.
   - For the schemes "one-vs-all" and "pairwise" in the multiclass case, there exist several rankings. The top nbgene genes of each ranking are kept, so the number of genes effectively used will sometimes be much larger.
classifier: Name of the classifier to be used, i.e. one of the classification functions ending in CMA (such as knnCMA or svmCMA).

fold: The number of cross-validation folds used for tuning within each learningset. Default is 3. Increasing fold will lead to higher computing times.

strat: Should the inner cross-validation be stratified by class? Default is FALSE.

grids: A named list whose names are the hyperparameters to be tuned, e.g. k (the number of nearest neighbours) for knnCMA, or cost for svmCMA. Each element is a numeric vector defining the grid of candidate values. Several hyperparameters can be tuned simultaneously, though this requires considerably more computing time. By default, grids is an empty list; in that case, a pre-defined grid is used, see Details (and the sketch following the default grids).

trace: Should progress be traced? Default is TRUE.

...: Further arguments to be passed to classifier; of course, these must not be among the arguments to be tuned (!).

Details:

The following default grids are used if grids is an empty list:
gbmCMA:         n.trees = c(50, 100, 200, 500, 1000)
compBoostCMA:   mstop = c(50, 100, 200, 500, 1000)
LassoCMA:       norm.fraction = seq(from = 0.1, to = 0.9, length = 9)
ElasticNetCMA:  norm.fraction = seq(from = 0.1, to = 0.9, length = 5), alpha = 2^(-(5:1))
plrCMA:         lambda = 2^(-4:4)
pls_ldaCMA:     comp = 1:10
pls_lrCMA:      comp = 1:10
pls_rfCMA:      comp = 1:10
rfCMA:          mtry = ceiling(c(0.1, 0.25, 0.5, 1, 2) * sqrt(ncol(X))), nodesize = c(1, 2, 3)
knnCMA:         k = 1:10
pknnCMA:        k = 1:10
scdaCMA:        delta = c(0.1, 0.25, 0.5, 1, 2, 5)
pnnCMA:         sigma = c(2^(-2:2))
nnetCMA:        size = 1:5, decay = c(0, 2^(-(4:1)))
svmCMA (kernel = "linear"):     cost = c(0.1, 1, 5, 10, 50, 100, 500)
svmCMA (kernel = "radial"):     cost = c(0.1, 1, 5, 10, 50, 100, 500), gamma = 2^(-2:2)
svmCMA (kernel = "polynomial"): cost = c(0.1, 1, 5, 10, 50, 100, 500), degree = 2:4
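Several hyperparameters can be tuned at once by supplying a multi-element grids list. The following sketch (reusing golubX, golubY and lset from the sketch above) tunes cost and gamma of svmCMA simultaneously; kernel = "radial" is handed on to svmCMA via the ... argument, and the candidate values are illustrative only.

## Sketch: tuning cost and gamma of svmCMA at the same time.
## golubX, golubY and lset as constructed in the sketch above.
tuneres_svm <- tune(X = golubX, y = golubY, learningsets = lset,
                    genesellist = list(method = "t.test"), nbgene = 100,
                    classifier = svmCMA,
                    kernel = "radial",                       # passed on to svmCMA via '...'
                    grids = list(cost = c(0.1, 1, 10, 100),  # candidate cost values
                                 gamma = 2^(-2:2)))          # candidate kernel widths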
See Also:

tuningresult, GeneSelection, classification
Examples:

## Not run:
### simple example for a one-dimensional grid, using compBoostCMA.

### dataset
data(golub)
golubY <- golub[, 1]
golubX <- as.matrix(golub[, -1])

### learningsets
set.seed(111)
lset <- GenerateLearningsets(y = golubY, method = "CV", fold = 5, strat = TRUE)

### tuning after gene selection with the t.test
tuneres <- tune(X = golubX, y = golubY, learningsets = lset,
                genesellist = list(method = "t.test"),
                classifier = compBoostCMA, nbgene = 100,
                grids = list(mstop = c(50, 100, 250, 500, 1000)))

### inspect results
show(tuneres)
best(tuneres)
plot(tuneres, iter = 3)
## End(Not run)
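A tuning result is usually not an end in itself but is passed on to classification (see See Also). The following is only a sketch: it assumes that classification accepts the tuning result through an argument named tuneres, which is not documented on this page; consult ?classification for the actual interface.

## Not run:
## Sketch (assumption): hand the tuning result over to classification so that,
## on each learning set, the best hyperparameter value found by tune() is used.
## The argument name 'tuneres' is an assumption here; verify it via ?classification.
classres <- classification(X = golubX, y = golubY, learningsets = lset,
                           genesellist = list(method = "t.test"), nbgene = 100,
                           classifier = compBoostCMA,
                           tuneres = tuneres)
## End(Not run)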