
kebabs (version 1.6.2)

performModelSelection: KeBABS Model Selection

Description

Perform model selection with one or multiple sequence kernels on one or multiple SVMs with one or multiple SVM parameter sets.

Usage

## kbsvm(...., kernel=..., pkg=..., svm=..., cost=..., ....,
##       cross=0, noCross=1, ...., nestedCross=0, noNestedCross=1, ....)

## For details see below. With parameter nestedCross > 1 model selection is
## performed, the other parameters are handled identically to grid search.

Arguments

nestedCross
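number of folds of the outer cross validation; model selection is performed if nestedCross > 1. For this and all other parameters see kbsvm.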

Value

  • Model selection stores its results in the KeBABS model. The model selection results can be retrieved with the accessor modelSelResult. Results from the outer cross validation are extracted from the model with the accessor cvResult.

Details

Overview

Model selection in KeBABS is based on nested k-fold cross validation (CV) (for details see performCrossValidation). The inner cross validation is used to determine the best parameter settings (kernel parameters and SVM parameters), and the outer cross validation is used to verify the performance on data that was not included in the selection of the best model. The training folds of the outer CV are used to run a grid search, with the inner cross validation running for each point of the grid (see performGridSearch), to find the best performing model. Once this model is selected, its performance on the held-out fold of the outer CV is determined.

Different parameter settings can be selected for different held-out folds of the outer CV. This means that model selection does not deliver a performance estimate for a single best model but for the complete model selection process. For each run of the outer CV, KeBABS stores the selected parameter setting of the best performing model.

The default performance objective for selecting the best parameter setting is minimizing the CV error on the inner CV. With the parameter perfObjective in kbsvm, the balanced accuracy or the Matthews correlation coefficient can be used instead; for these objectives the parameter setting with the maximal value is selected.

The parameter setting of the best performing model for each fold of the outer CV can be retrieved from the KeBABS model with the accessor modelSelResult. The performance values on the outer CV are retrieved from the model with the accessor cvResult.

Model selection is invoked through the method kbsvm by setting the parameter nestedCross > 1. The parameters kernel, pkg, svm and the SVM hyperparameters are handled identically to grid search (see performGridSearch). The parameter cost in the Usage section above is just one representative of the SVM hyperparameters, included to indicate their relevance for model selection.
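The nested procedure can be illustrated in base R independently of KeBABS. The following is a minimal sketch, not the kebabs implementation: all function names (fold_ids, knn_predict, perf_measures, inner_cv, nested_cv_select) are hypothetical, a simple k-nearest-neighbour classifier stands in for the SVM with its neighbour count k playing the role of a hyperparameter such as cost, and the second factor level is assumed to be the positive class for balanced accuracy and MCC.

```r
# Hypothetical base-R sketch of nested k-fold CV model selection.
# Not the kebabs implementation: kNN stands in for the SVM, and the
# neighbour count k plays the role of an SVM hyperparameter like cost.
# Labels are a two-level factor; the second level is treated as positive.

# Randomly assign n samples to nfolds folds of (almost) equal size.
fold_ids <- function(n, nfolds) sample(rep_len(seq_len(nfolds), n))

# Majority vote among the k nearest training samples (Euclidean distance);
# ties are broken in favour of the first factor level.
knn_predict <- function(xtrain, ytrain, xtest, k) {
  apply(xtest, 1, function(row) {
    d <- sqrt(colSums((t(xtrain) - row)^2))
    votes <- table(ytrain[order(d)[seq_len(k)]])
    names(votes)[which.max(votes)]
  })
}

# Error rate, balanced accuracy and Matthews correlation coefficient.
perf_measures <- function(pred, truth) {
  lv <- levels(truth)
  tp <- as.numeric(sum(pred == lv[2] & truth == lv[2]))
  tn <- as.numeric(sum(pred == lv[1] & truth == lv[1]))
  fp <- as.numeric(sum(pred == lv[2] & truth == lv[1]))
  fn <- as.numeric(sum(pred == lv[1] & truth == lv[2]))
  denom <- sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  c(error = (fp + fn) / length(truth),
    bacc  = (tp / (tp + fn) + tn / (tn + fp)) / 2,
    mcc   = if (denom > 0) (tp * tn - fp * fn) / denom else 0)
}

# Inner CV: mean value of the chosen objective for one value of k.
inner_cv <- function(x, y, k, cross, objective) {
  folds <- fold_ids(nrow(x), cross)
  mean(vapply(seq_len(cross), function(f) {
    test <- folds == f
    pred <- knn_predict(x[!test, , drop = FALSE], y[!test],
                        x[test, , drop = FALSE], k)
    perf_measures(pred, y[test])[[objective]]
  }, numeric(1)))
}

# Outer CV: grid search with inner CV on the outer training folds, then
# evaluation of the selected setting on the held-out outer fold.
nested_cv_select <- function(x, y, grid, cross, nested_cross,
                             objective = c("error", "bacc", "mcc")) {
  objective <- match.arg(objective)
  outer <- fold_ids(nrow(x), nested_cross)
  res <- data.frame(fold = seq_len(nested_cross), k = NA_real_,
                    error = NA_real_, bacc = NA_real_, mcc = NA_real_)
  for (f in seq_len(nested_cross)) {
    held_out <- outer == f
    xtr <- x[!held_out, , drop = FALSE]
    ytr <- y[!held_out]
    score <- vapply(grid, function(k) inner_cv(xtr, ytr, k, cross, objective),
                    numeric(1))
    # error is minimised, balanced accuracy and MCC are maximised
    best <- if (objective == "error") which.min(score) else which.max(score)
    # apply the selected k using all outer training samples
    pred <- knn_predict(xtr, ytr, x[held_out, , drop = FALSE], grid[best])
    res[f, c("k", "error", "bacc", "mcc")] <-
      c(grid[best], perf_measures(pred, y[held_out]))
  }
  res
}

# Usage on simulated two-class data
set.seed(1)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 2), ncol = 2))
y <- factor(rep(c("neg", "pos"), each = 50))
sel <- nested_cv_select(x, y, grid = c(1, 3, 5, 7, 9), cross = 5,
                        nested_cross = 3)
sel                                         # selected k and performance per outer fold
colMeans(sel[, c("error", "bacc", "mcc")])  # estimate for the whole selection process
```

Because the selected k is recorded per outer fold, different folds may end up with different settings; the averaged outer-fold performance therefore describes the selection procedure as a whole rather than one fixed model.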
The complete model selection process can be repeated multiple times by setting noNestedCross to the number of desired repetitions. Nested cross validation is considerably more demanding in runtime than grid search, because a complete grid search with inner cross validation is run for each fold of the outer CV. For runtime considerations, see the runtime hints for performGridSearch.
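The extra cost of nested cross validation can be estimated roughly by counting SVM trainings. The sketch below assumes that every grid point is trained once per inner fold and that the selected setting is trained once more on all outer training folds; the function names n_trainings_grid and n_trainings_nested are hypothetical, and the actual bookkeeping inside kebabs may differ.

```r
# Hypothetical count of SVM trainings under the assumed scheme above.
# grid_size: number of parameter settings (kernels x SVMs x hyperparameters)

# Grid search with k-fold CV, repeated no_cross times.
n_trainings_grid <- function(grid_size, cross, no_cross = 1) {
  no_cross * grid_size * cross
}

# Nested CV: per outer fold, a full inner grid search plus one refit of the
# selected setting; the whole process is repeated no_nested_cross times.
n_trainings_nested <- function(grid_size, cross, nested_cross,
                               no_nested_cross = 1) {
  no_nested_cross * nested_cross * (grid_size * cross + 1)
}

# Setting of the repeated model selection example below:
# 6 cost values, cross = 10, nestedCross = 3, noNestedCross = 3
n_trainings_grid(6, cross = 10)                                     # 60
n_trainings_nested(6, cross = 10, nested_cross = 3, no_nested_cross = 3)  # 549
```

Under these assumptions the repeated nested CV needs roughly nine times as many trainings as a single grid search with the same inner CV.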

References

http://www.bioinf.jku.at/software/kebabs

J. Palme, S. Hochreiter, and U. Bodenhofer (2015) KeBABS: an R package for kernel-based analysis of biological sequences. Bioinformatics, 31(15):2574-2576. DOI: 10.1093/bioinformatics/btv176.

See Also

kbsvm, performGridSearch, modelSelResult, cvResult

Examples

## load the package and the transcription factor binding site data
library(kebabs)
data(TFBS)
## show the sequences
enhancerFB
## The C-svc implementation from LiblineaR is chosen for most of the
## examples because it is the fastest SVM. With SVMs from other packages
## slightly better results could be achievable. Because of the higher
## runtime needed for nested cross validation please run the examples
## below manually. All samples of the data set are used in the examples.
## random permutation of all sample indices (seed set for reproducibility)
set.seed(123)
train <- sample(1:length(enhancerFB), length(enhancerFB))

## model selection with single kernel object and multiple
## hyperparameter values, 3 fold inner CV and 2 fold outer CV
## create gappy pair kernel with normalization
gappyK1M3 <- gappyPairKernel(k=1, m=3)
## show details of single gappy pair kernel object
gappyK1M3

pkg <- "LiblineaR"
svm <- "C-svc"
cost <- c(50,100,150,200,250,300)
model <- kbsvm(x=enhancerFB[train], y=yFB[train], kernel=gappyK1M3,
               pkg=pkg, svm=svm, cost=cost, explicit="yes", cross=3,
               nestedCross=2, showProgress=TRUE)

## show best parameter settings
modelSelResult(model)

## show model selection result which is the result of the outer CV
cvResult(model)
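## repeated model selection with 10 fold inner CV and 3 fold outer CV,
## the complete model selection process is repeated 3 times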
pkg <- "LiblineaR"
svm <- "C-svc"
cost <- c(50,100,150,200,250,300)
model <- kbsvm(x=enhancerFB[train], y=yFB[train], kernel=gappyK1M3,
               pkg=pkg, svm=svm, cost=cost, explicit="yes", cross=10,
               nestedCross=3, noNestedCross=3, showProgress=TRUE)

## show best parameter settings
modelSelResult(model)

## show model selection result which is the result of the outer CV
cvResult(model)

## plot CV result
plot(cvResult(model))
