Usage
## S3 method for class 'BioVector':
kbsvm(x, y, kernel = NULL, pkg = "auto",
svm = "C-svc", explicit = "auto", explicitType = "auto",
featureType = "linear", featureWeights = "auto",
weightLimit = .Machine$double.eps, classWeights = numeric(0), cross = 0,
noCross = 1, groupBy = NULL, nestedCross = 0, noNestedCross = 1,
perfParameters = character(0), perfObjective = "ACC", probModel = FALSE,
sel = integer(0), features = NULL, showProgress = FALSE,
showCVTimes = FALSE, runtimeWarning = TRUE,
verbose = getOption("verbose"), ...)
## S3 method for class 'XStringSet':
kbsvm(x, y, kernel = NULL, pkg = "auto",
svm = "C-svc", explicit = "auto", explicitType = "auto",
featureType = "linear", featureWeights = "auto",
weightLimit = .Machine$double.eps, classWeights = numeric(0), cross = 0,
noCross = 1, groupBy = NULL, nestedCross = 0, noNestedCross = 1,
perfParameters = character(0), perfObjective = "ACC", probModel = FALSE,
sel = integer(0), features = NULL, showProgress = FALSE,
showCVTimes = FALSE, runtimeWarning = TRUE,
verbose = getOption("verbose"), ...)
## S3 method for class 'ExplicitRepresentation':
kbsvm(x, y, kernel = NULL, pkg = "auto",
svm = "C-svc", explicit = "auto", explicitType = "auto",
featureType = "linear", featureWeights = "auto",
weightLimit = .Machine$double.eps, classWeights = numeric(0), cross = 0,
noCross = 1, groupBy = NULL, nestedCross = 0, noNestedCross = 1,
perfParameters = character(0), perfObjective = "ACC", probModel = FALSE,
sel = integer(0), showProgress = FALSE, showCVTimes = FALSE,
runtimeWarning = TRUE, verbose = getOption("verbose"), ...)
## S3 method for class 'KernelMatrix':
kbsvm(x, y, kernel = NULL, pkg = "auto",
svm = "C-svc", explicit = "no", explicitType = "auto",
featureType = "linear", featureWeights = "no",
classWeights = numeric(0), cross = 0, noCross = 1, groupBy = NULL,
nestedCross = 0, noNestedCross = 1, perfParameters = character(0),
perfObjective = "ACC", probModel = FALSE, sel = integer(0),
showProgress = FALSE, showCVTimes = FALSE, runtimeWarning = TRUE,
verbose = getOption("verbose"), ...)
Arguments
x
multiple biological sequences in the form of a DNAStringSet,
RNAStringSet, AAStringSet (or as BioVector). Also a precomputed kernel
matrix (see getKernelMatrix) or a precomputed explicit representation
(see getExRep) can be used instead. If they were precomputed with a
sequence kernel, this kernel should be specified in the parameter kernel.
y
response vector which contains one value for each sample in 'x'.
For classification tasks this can be either a character vector, a factor or
a numeric vector, for regression tasks it must be a numeric vector. For
numeric labels in binary classification the positive class must have the
larger value, for factor or character based labels the positive label must
be at the first position when sorting the labels in descending order
according to the C locale. If the parameter sel is used to perform training
with a sample subset the response vector must have the same length as 'sel'.
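To check which label the ordering rule above would put first for character or factor labels, the sort can be reproduced in base R. This is a minimal sketch, not KeBABS code: the labels are made up, and it relies on sort() with method = "radix", which collates character vectors in the C locale.

```r
# Hypothetical character labels for a binary task
y <- c("enhancer", "background", "enhancer", "background")

# method = "radix" sorts strings in the C locale, independent of the
# locale of the current R session
ordered <- sort(unique(y), decreasing = TRUE, method = "radix")

# The first entry is the label that the rule above treats as positive
positiveLabel <- ordered[1]
print(positiveLabel)
```

Note that in the C locale uppercase letters sort before lowercase letters, so in descending order a label such as "neg" comes before "Pos".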
kernel
a sequence kernel object or a string kernel from package
kernlab. In case of grid search or model selection
a list of sequence kernel objects can be passed to training.
pkg
name of package which contains the SVM implementation to be used
for training, e.g. kernlab, e1071 or LiblineaR. For grid search or model
selection multiple packages can be passed as character vector (see also
parameter svm below). Default="auto"
svm
name of the SVM used for the classification or regression task,
e.g. "C-svc". For grid search or model selection multiple SVMs can be passed
as character vector. For each entry in this character vector a corresponding
entry in the character vector for parameter pkg is required if multiple SVMs
are used in one cross validation or model selection run.
explicit
this parameter controls whether training should be performed
with the kernel matrix (see getKernelMatrix) or explicit representation
(see getExRep). When the parameter is set to "no" the kernel matrix is
used, for "yes" the model is trained from the explicit representation.
When set to "auto" KeBABS automatically selects a variant based on runtime
heuristics. For training via kernel matrix the dense LIBSVM implementation
included in package kebabs is the preferred processing variant.
Default="auto"
explicitType
this parameter is only relevant when parameter
'explicit' is different from "no". The values "sparse" and "dense"
indicate whether a sparse or dense explicit representation should
be used. When the parameter is set to "auto" KeBABS selects a variant.
Default="auto"
featureType
when the parameter is set to "linear" single features
are used in the analysis (with a linear kernel matrix or a linear kernel
applied to the linear explicit representation). When set to "quadratic"
the analysis is based on feature pairs. For an SVM from LiblineaR (which
does not support kernels) KeBABS generates a quadratic explicit
representation. For the other SVMs a polynomial kernel of degree 2 is used
for learning via explicit representation. In the case of learning via
kernel matrix a quadratic kernel matrix (quadratic here in the sense of a
linear kernel matrix with each element raised to the power of 2) is
generated. Default="linear"
featureWeights
with the values "no" and "yes" the user can control
whether feature weights are calculated as part of the training. When the
parameter is set to "auto" KeBABS selects a variant (see below).
Default="auto"
weightLimit
the feature weight limit is a single numeric value and
allows pruning of feature weights. All feature weights with an absolute
value below this limit are set to 0 and are not considered in the model and
for further predictions. This parameter is only relevant when featureWeights
are calculated in KeBABS during training.
Default=.Machine$double.eps
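The pruning rule described for weightLimit can be illustrated in base R. This is only a sketch of the thresholding step on a made-up weight vector; the feature names are hypothetical k-mers, and the code does not show how KeBABS stores or computes feature weights internally.

```r
# Hypothetical feature weights, named by made-up k-mers
w <- c(AC = 0.8, CG = -1e-20, GT = 0, TA = -0.3)

weightLimit <- .Machine$double.eps

# Set every weight whose absolute value is below the limit to 0
pruned <- w
pruned[abs(pruned) < weightLimit] <- 0
print(pruned)

# Features that remain in the model after pruning
kept <- names(pruned)[pruned != 0]
print(kept)
```

A larger limit prunes more aggressively and leaves fewer features in the model.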
classWeights
a numeric named vector of weights for the different
classes, used for asymmetric class sizes. Each element of the vector must
be named with one of the class names, but not all classes need to be present.
Default=1
cross
an integer value K > 0 indicates that k-fold cross validation
should be performed. A value -1 is used for Leave-One-Out (LOO) cross
validation. (see above) Default=0
noCross
an integer value larger than 0 is used to specify the number
of repetitions for cross validation. This parameter is only relevant if
'cross' is different from 0. Default=1
groupBy
allows a grouping of samples during cross validation. The
parameter is only relevant when 'cross' is larger than 1. It is an integer
vector or factor with the same length as the number of samples used for
training and specifies for each sample to which group it belongs. Samples
from the same group are never spread over more than one fold (see
crossValidation). Grouped cross validation can also be used in
grid search for each grid point. Default=NULL
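The constraint that samples of one group never end up in different folds can be sketched in base R as follows. The group labels, the number of folds and the random assignment are assumptions made for illustration; this is not the fold assignment that KeBABS performs internally.

```r
set.seed(1)

# Hypothetical group membership for 10 samples
groupBy <- factor(c("a", "a", "b", "b", "b", "c", "d", "d", "e", "e"))
k <- 3

# Assign whole groups (not single samples) to folds
groups <- levels(groupBy)
foldOfGroup <- sample(rep_len(seq_len(k), length(groups)))
names(foldOfGroup) <- groups

# Every sample inherits the fold of its group
foldOfSample <- unname(foldOfGroup[as.character(groupBy)])

# Check: each group occurs in exactly one fold
foldsPerGroup <- tapply(foldOfSample, groupBy, function(f) length(unique(f)))
print(foldsPerGroup)
```

Because folds are built from whole groups, fold sizes can be uneven when group sizes differ.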
nestedCross
an integer value K > 0 indicates that a model selection
with nested cross validation should be performed with a k-fold outer cross
validation. The inner cross validation is defined with the 'cross'
parameter (see above). Default=0
noNestedCross
an integer value larger than 0 is used to specify the
number of repetitions for the nested cross validation. This parameter is
only relevant if 'nestedCross' is larger than 0. Default=1
perfParameters
a character vector with one or several values from
the set "ACC", "BACC", "MCC", "AUC" and "ALL". "ACC" stands for accuracy,
"BACC" for balanced accuracy, "MCC" for Matthews Correlation Coefficient,
"AUC" for area under the ROC curve and "ALL" for all four. This parameter
defines which performance parameters are collected in cross validation,
grid search and model selection for display purpose. The value "AUC" is
currently not supported for multiclass classification. Default=character(0)
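For reference, the measures behind "ACC", "BACC" and "MCC" can be computed for a binary task from the confusion counts. The sketch below uses the standard textbook definitions on made-up labels and predictions; it is not taken from the KeBABS sources, whose handling (for example of multiclass problems) may differ.

```r
# Hypothetical true labels and predictions for a binary task
truth <- c(1, 1, 1, 1, -1, -1, -1, -1, -1, -1)
pred  <- c(1, 1, 1, -1, -1, -1, -1, -1, 1, 1)

# Confusion counts with 1 as the positive class
tp <- sum(pred == 1 & truth == 1)
tn <- sum(pred == -1 & truth == -1)
fp <- sum(pred == 1 & truth == -1)
fn <- sum(pred == -1 & truth == 1)

# Accuracy: fraction of correct predictions
acc <- (tp + tn) / length(truth)

# Balanced accuracy: mean of sensitivity and specificity
bacc <- (tp / (tp + fn) + tn / (tn + fp)) / 2

# Matthews correlation coefficient
mcc <- (tp * tn - fp * fn) /
    sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

print(c(ACC = acc, BACC = bacc, MCC = mcc))
```

With unbalanced classes, balanced accuracy and MCC are usually more informative than plain accuracy, which is relevant when choosing perfObjective.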
perfObjective
a single character string from the set "ACC", "BACC"
and "MCC" (see previous parameter). The parameter is only relevant in
grid search and model selection and defines which performance measure is
used to determine the best performing parameter set. Default="ACC"
probModel
when setting this boolean parameter to TRUE a probability
model is determined as part of the training (see below). Default=FALSE
sel
subset of indices into x. When this parameter is present
the training is performed for the specified subset of samples only.
Default=integer(0)
features
feature subset of the specified kernel in the form of a
character vector. When a feature subset is passed to the function all other
features in the feature space are not considered for training (see below).
A feature subset can only be used when a single kernel object is specified
in the 'kernel' parameter. Default=NULL
showProgress
when setting this boolean parameter to TRUE the
progress of a cross validation is displayed. The parameter is only relevant
for cross validation. Default=FALSE
showCVTimes
when setting this boolean parameter to TRUE the runtimes
of the cross validation runs are shown after the cross validation is
finished. The parameter is only relevant for cross validation.
Default=FALSE
runtimeWarning
when setting this boolean parameter to FALSE a
warning for long runtimes will not be shown in case of large feature
space dimension or large number of samples. Default=TRUE
verbose
boolean value that indicates whether KeBABS should print
additional messages showing the internal processing logic in a verbose
manner. The default value depends on the R session verbosity option.
Default=getOption("verbose")
...
additional parameters which are passed to SVM training
transparently.
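A typical call that combines several of the arguments above might look like the following sketch. It assumes that the kebabs package and its example data set TFBS (sequences enhancerFB, labels yFB) are installed; the 70/30 split, the kernel parameter k = 2 and the cost value are arbitrary choices for illustration, and the block is skipped when the package is not available.

```r
# Runs only when the kebabs package is installed
if (requireNamespace("kebabs", quietly = TRUE)) {
    library(kebabs)

    # Example data shipped with kebabs: sequences enhancerFB, labels yFB
    data(TFBS)

    # Arbitrary 70/30 split into training and test samples
    set.seed(42)
    n <- length(enhancerFB)
    train <- sample(seq_len(n), round(0.7 * n))
    test <- setdiff(seq_len(n), train)

    # Spectrum kernel over k-mers of length 2
    specK2 <- spectrumKernel(k = 2)

    # 'cost' is not a kbsvm argument; it is passed via '...' to the SVM
    model <- kbsvm(x = enhancerFB[train], y = yFB[train], kernel = specK2,
                   pkg = "e1071", svm = "C-svc", cost = 10)

    # Predict the held-out samples and summarize the performance
    pred <- predict(model, enhancerFB[test])
    evaluatePrediction(pred, yFB[test], allLabels = unique(yFB))
}
```

Adding cross = 10 to the kbsvm call would run a 10-fold cross validation on the given samples instead of training a single model.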