Kernel-Based Analysis Of Biological Sequences
Description
The package provides functionality for kernel-based analysis of
DNA, RNA, and amino acid sequences via SVM-based methods. As core
functionality, kebabs implements following sequence kernels:
spectrum kernel, mismatch kernel, gappy pair kernel, and
motif kernel. Apart from an efficient implementation of standard
position-independent functionality, the kernels are extended in a
novel way to take the position of patterns into account for the
similarity measure. Because of the flexibility of the kernel
formulation, other kernels like the weighted degree kernel or
the shifted weighted degree kernel with constant weighting of
positions are included as special cases. An annotation-specific
variant of the kernels uses annotation information placed along
the sequence together with the patterns in the sequence.
The package allows for the generation of a kernel matrix or an
explicit feature representation in dense or sparse format for all
available kernels which can be used with methods implemented in
other R packages. With focus on SVM-based methods, kebabs
provides a framework which simplifies the usage of existing
SVM implementations in kernlab, e1071, and LiblineaR. Binary and
multi-class classification as well as regression tasks can be used
in a unified way without having to deal with the different
functions, parameters, and formats of the selected SVM. As support
for choosing hyperparameters, the package provides cross
validation - including grouped cross validation, grid search and
model selection functions. For easier biological interpretation of
the results, the package computes feature weights for all SVMs and
prediction profiles which show the contribution of individual
sequence positions to the prediction result and indicate the
relevance of sequence sections for the learning result and the
underlying biological functions.