svm: Support Vector Machines

Description

svm trains a support vector machine. It can carry out general regression and classification (of nu- and epsilon-type), as well as density estimation. A formula interface is provided.

Usage

## S3 method for class 'formula':
svm(formula, data = NULL, ...)
## S3 method for class 'default':
svm(x, y=NULL, type=NULL, kernel="radial", degree=3, gamma=1/dim(x)[2],
coef0=0, cost=1, nu=0.5, class.weights=NULL, cachesize=40, tolerance=0.001, epsilon=0.5,
shrinking=TRUE, cross=0, ...)

Arguments

formula

a symbolic description of the model to be fit. Note that an intercept is always included, whether given in the formula or not.

data

an optional data frame containing the variables in the model. By default, the variables are taken from the environment from which svm is called.

x

a data matrix or a vector.

y

a response vector with one label for each row/component of x. Can be either a factor (for classification tasks) or a numeric vector (for regression).

type

svm can be used as a classification machine, as a regression machine or as a density estimator. Depending on whether y is a factor or not, the default setting for type is C-classification or

kernel

the kernel used in training and predicting. You might consider changing some of the following parameters, depending on the kernel type.

degree

parameter needed for kernel of type polynomial (default: 3)

gamma

parameter needed for all kernels except linear (default: 1/(data dimension))

coef0

parameter needed for kernels of type polynomial and sigmoid (default: 0)
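
The roles of degree, gamma and coef0 can be made concrete by writing out the kernel formulas. The sketch below uses plain R and the standard LIBSVM kernel definitions; it assumes svm follows those definitions, and the helper name kernel_value is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package): evaluates the standard
# LIBSVM kernel formulas for two numeric vectors u and v.
kernel_value <- function(u, v, kernel = "radial",
                         degree = 3, gamma = 1 / length(u), coef0 = 0) {
  switch(kernel,
    linear     = sum(u * v),                          # u'v
    polynomial = (gamma * sum(u * v) + coef0)^degree, # (gamma*u'v + coef0)^degree
    radial     = exp(-gamma * sum((u - v)^2)),        # exp(-gamma*|u-v|^2)
    sigmoid    = tanh(gamma * sum(u * v) + coef0),    # tanh(gamma*u'v + coef0)
    stop("unknown kernel: ", kernel)
  )
}

u <- c(1, 2)
v <- c(3, 4)
kernel_value(u, v, "linear")      # uses none of the three parameters
kernel_value(u, v, "polynomial")  # uses degree, gamma and coef0
kernel_value(u, v, "radial")      # uses gamma only
kernel_value(u, v, "sigmoid")     # uses gamma and coef0
```

The gamma default in this sketch mirrors the documented default of 1/(data dimension).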

cost

cost of constraints violation (default: 1)

nu

parameter needed for nu-classification and one-classification (default: 0.5)

class.weights

a named vector of weights for the different classes, used for asymmetric class sizes. Not all factor levels have to be supplied (default weight: 1). All components have to be named.
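
A minimal sketch of a partially specified weight vector, assuming svm is loaded from the e1071 package (the package name is an assumption; this page does not state it). Only one level is named, so the remaining levels keep the default weight of 1.

```r
library(e1071)  # assumption: the package that provides svm()

data(iris)

# Weight "versicolor" twice as heavily as the other classes.
# "setosa" and "virginica" are not listed and keep the default weight 1.
w <- c(versicolor = 2)

model <- svm(Species ~ ., data = iris, class.weights = w)

# Confusion table on the training data
table(predict(model, iris), iris$Species)
```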

cachesize

cache memory in MB (default: 40)

tolerance

tolerance of termination criterion (default: 0.001)

epsilon

epsilon in the insensitive-loss function (default: 0.5)

shrinking

option whether to use the shrinking-heuristics (default: TRUE)

cross

if an integer value k>0 is specified, a k-fold cross validation on the training data is performed to assess the quality of the model: the accuracy rate for classification and the Mean Squared Error for regression
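
A minimal sketch of requesting cross validation, assuming svm is loaded from the e1071 package (an assumption; this page does not name the package). How the result is reported may differ between versions, so the sketch only calls summary and does not rely on particular component names.

```r
library(e1071)  # assumption: the package that provides svm()

data(iris)
set.seed(1)     # the fold assignment is random; fix the seed for repeatability

# 5-fold cross validation on the training data
model <- svm(Species ~ ., data = iris, cross = 5)

# summary is expected to report the cross-validation result
# (accuracy for classification, mean squared error for regression)
summary(model)
```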

...

additional parameters for the low-level fitting function svm.default.

Value

An object of class "svm" containing the fitted model, in particular:

sv

the resulting support vectors

index

the index of the resulting support vectors in the data matrix

coefs

the corresponding coefficients

Use summary and print to get some output.

Examples

data(iris)
attach(iris)

## classification mode
# default with factor response:
model <- svm (Species~., data=iris)

# alternatively the traditional interface:
x <- subset (iris, select = -Species)
y <- Species
model <- svm (x, y) 

print (model)
summary (model)

# test with train data
pred <- predict (model, x)

# Check accuracy:
table (pred,y)

## try regression mode on two dimensions

# create data
x <- seq (0.1,5,by=0.05)
y <- log(x) + rnorm (x, sd=0.2)

# estimate model and predict input values
m   <- svm (x,y)
new <- predict (m,x)

# visualize
plot   (x,y)
points (x, log(x), col=2)
points (x, new, col=4)

## density-estimation

# create 2-dim. normal with rho=0:
X <- data.frame (a=rnorm (1000), b=rnorm (1000))
attach (X)

# traditional way:
m <- svm (X)

# formula interface:
m <- svm (~a+b)
# or:
m <- svm (~., data=X)

# test:
predict (m, t(c(0,0)))
predict (m, t(c(4,4)))

# visualization:
plot (X)
points (X[m$index,], col=2)