
e1071 (version 1.3-9)

svm: Support Vector Machines

Description

svm is used to train a support vector machine. It can carry out general regression and classification (of nu and epsilon type), as well as density estimation. A formula interface is provided.

Usage

## S3 method for class 'formula':
svm(formula, data = list(), subset, na.action = na.fail, ...)
## S3 method for class 'default':
svm(x, y = NULL, scale = TRUE, subset, na.action = na.fail,
    type = NULL, kernel = "radial", degree = 3, gamma = 1 / dim(x)[2],
    coef0 = 0, cost = 1, nu = 0.5, class.weights = NULL, cachesize = 40,
    tolerance = 0.001, epsilon = 0.5, shrinking = TRUE, cross = 0,
    fitted = TRUE, ...)

Arguments

formula
a symbolic description of the model to be fit. Note that an intercept is always included, whether given in the formula or not.
data
an optional data frame containing the variables in the model. By default the variables are taken from the environment from which `svm' is called.
x
a data matrix, a vector, or a sparse matrix (object of class matrix.csr as provided by the package SparseM).
y
a response vector with one label for each row/component of x. Can be either a factor (for classification tasks) or a numeric vector (for regression).
scale
By default, data are scaled internally (both x and y variables) to zero mean and unit variance. The center and scale values are returned and used for later predictions.
type
svm can be used as a classification machine, as a regression machine, or as a density estimator. Depending on whether y is a factor or not, the default setting for type is C-classification or eps-regression, respectively.
kernel
the kernel used in training and predicting. You might consider changing some of the following parameters, depending on the kernel type.
  • linear: u'*v
  • polynomial: (gamma*u'*v + coef0)^degree
  • radial basis: exp(-gamma*|u-v|^2)
  • sigmoid: tanh(gamma*u'*v + coef0)
degree
parameter needed for kernel of type polynomial (default: 3)
gamma
parameter needed for all kernels except linear (default: 1/(data dimension))
coef0
parameter needed for kernels of type polynomial and sigmoid (default: 0)
cost
cost of constraints violation (default: 1); it is the `C'-constant of the regularization term in the Lagrange formulation.
nu
parameter needed for nu-classification and one-classification
class.weights
a named vector of weights for the different classes, used for asymmetric class sizes. Not all factor levels have to be supplied (default weight: 1). All components have to be named.
cachesize
cache memory in MB (default: 40)
tolerance
tolerance of termination criterion (default: 0.001)
epsilon
epsilon in the insensitive-loss function (default: 0.5)
shrinking
option whether to use the shrinking-heuristics (default: TRUE)
cross
if an integer value k>0 is specified, a k-fold cross validation on the training data is performed to assess the quality of the model: the accuracy rate for classification and the mean squared error for regression.
fitted
indicates whether the fitted values should be computed and included in the model or not (default: TRUE)
...
additional parameters for the low-level fitting function svm.default
subset
An index vector specifying the cases to be used in the training sample. (NOTE: If given, this argument must be named.)
na.action
A function to specify the action to be taken if `NA's are found. The default action is for the procedure to fail. An alternative is `na.omit', which leads to rejection of cases with missing values on any required variable.
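The scale argument standardizes the data to zero mean and unit variance and reuses the training center and scale for later predictions. The following is a minimal sketch of that idea using base R's scale(); it illustrates the concept only and is not e1071's internal code. The matrices train and new are made-up example data.

```r
# Sketch: standardize training data, then apply the *training*
# center/scale values to new observations (base R only).
train <- matrix(c(1, 2, 3, 4,
                  10, 20, 30, 40), ncol = 2)
train_scaled <- scale(train)

# scale() stores the values it used as attributes
ctr <- attr(train_scaled, "scaled:center")
sds <- attr(train_scaled, "scaled:scale")

# New data is transformed with the training statistics, not its own
new <- matrix(c(2.5, 25), ncol = 2)
new_scaled <- scale(new, center = ctr, scale = sds)

print(round(colMeans(train_scaled), 10))
print(new_scaled)
```

Reusing the training statistics is what keeps predictions on new data consistent with the scale the model was fitted on.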
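The kernel argument, together with degree, gamma and coef0, determines the similarity function used between two observations u and v. The sketch below writes the four standard kernel types (linear, polynomial, radial basis, sigmoid) as plain R functions so the role of each parameter is visible. The function names are hypothetical helpers for illustration, not part of e1071.

```r
# Hypothetical helpers: the four kernel types as functions of two
# numeric vectors u and v. Parameter names mirror svm()'s arguments.
linear_kernel <- function(u, v) {
  sum(u * v)
}
polynomial_kernel <- function(u, v, gamma, coef0 = 0, degree = 3) {
  (gamma * sum(u * v) + coef0)^degree
}
radial_kernel <- function(u, v, gamma) {
  exp(-gamma * sum((u - v)^2))
}
sigmoid_kernel <- function(u, v, gamma, coef0 = 0) {
  tanh(gamma * sum(u * v) + coef0)
}

u <- c(1, 2)
v <- c(3, 4)
gamma <- 1 / length(u)   # mirrors the documented default 1/(data dimension)

linear_kernel(u, v)                   # 11
polynomial_kernel(u, v, gamma, 1, 2)  # (0.5 * 11 + 1)^2 = 42.25
radial_kernel(u, v, gamma)            # exp(-0.5 * 8) = exp(-4)
sigmoid_kernel(u, v, gamma)           # tanh(5.5)
```

Only the linear kernel ignores gamma, which matches the note that gamma is needed for all kernels except linear.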
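The cost argument is the C constant that trades off a wide margin against constraint violations. As a rough illustration, assuming a linear kernel and labels coded as -1/+1, the primal C-classification objective is 0.5*||w||^2 + C * sum(max(0, 1 - y_i*(w'x_i + b))). The weight vector w and offset b below are hypothetical values, not the result of fitting; the point is only how a larger cost amplifies the penalty for the one point that violates the margin.

```r
# Sketch: primal objective of a linear C-classification SVM.
# w and b are hypothetical, chosen only to illustrate the role of 'cost'.
svm_primal_objective <- function(w, b, X, y, cost) {
  margins <- y * (drop(X %*% w) + b)   # y_i * f(x_i)
  hinge <- pmax(0, 1 - margins)        # margin violations
  0.5 * sum(w^2) + cost * sum(hinge)
}

X <- matrix(c( 2,    2,
              -2,   -2,
               0.5,  0), ncol = 2, byrow = TRUE)
y <- c(1, -1, -1)
w <- c(0.5, 0.5)
b <- 0

svm_primal_objective(w, b, X, y, cost = 1)    # 0.25 + 1 * 1.25 = 1.5
svm_primal_objective(w, b, X, y, cost = 10)   # 0.25 + 10 * 1.25 = 12.75
```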
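For class.weights, LIBSVM's convention is that the weight of a class multiplies the cost C for that class. The sketch below assumes that convention and shows how a partially named weight vector expands to one effective cost per level, with unnamed levels falling back to weight 1. The helper effective_cost is hypothetical and does not exist in e1071.

```r
# Hypothetical helper (assumption: weight_i multiplies C for class i,
# as in LIBSVM). Levels without a named weight default to 1.
effective_cost <- function(levels, class.weights, cost = 1) {
  stopifnot(!is.null(names(class.weights)),
            all(names(class.weights) %in% levels))
  w <- setNames(rep(1, length(levels)), levels)
  w[names(class.weights)] <- class.weights
  cost * w
}

lv <- c("setosa", "versicolor", "virginica")
effective_cost(lv, c(virginica = 5), cost = 2)
# setosa 2, versicolor 2, virginica 10
```

Raising the weight of a rare class makes its misclassifications more expensive, which is why weights help with asymmetric class sizes.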
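The epsilon argument sets the width of the insensitive zone in epsilon-regression: residuals smaller than epsilon in absolute value cost nothing, and larger ones are penalized linearly by the amount they exceed epsilon. A minimal sketch of that loss follows; eps_insensitive_loss is a hypothetical name and its default matches the documented epsilon = 0.5.

```r
# Sketch of the epsilon-insensitive loss: max(0, |y - f| - epsilon)
eps_insensitive_loss <- function(y, f, epsilon = 0.5) {
  pmax(0, abs(y - f) - epsilon)
}

y <- c(1.0, 2.0, 3.0)   # observed values
f <- c(1.2, 2.9, 1.0)   # predicted values
eps_insensitive_loss(y, f)   # 0.0 0.4 1.5
```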
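The cross argument performs k-fold cross validation. The sketch below shows the general procedure for regression: split the rows into k folds, fit on k-1 folds, and compute the mean squared error on the held-out fold. To keep it runnable with base R only, lm() is swapped in as a stand-in model; it is not what svm() fits, and the simulated data is made up.

```r
# Sketch of k-fold cross validation reporting mean squared error.
# lm() is a stand-in for the SVM so this runs with base R only.
set.seed(1)
n <- 100
d <- data.frame(x = runif(n))
d$y <- 2 * d$x + rnorm(n, sd = 0.1)

k <- 5
folds <- sample(rep(seq_len(k), length.out = n))   # balanced fold labels

fold_mse <- sapply(seq_len(k), function(i) {
  train <- d[folds != i, ]
  test  <- d[folds == i, ]
  fit <- lm(y ~ x, data = train)
  mean((test$y - predict(fit, newdata = test))^2)
})

cv_mse <- mean(fold_mse)
cv_mse
```

For classification, the same loop would record the share of correctly predicted labels in each fold instead of the squared error.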
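The two na.action choices mentioned above are ordinary base R functions, so their behavior can be checked directly on a small data frame containing one missing value.

```r
# na.omit drops every row with an NA in any column;
# na.fail signals an error if any NA is present.
d <- data.frame(a = c(1, NA, 3), b = c(4, 5, 6))

kept <- na.omit(d)
nrow(kept)   # 2

res <- tryCatch(na.fail(d), error = function(e) "failed")
res          # "failed"
```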

Value

An object of class "svm" containing the fitted model, including:
  • SV: the resulting support vectors (possibly scaled)
  • index: the index of the resulting support vectors in the data matrix
  • coefs: the corresponding coefficients times the training labels
  • rho: the negative intercept
Use summary and print to get some output.
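For a two-class model, the components SV, coefs and rho combine into a decision value f(x) = sum_i coefs[i] * K(SV[i, ], x) - rho, whose sign indicates one of the two classes; since rho is the negative intercept, it is subtracted. The sketch below assumes a radial kernel and uses hypothetical values for every component rather than output from svm(). If the model was fitted with scale = TRUE, x would first need the same scaling as the support vectors.

```r
# Sketch: decision value from hypothetical SV, coefs and rho,
# assuming a radial kernel: f(x) = sum(coefs * K(SV_i, x)) - rho
radial <- function(u, v, gamma) exp(-gamma * sum((u - v)^2))

SV    <- matrix(c(0, 0,
                  1, 1), ncol = 2, byrow = TRUE)
coefs <- c(1, -1)   # alpha_i times the training label
rho   <- 0
gamma <- 0.5

decision_value <- function(x) {
  k <- apply(SV, 1, radial, v = x, gamma = gamma)   # kernel to each SV
  sum(coefs * k) - rho
}

decision_value(c(0, 0))   # 1 - exp(-1), positive side
decision_value(c(1, 1))   # exp(-1) - 1, negative side
```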

Details

For multiclass classification with k levels, k>2, libsvm uses the `one-against-one' approach, in which k(k-1)/2 binary classifiers are trained; the appropriate class is found by a voting scheme. libsvm internally uses a sparse data representation, which is also supported at a high level by the package SparseM. If the predictor variables include factors, the formula interface must be used to get a correct model matrix. plot.svm allows a simple graphical visualization of classification models.
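The one-against-one scheme trains one binary classifier per pair of classes and predicts the class with the most pairwise wins. The sketch below enumerates the k(k-1)/2 pairs with combn() and tallies votes; the pairwise winners are hypothetical stand-ins for the outputs of trained binary classifiers, and tie-breaking is not modeled.

```r
# Sketch of one-against-one voting for k = 3 classes.
classes <- c("setosa", "versicolor", "virginica")
pairs <- combn(classes, 2)   # one column per binary problem
ncol(pairs) == choose(length(classes), 2)   # TRUE: 3 classifiers

# Hypothetical winner of each pairwise classifier (same order as 'pairs')
winners <- c("versicolor", "virginica", "virginica")

votes <- table(factor(winners, levels = classes))
predicted <- names(which.max(votes))
predicted   # "virginica"
```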

References

  • Chang, Chih-Chung and Lin, Chih-Jen: LIBSVM: a library for Support Vector Machines http://www.csie.ntu.edu.tw/~cjlin/libsvm
  • Exact formulations of models, algorithms, etc. can be found in the document: Chang, Chih-Chung and Lin, Chih-Jen: LIBSVM: a library for Support Vector Machines http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.ps.gz
  • Chang, Chih-Chung and Lin, Chih-Jen: Libsvm: Introduction and Benchmarks http://www.csie.ntu.edu.tw/~cjlin/papers/q2.ps.gz

See Also

predict.svm, plot.svm, matrix.csr (in package `SparseM')

Examples

library(e1071)
data(iris)

## classification mode
# default with factor response:
model <- svm (Species~., data=iris)

# alternatively the traditional interface:
x <- subset (iris, select = -Species)
y <- iris$Species
model <- svm (x, y)

print (model)
summary (model)

# test with train data
pred <- predict (model, x)

# Check accuracy:
table (pred,y)

## try regression mode on two dimensions

# create data
x <- seq (0.1,5,by=0.05)
y <- log(x) + rnorm (length(x), sd=0.2)

# estimate model and predict input values
m   <- svm (x,y)
new <- predict (m,x)

# visualize
plot   (x,y)
points (x, log(x), col=2)
points (x, new, col=4)

## density-estimation

# create 2-dim. normal with rho=0:
X <- data.frame (a=rnorm (1000), b=rnorm (1000))

# traditional way:
m <- svm (X, gamma=0.1)

# formula interface:
m <- svm (~., data=X, gamma=0.1)
# or, naming the variables explicitly:
m <- svm (~a+b, data=X, gamma=0.1)

# test:
newdata <- data.frame(a=c(0,4), b=c(0,4))
newdata
predict (m, newdata)

# visualization:
plot (as.matrix(X))
points (as.matrix(X)[m$index,], col=2)
