rtemis (version 0.79)

s.SVM: Support Vector Machines [C, R]

Description

Train an SVM learner using e1071::svm

Usage

s.SVM(x, y = NULL, x.test = NULL, y.test = NULL, x.name = NULL,
  y.name = NULL, grid.resample.rtset = rtset.resample("kfold", 5),
  grid.search.type = c("exhaustive", "randomized"),
  grid.randomized.p = 0.1, class.weights = NULL, ipw = TRUE,
  ipw.type = 2, upsample = FALSE, upsample.seed = NULL,
  kernel = "radial", degree = 3, gamma = NULL, coef0 = 0,
  cost = 1, probability = TRUE, metric = NULL, maximize = NULL,
  plot.fitted = NULL, plot.predicted = NULL, print.plot = TRUE,
  plot.theme = getOption("rt.fit.theme", "lightgrid"),
  n.cores = rtCores, question = NULL, rtclass = NULL,
  verbose = TRUE, grid.verbose = TRUE, outdir = NULL,
  save.res = FALSE, osx.alert = FALSE,
  save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ...)

Arguments

x

Numeric vector or matrix / data frame of features, i.e. independent variables

y

Numeric vector of outcome, i.e. dependent variable

x.test

Numeric vector or matrix / data frame of testing set features. Columns must correspond to columns in x

y.test

Numeric vector of testing set outcome
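To illustrate how x, y, x.test and y.test relate, the following base R sketch splits a data frame into training and testing sets with matching feature columns. The iris data, the seed, the 80/20 split and the wrapper name fit_svm are illustrative assumptions, not part of the rtemis documentation. The wrapper is only defined, not called, so running the block does not require rtemis.

```r
# Illustrative regression split; data, seed and ratio are assumptions.
set.seed(2019)
dat <- iris
train_idx <- sample(nrow(dat), size = round(0.8 * nrow(dat)))

# Features (independent variables) and numeric outcome (dependent variable)
x      <- dat[train_idx, 2:4]
y      <- dat[train_idx, "Sepal.Length"]
x.test <- dat[-train_idx, 2:4]
y.test <- dat[-train_idx, "Sepal.Length"]

# Columns of x.test must correspond to columns of x
stopifnot(identical(colnames(x), colnames(x.test)))

# Hypothetical wrapper: calling it requires the rtemis and e1071 packages
fit_svm <- function(x, y, x.test, y.test) {
  rtemis::s.SVM(x, y, x.test = x.test, y.test = y.test)
}
```

Calling fit_svm(x, y, x.test, y.test) would then train on the first pair and evaluate on the second.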

x.name

Character: Name for feature set

y.name

Character: Name for outcome

grid.resample.rtset

List: Output of rtset.resample defining gridSearchLearn parameters. Default = rtset.resample("kfold", 5)

grid.search.type

String: Type of grid search to perform: "exhaustive" or "randomized". Default = "exhaustive"

grid.randomized.p

Float (0, 1): If grid.search.type = "randomized", randomly run this proportion of combinations. Default = .1

class.weights

Float, length = n levels of outcome: Weights for each outcome class. For classification, class.weights takes precedence over ipw, therefore set class.weights = NULL if using ipw. Default = NULL

ipw

Logical: If TRUE, apply inverse probability weighting (for Classification only). Note: If weights are provided, ipw is not used. Default = TRUE

ipw.type

Integer: 0, 1, or 2. 1: class.weights as in 0, divided by max(class.weights). 2: class.weights as in 0, divided by min(class.weights). Default = 2
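The scaling applied by ipw.type 1 and 2 can be sketched in base R. The description above does not say how the type 0 weights are computed, so the sketch assumes they are the inverse of each class's relative frequency; that assumption and the toy outcome are not taken from rtemis.

```r
# Imbalanced two-class outcome (illustrative data)
y <- factor(c(rep("a", 90), rep("b", 10)))

# ASSUMPTION: type 0 weights are the inverse of each class's relative frequency
freq <- as.numeric(table(y)) / length(y)
w0 <- setNames(1 / freq, levels(y))

w1 <- w0 / max(w0)  # ipw.type = 1: the largest weight becomes 1
w2 <- w0 / min(w0)  # ipw.type = 2: the smallest weight becomes 1

print(rbind(type0 = w0, type1 = w1, type2 = w2))
```

Under this assumption, types 1 and 2 keep the same ratio between class weights and differ only in overall scale.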

upsample

Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Caution: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness

upsample.seed

Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed)
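The effect of upsample and upsample.seed can be sketched in base R. This is one reading of the caution above, not the rtemis implementation: extra cases are drawn without replacement when enough distinct cases exist, and with replacement otherwise. The function name upsample_sketch is hypothetical.

```r
# Sketch only: balance classes by adding resampled cases to the smaller classes.
upsample_sketch <- function(y, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  idx_by_class <- split(seq_along(y), y)
  n_max <- max(lengths(idx_by_class))
  idx <- lapply(idx_by_class, function(i) {
    n_extra <- n_max - length(i)
    if (n_extra == 0) return(i)
    # Replacement is needed only when more extras are required than cases exist,
    # i.e. when the majority class is more than double this class.
    extra <- i[sample.int(length(i), n_extra, replace = n_extra > length(i))]
    c(i, extra)
  })
  unlist(idx, use.names = FALSE)
}

y <- factor(c(rep("a", 90), rep("b", 10)))
idx <- upsample_sketch(y, seed = 2019)
print(table(y[idx]))
```

Passing the same seed reproduces the same resampled indices; leaving it NULL gives a different draw on each call.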

kernel

String: "linear", "polynomial", "radial", "sigmoid"

degree

[gS] Integer: Degree for kernel = "polynomial". Default = 3

gamma

[gS] Float: Parameter used in all kernels except linear

coef0

[gS] Float: Parameter used by kernels polynomial and sigmoid
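The roles of degree, gamma and coef0 follow from the kernel formulas given in the e1071::svm documentation. The base R functions below restate those formulas for two numeric vectors u and v; the function names are illustrative and are not part of rtemis or e1071.

```r
# Kernel formulas as documented for e1071::svm (u and v are numeric vectors)
k_linear <- function(u, v) sum(u * v)
k_polynomial <- function(u, v, gamma, coef0 = 0, degree = 3) {
  (gamma * sum(u * v) + coef0)^degree
}
k_radial <- function(u, v, gamma) exp(-gamma * sum((u - v)^2))
k_sigmoid <- function(u, v, gamma, coef0 = 0) tanh(gamma * sum(u * v) + coef0)

u <- c(1, 2)
v <- c(3, 4)
print(k_linear(u, v))
print(k_polynomial(u, v, gamma = 1, coef0 = 0, degree = 2))
print(k_radial(u, u, gamma = 0.5))
print(k_sigmoid(u, v, gamma = 0.1, coef0 = 0))
```

Only the linear kernel has no gamma term, and only the polynomial and sigmoid kernels use coef0, which matches the argument descriptions above.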

cost

[gS] Float: Cost of constraints violation; the C constant of the regularization term in the Lagrange formulation.

metric

String: Metric to minimize, or maximize if maximize = TRUE during grid search. Default = NULL, which results in "Balanced Accuracy" for Classification, "MSE" for Regression, and "Coherence" for Survival Analysis.

maximize

Logical: If TRUE, metric will be maximized if grid search is run. Default = FALSE
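For reference, balanced accuracy is commonly defined as the mean of the per-class recall. The base R sketch below uses that common definition on toy labels; it is not taken from the rtemis source, and the function name is hypothetical.

```r
# Balanced accuracy as the mean of per-class recall (common definition)
balanced_accuracy <- function(true, predicted) {
  true <- factor(true)
  predicted <- factor(predicted, levels = levels(true))
  recall <- vapply(levels(true), function(k) {
    mean(predicted[true == k] == k)
  }, numeric(1))
  mean(recall)
}

true      <- c("a", "a", "a", "a", "b", "b")
predicted <- c("a", "a", "a", "b", "b", "a")
print(balanced_accuracy(true, predicted))
print(mean(true == predicted))  # plain accuracy, for comparison
```

Balanced accuracy is a larger-is-better metric, whereas MSE is smaller-is-better, which is why the direction of optimization depends on the metric.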

plot.fitted

Logical: if TRUE, plot True (y) vs Fitted

plot.predicted

Logical: if TRUE, plot True (y.test) vs Predicted. Requires x.test and y.test

print.plot

Logical: if TRUE, produce plot using mplot3. Takes precedence over plot.fitted and plot.predicted

plot.theme

String: "zero", "dark", "box", "darkbox"

n.cores

Integer: Number of cores to use. Defaults to available cores reported by future::availableCores(), unless the option rt.cores is set at the time the library is loaded

question

String: the question you are attempting to answer with this model, in plain language.

rtclass

String: Class type to use. "S3", "S4", "RC", "R6"

verbose

Logical: If TRUE, print summary to screen.

grid.verbose

Logical: Passed to gridSearchLearn

outdir

Path to output directory. If defined, will save Predicted vs. True plot, if available, as well as full model output, if save.mod is TRUE

save.mod

Logical: If TRUE, save all output as an RDS file in outdir. save.mod is TRUE by default if an outdir is defined. If set to TRUE and no outdir is defined, outdir defaults to paste0("./s.", mod.name)

...

Additional arguments to be passed to e1071::svm

Details

[gS] denotes parameters that will be tuned by cross-validation if more than one value is passed. Regarding SVM tuning, the following guide from the LIBSVM authors can be useful: http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf. They suggest searching over cost = 2 ^ seq(-5, 15, 2) and gamma = 2 ^ seq(-15, 3, 2)
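The suggested search ranges can be built in base R and passed as vectors to the [gS] arguments. In the sketch below, the wrapper name tune_svm is hypothetical and is only defined, not called, so the block runs without rtemis. The count of randomized combinations assumes that roughly grid.randomized.p times the full grid is evaluated, which the text above does not specify exactly.

```r
# Search ranges suggested in the LIBSVM guide
cost_grid  <- 2^seq(-5, 15, 2)   # 11 values
gamma_grid <- 2^seq(-15, 3, 2)   # 10 values

# Full grid evaluated by an exhaustive search
grid <- expand.grid(cost = cost_grid, gamma = gamma_grid)
print(nrow(grid))

# ASSUMPTION: a randomized search with p = 0.1 runs about p * nrow(grid) combinations
n_random <- round(0.1 * nrow(grid))
print(n_random)

# Hypothetical wrapper: calling it requires the rtemis and e1071 packages
tune_svm <- function(x, y) {
  rtemis::s.SVM(x, y,
                cost = cost_grid, gamma = gamma_grid,
                grid.search.type = "randomized",
                grid.randomized.p = 0.1)
}
```

Because each combination is refit on every resample defined by grid.resample.rtset, a randomized search can reduce run time considerably on a grid of this size.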

See Also

elevate for external cross-validation

Other Supervised Learning: s.ADABOOST, s.ADDTREE, s.BART, s.BAYESGLM, s.BRUTO, s.C50, s.CART, s.CTREE, s.DA, s.ET, s.EVTREE, s.GAM.default, s.GAM.formula, s.GAMSEL, s.GAM, s.GBM3, s.GBM, s.GLMNET, s.GLM, s.GLS, s.H2ODL, s.H2OGBM, s.H2ORF, s.IRF, s.KNN, s.LDA, s.LM, s.MARS, s.MLRF, s.MXN, s.NBAYES, s.NLA, s.NLS, s.NW, s.POLYMARS, s.PPR, s.PPTREE, s.QDA, s.QRNN, s.RANGER, s.RFSRC, s.RF, s.SGD, s.SPLS, s.TFN, s.XGBLIN, s.XGB