rminer (version 1.4.1)

fit: Fit a supervised data mining model (classification or regression)

Description

Fit a supervised data mining model (classification or regression). A wrapper function that allows fitting distinct data mining methods (14 classification and 15 regression) under the same coherent function structure. It also tunes the hyperparameters of the models (e.g. kknn, mlpe and ksvm) and performs some feature selection methods.

Usage

fit(x, data = NULL, model = "default", task = "default", search = "heuristic", mpar = NULL, feature = "none", scale = "default", transform = "none", created = NULL, fdebug = FALSE, ...)

Arguments

x
a symbolic description (formula) of the model to be fit. If data=NULL it is assumed that x contains a formula expression with known variables (see first example below).
data
an optional data frame (columns denote attributes, rows show examples) containing the training data, when using a formula.
model
Typically, a character object with the model type name (the data mining method). Valid character options correspond to typical R base learning functions, namely one of:
  • naive -- most common class (classification) or mean output value (regression)

  • ctree -- conditional inference tree (classification and regression, uses ctree from party package)
  • rpart or dt -- decision tree (classification and regression, uses rpart from rpart package)
  • kknn or knn -- k-nearest neighbor (classification and regression, uses kknn from kknn package)
  • mlp -- multilayer perceptron with one hidden layer (classification and regression, uses nnet from nnet package)
  • mlpe -- multilayer perceptron ensemble (classification and regression, uses nnet from nnet package)
  • ksvm or svm -- support vector machine (classification and regression, uses ksvm from kernlab package)
  • randomForest or randomforest -- random forest algorithm (classification and regression, uses randomForest from randomForest package)

  • bagging -- bagging (classification, uses bagging from adabag package)
  • boosting -- boosting (classification, uses boosting from adabag package)
  • lda -- linear discriminant analysis (classification, uses lda from MASS package)
  • multinom or lr -- logistic regression (classification, uses multinom from nnet package)
  • naiveBayes or naivebayes -- naive Bayes (classification, uses naiveBayes from e1071 package)
  • qda -- quadratic discriminant analysis (classification, uses qda from MASS package)

  • mr -- multiple regression (regression, equivalent to lm but uses nnet from nnet package with zero hidden nodes and linear output function)
  • mars -- multivariate adaptive regression splines (regression, uses mars from mda package)
  • cubist -- M5 rule-based model (regression, uses cubist from Cubist package)
  • pcr -- principal component regression (regression, uses pcr from pls package)
  • plsr -- partial least squares regression (regression, uses plsr from pls package)
  • cppls -- canonical powered partial least squares (regression, uses cppls from pls package)
  • rvm -- relevance vector machine (regression, uses rvm from kernlab package)

model can also be a list with the fields (see example below):

  • $fit -- a fitting function that accepts the arguments x, data and .... The goal is to allow any R classification or regression model here, mainly for use within the mining or Importance functions, or with a hyperparameter search (via search).
  • $predict -- a prediction function that accepts the arguments object and newdata. This function should behave as any rminer prediction, i.e., return: a factor when task=="class"; a matrix with Probabilities x Instances when task=="prob"; and a numeric vector when task=="reg".
  • $name -- optional field with the name of the method.

Note: current rminer version emphasizes the use of native fitting functions from their respective packages, since these functions contain several specific hyperparameters that can now be searched or set using the search or ... arguments. For compatibility with previous rminer versions, older model options are kept.
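For instance, a minimal sketch of setting a native hyperparameter through the ... argument (a sketch assuming the rminer and rpart packages are installed; the cp value is illustrative):

  library(rminer)
  data(iris)
  # cp is a native rpart hyperparameter, passed directly to rpart via ...:
  M=fit(Species~.,iris,model="rpart",control=rpart::rpart.control(cp=0.05))
  print(M@mpar)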

task
data mining task. Valid options are:
  • prob (or p) -- classification with output probabilities (i.e. the sum of all outputs equals 1).
  • class (or c) -- classification with discrete outputs (factor)
  • reg (or r) -- regression (numeric output)
  • default tries to guess the best task (prob or reg) given the model and the output variable type (if factor then prob else reg); see the sketch after this list
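
A minimal sketch of the class versus prob distinction (assuming the rminer package; the output shapes follow the definitions above):

  data(iris)
  Mc=fit(Species~.,iris,model="rpart",task="class")
  Mp=fit(Species~.,iris,model="rpart",task="prob")
  print(class(predict(Mc,iris))) # "factor"
  print(head(predict(Mp,iris)))  # probability matrix, each row sums to 1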

search
used to tune hyperparameter(s) of the model, such as: kknn -- number of neighbors (k); mlp or mlpe -- number of hidden nodes (size) or decay; ksvm -- gaussian kernel parameter (sigma); randomForest -- mtry parameter. Valid options for a simpler search use (see the sketch after this list):
  • heuristic -- simple heuristic, one search parameter (e.g. size=inputs/2 for mlp or size=10 if classification and inputs/2>10; sigma is set using kpar="automatic" and kernel="rbfdot" of ksvm). Important note: instead of the "heuristic" options, it is advisable to use the explicit mparheuristic function, which is designed for a wider range of models (all "heuristic" options were kept due to compatibility issues and work only for: kknn; mlp or mlpe; ksvm, with kernel="rbfdot"; and randomForest).
  • heuristic5 -- heuristic with a 5 range grid-search (e.g. seq(1,9,2) for kknn, seq(0,8,2) for mlp or mlpe, 2^seq(-15,3,4) for ksvm, 1:5 for randomForest)
  • heuristic10 -- heuristic with a 10 range grid-search (e.g. seq(1,10,1) for kknn, seq(0,9,1) for mlp or mlpe, 2^seq(-15,3,2) for ksvm, 1:10 for randomForest)
  • UD, UD1 or UD2 -- uniform design 2-Level with 13 (UD or UD2) or 21 (UD1) searches (only works for ksvm and kernel="rbfdot").
  • a-vector -- numeric vector with all hyperparameter values that will be searched within an internal grid-search (the number of searches is length(search) when convex=0)
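
A minimal sketch of the simpler search usage (assuming the rminer and kknn packages):

  data(iris)
  # character heuristic (kept for compatibility), 10 grid values for k:
  M1=fit(Species~.,iris,model="kknn",search="heuristic10")
  # equivalent explicit numeric vector, searched via an internal grid:
  M2=fit(Species~.,iris,model="kknn",search=seq(1,10,1))
  print(M2@mpar) # best k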

A more complex but advised use of search is a list with the following fields (see the sketch after this list):

  • $smethod -- type of search method. Valid options are (more options will be developed in next versions):
    • none -- no search is executed, one single fit is performed.
    • matrix -- matrix search (tests only n searches, all search parameters are of size n).
    • grid -- normal grid search (tests all combinations of search parameters).
    • 2L -- nested 2-Level grid search. The first level range is set by $search and then the 2nd level performs a fine tuning, with length($search) searches around the best value found in the first level (within half of the original range; the 2nd level is only performed on numeric searches).
    • UD, UD1 or UD2 -- uniform design 2-Level with 13 (UD or UD2) or 21 (UD1) searches (note: only works for model="ksvm" and kernel="rbfdot"). Under this option, $search should contain the first level ranges, such as c(-15,3,-5,15) for classification (gamma min and max, C min and max, after which a 2^ transform is applied) or c(-8,0,-1,6,-8,-1) for regression (last two values are epsilon min and max, after which a 2^ transform is applied).

  • $search -- a-list with all hyperparameter values to be searched or a character with the previously described options (e.g. "heuristic", "heuristic5", "UD"). If a character, then $smethod equal to "none" or "grid" or "UD" is automatically assumed.
  • $convex -- number that defines how many searches are performed after a local minimum/maximum is found (if >0, the search can be stopped without testing all grid-search values)
  • $method -- type of internal estimation method used during the search (see method argument of mining for details)
  • $metric -- used to compute a metric value during internal estimation. Can be a single character such as "SAD" or a list with all the arguments used by the mmetric function except y and x, such as search$metric=list(metric="AUC",TC=3,D=0.7). See mmetric for more details.
  • Note: if mpar argument is used, then the mpar values are automatically fed into search. However, a direct use of the search argument is advised instead of mpar, since search is more flexible and powerful.
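
A minimal sketch of the list-based search, mirroring the grid search examples below (assuming the rminer and kernlab packages; the grid values are illustrative):

  data(iris)
  s=list(smethod="grid",                                    # test all combinations
         search=list(sigma=2^seq(-7,-3,2),C=2^seq(-1,3,2)), # 3 x 3 grid
         convex=0,                                          # no early stopping
         metric="AUC",                                      # internal estimation metric
         method=c("kfold",3,123))                           # internal 3-fold cross-validation
  M=fit(Species~.,iris,model="ksvm",search=s,fdebug=TRUE)
  print(M@mpar) # best sigma and C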

    mpar
    Important note: this argument is only kept in this version for compatibility with previous rminer versions. Instead of mpar, you should use the more flexible and powerful search argument. A vector with extra default (fixed) model parameters (used for modeling, search and feature selection) with:
    • c(vmethod,vpar,metric) -- generic use of mpar (including most models);
    • c(C,epsilon,vmethod,vpar,metric) -- if ksvm and C and epsilon are explicitly set;
    • c(nr,maxit,vmethod,vpar,metric) -- if mlp or mlpe and nr and maxit are explicitly set;

    C and epsilon are default values for svm (if any of these is =NA then heuristics are used to set the value). nr is the number of mlp runs or mlpe individual models, while maxit is the maximum number of epochs (if any of these is =NA then heuristics are used to set the value). For help on vmethod and vpar see mining. metric is the internal error function (e.g. used by search to select the best model), valid options are explained in mmetric. When mpar=NULL then default values are used. If there are NA values (e.g. mpar=c(NA,NA)) then default values are used.

    feature
    feature selection and sensitivity analysis control. Valid fit function options are:
    • none -- no feature selection;
    • a fmethod character value, such as sabs (see below);
    • a-vector -- vector with c(fmethod,deletions,Runs,vmethod,vpar,defaultsearch)
    • a-vector -- vector with c(fmethod,deletions,Runs,vmethod,vpar)

    fmethod sets the type. Valid options are:

    • sbs -- standard backward selection;
    • sabs -- sensitivity analysis backward selection (faster);
    • sabsv -- equal to sabs but uses variance for sensitivity importance measure;
    • sabsr -- equal to sabs but uses range for sensitivity importance measure;
    • sabsg -- equal to sabs (uses gradient for sensitivity importance measure);

    deletions is the maximum number of feature deletions (not used if -1). Runs is the number of runs for each feature set evaluation (e.g. 1). For help on vmethod and vpar see mining. defaultsearch is a single hyperparameter used during the feature selection search; after the best feature set is selected, search is used (faster). If not defined, then search is used during feature selection (may be slow). When feature is a vector, default values are used to fill missing or NA values. Note: feature selection capabilities are expected to be enhanced in future rminer versions.
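
    A minimal sketch of sensitivity analysis feature selection, mirroring the regression example below (assuming the rminer package and its sa_ssin dataset):

    data(sa_ssin)
    M=fit(y~.,data=sa_ssin,model="ksvm",feature="sabs") # backward selection via sensitivity analysis
    print(M@attributes) # indexes of the selected input attributes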

    scale
    if data needs to be scaled (i.e. for mlp or mlpe). Valid options are:
    • default -- uses scaling when needed (i.e. for mlp or mlpe)
    • none -- no scaling;
    • inputs -- standardizes (0 mean, 1 st. deviation) input attributes;
    • all -- standardizes (0 mean, 1 st. deviation) input and output attributes;

    If needed, the predict function of rminer performs the inverse scaling.

    transform
    if the output data needs to be transformed (e.g. log transform). Valid options are (see the sketch after this list):
    • none -- no transform;
    • log -- y=(log(y+1)) (the inverse function is applied in the predict function);
    • positive -- all predictions are positive (negative values are turned into zero);
    • logpositive -- both log and positive transforms;
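
    A minimal sketch of transform, mirroring the regression example below (assuming the rminer package and its sa_ssin dataset):

    data(sa_ssin)
    # negative predictions are turned into zero (for "log", predict applies the inverse transform):
    M=fit(y~.,data=sa_ssin,model="mr",transform="positive")
    P=predict(M,data.frame(x1=-1000,x2=0,x3=0,x4=0,y=NA))
    print(P) # not negative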

    created
    time stamp for the model. By default, the system time is used. Otherwise, you can specify another time stamp.
    fdebug
    if TRUE, shows some search details.
    ...
    additional and specific parameters sent to each fit function model (e.g. dt, randomforest, kernlab). A few examples: the rpart function is used for decision trees, thus you can set control=rpart.control(cp=.05) (see crossvaldata example); the ksvm function is used for support vector machines, thus you can change the kernel type: kernel="polydot" (see examples below). Important note: if you use package functions and get an error, then try to explicitly define the package. For instance, you might need to use fit(several-arguments,control=Cubist::cubistControl()) instead of fit(several-arguments,control=cubistControl()), as shown in the sketch below.
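
    A minimal sketch of the explicit package prefix (assuming the rminer and Cubist packages and the sa_ssin dataset):

    data(sa_ssin)
    # the explicit Cubist:: prefix avoids a "could not find function" error:
    M=fit(y~.,data=sa_ssin,model="cubist",control=Cubist::cubistControl())
    print(M@mpar)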

    Value

    Returns a model object. You can check all model elements with str(M), where M is the fitted model. The slots are (see the sketch after this list):
    • @formula -- the x;
    • @model -- the model;
    • @task -- the task;
    • @mpar -- data.frame with the best model parameters (interpretation depends on model);
    • @attributes -- the attributes used by the model;
    • @scale -- the scale;
    • @transform -- the transform;
    • @created -- the date when the model was created;
    • @time -- computation effort to fit the model;
    • @object -- the R object model (e.g. rpart, nnet, ...);
    • @outindex -- the output index (of @attributes);
    • @levels -- if task=="prob"||task=="class" stores the output levels;
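
    A minimal sketch of inspecting a fitted model object (assuming the rminer and rpart packages):

    data(iris)
    M=fit(Species~.,iris,model="rpart")
    str(M,max.level=2)  # all slots
    print(M@mpar)       # best hyperparameters
    print(M@time)       # computation effort
    print(M@object)     # the native rpart object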

    Details

    Fits a classification or regression model given a data.frame (see [Cortez, 2010] for more details). The ... optional arguments should be used to fix values used by specific model functions (see examples). Notes:
    • if there is an error in the fit, then a warning is issued (see example);
    • the new search argument is very flexible and allows a powerful design of supervised learning models;
    • the correct use of search is very dependent on the R base learning functions. For example, if you are tuning model="rpart", then read carefully the help of the rpart function;
    • the mpar argument is only kept due to compatibility issues and should be avoided; instead, use the more flexible search.

    Details about some models:

  • Neural Network: mlp trains nr multilayer perceptrons (with maxit epochs, size hidden nodes and a decay value, according to the nnet function) and selects the best network according to minimum penalized error ($value). mlpe uses an ensemble of nr networks and the final prediction is given by the average of all outputs. To tune mlp or mlpe you can use the search parameter, which performs a grid search for size or decay (see the sketch after this list).
  • Support Vector Machine: svm adopts by default the gaussian (rbfdot) kernel. For classification tasks, you can use search to tune sigma (gaussian kernel parameter) and C (complexity parameter). For regression, the epsilon insensitive function is adopted and there is an additional hyperparameter epsilon.
  • Other methods: Random Forest -- if needed, you can tune several parameters, including the default mtry parameter adopted by search heuristics; k-nearest neighbor -- search by default tunes k. The remaining models can also be tuned, but a full definition of search is required (e.g. with $smethod, $search and other fields).
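
    A minimal sketch of tuning mlpe over both size and decay with a grid search (a sketch assuming the rminer and nnet packages; the 2 x 2 grid values are illustrative):

    data(iris)
    s=list(smethod="grid",search=list(size=c(2,6),decay=c(0,0.1)),
           convex=0,metric="AUC",method=c("kfold",3,123))
    M=fit(Species~.,iris,model="mlpe",search=s,fdebug=TRUE)
    print(M@mpar) # best size and decay
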
    References

    • To check for more details about rminer and for citation purposes: P. Cortez. Data Mining with Neural Networks and Support Vector Machines Using the R/rminer Tool. In P. Perner (Ed.), Advances in Data Mining - Applications and Theoretical Aspects 10th Industrial Conference on Data Mining (ICDM 2010), Lecture Notes in Artificial Intelligence 6171, pp. 572-583, Berlin, Germany, July, 2010. Springer. ISBN: 978-3-642-14399-1. @Springer: http://www.springerlink.com/content/e7u36014r04h0334 http://www3.dsi.uminho.pt/pcortez/2010-rminer.pdf

    • This tutorial shows additional code examples: P. Cortez. A tutorial on using the rminer R package for data mining tasks. Teaching Report, Department of Information Systems, ALGORITMI Research Centre, Engineering School, University of Minho, Guimaraes, Portugal, July 2015. http://hdl.handle.net/1822/36210

    • For the grid search and other optimization methods: P. Cortez. Modern Optimization with R. Use R! series, Springer, September 2014, ISBN 978-3-319-08262-2. http://www.springer.com/mathematics/book/978-3-319-08262-2

    • For the sabs feature selection: P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009. http://dx.doi.org/10.1016/j.dss.2009.05.016

    • For the uniform design details: C.M. Huang, Y.J. Lee, D.K.J. Lin and S.Y. Huang. Model selection for support vector machines via uniform design, In Computational Statistics & Data Analysis, 52(1):335-346, 2007.

    See Also

    mparheuristic, mining, predict.fit, mgraph, mmetric, savemining, CasesSeries, lforecast, holdout and Importance. Check all rminer functions using: help(package=rminer).

    Examples

    ### dontrun is used when the execution of the example requires some computational effort.
    
    ### simple regression (with a formula) example.
    x1=rnorm(200,100,20); x2=rnorm(200,100,20)
    y=0.7*sin(x1/(25*pi))+0.3*sin(x2/(25*pi))
    M=fit(y~x1+x2,model="mlpe")
    new1=rnorm(100,100,20); new2=rnorm(100,100,20)
    ynew=0.7*sin(new1/(25*pi))+0.3*sin(new2/(25*pi))
    P=predict(M,data.frame(x1=new1,x2=new2,y=rep(NA,100)))
    print(mmetric(ynew,P,"MAE"))
    
    ### simple classification example.
    ## Not run: 
    # data(iris)
    # M=fit(Species~.,iris,model="rpart")
    # plot(M@object); text(M@object) # show model
    # P=predict(M,iris)
    # print(mmetric(iris$Species,P,"CONF"))
    # print(mmetric(iris$Species,P,"ALL"))
    # mgraph(iris$Species,P,graph="ROC",TC=2,main="versicolor ROC",
    # baseline=TRUE,leg="Versicolor",Grid=10)
    # 
    # M2=fit(Species~.,iris,model="ctree")
    # plot(M2@object) # show model
    # P2=predict(M2,iris)
    # print(mmetric(iris$Species,P2,"CONF"))
    # 
    # # ctree with different setup:
    # # (ctree_control is from the party package)
    # M3=fit(Species~.,iris,model="ctree",controls = party::ctree_control(testtype="MonteCarlo"))
    # plot(M3@object) # show model
    # ## End(Not run)
    
    ### classification example with discrete classes, probabilities and holdout
    ## Not run: 
    # data(iris)
    # H=holdout(iris$Species,ratio=2/3)
    # M=fit(Species~.,iris[H$tr,],model="ksvm",task="class")
    # M2=fit(Species~.,iris[H$tr,],model="ksvm",task="prob")
    # P=predict(M,iris[H$ts,])
    # P2=predict(M2,iris[H$ts,])
    # print(mmetric(iris$Species[H$ts],P,"CONF"))
    # print(mmetric(iris$Species[H$ts],P2,"CONF"))
    # print(mmetric(iris$Species[H$ts],P,"CONF",TC=1))
    # print(mmetric(iris$Species[H$ts],P2,"CONF",TC=1))
    # print(mmetric(iris$Species[H$ts],P2,"AUC"))
    # 
    # ### exploration of some rminer classification models:
    # models=c("lda","naiveBayes","kknn","randomForest")
    # for(m in models)
    #  { cat("model:",m,"\n") 
    #    M=fit(Species~.,iris[H$tr,],model=m)
    #    P=predict(M,iris[H$ts,])
    #    print(mmetric(iris$Species[H$ts],P,"AUC")[[1]])
    #  }
    # ## End(Not run)
    
    ### classification example with hyperparameter selection 
    ###    note: for regression, similar code can be used
    ### SVM 
    ## Not run: 
    # data(iris)
    # # large list of SVM configurations:
    # # SVM with kpar="automatic" sigma rbfdot kernel estimation and default C=1:
    # #  note: each execution can lead to different M@mpar due to sigest stochastic nature:
    # M=fit(Species~.,iris,model="ksvm")
    # print(M@mpar) # model hyperparameters/arguments
    # # same thing, explicit use of mparheuristic:
    # M=fit(Species~.,iris,model="ksvm",search=list(search=mparheuristic("ksvm")))
    # print(M@mpar) # model hyperparameters
    # 
    # # SVM with C=3, sigma=2^-7
    # M=fit(Species~.,iris,model="ksvm",C=3,kpar=list(sigma=2^-7))
    # print(M@mpar)
    # # SVM with different kernels:
    # M=fit(Species~.,iris,model="ksvm",kernel="polydot",kpar="automatic") 
    # print(M@mpar)
    # # fit already has a scale argument, thus the only way to fix scale of "tanhdot"
    # # is to use the special search argument with the "none" method:
    # s=list(smethod="none",search=list(scale=2,offset=2))
    # M=fit(Species~.,iris,model="ksvm",kernel="tanhdot",search=s) 
    # print(M@mpar)
    # # heuristic: 10 grid search values for sigma, rbfdot kernel (fdebug only adds verbose output):
    # s=list(search=mparheuristic("ksvm",10)) # advised "heuristic10" usage
    # M=fit(Species~.,iris,model="ksvm",search=s,fdebug=TRUE)
    # print(M@mpar)
    # # same thing, uses older search="heuristic10" that works for fewer rminer models
    # M=fit(Species~.,iris,model="ksvm",search="heuristic10",fdebug=TRUE)
    # print(M@mpar)
    # # identical search under a different and explicit code:
    # s=list(search=2^seq(-15,3,2))
    # M=fit(Species~.,iris,model="ksvm",search=s,fdebug=TRUE)
    # print(M@mpar)
    # 
    # # uniform design "UD" for sigma and C, rbfdot kernel, two level of grid searches, 
    # # under exponential (2^x) search scale:
    # M=fit(Species~.,iris,model="ksvm",search="UD",fdebug=TRUE)
    # print(M@mpar)
    # M=fit(Species~.,iris,model="ksvm",search="UD1",fdebug=TRUE)
    # print(M@mpar)
    # M=fit(Species~.,iris,model="ksvm",search=2^seq(-15,3,2),fdebug=TRUE)
    # print(M@mpar)
    # # now the more powerful search argument is used for modeling SVM:
    # # grid 3 x 3 search:
    # s=list(smethod="grid",search=list(sigma=2^c(-15,-5,3),C=2^c(-5,0,15)),convex=0,
    #             metric="AUC",method=c("kfold",3,12345))
    # print(s)
    # M=fit(Species~.,iris,model="ksvm",search=s,fdebug=TRUE)
    # print(M@mpar)
    # # identical search with different argument smethod="matrix" 
    # s$smethod="matrix"
    # s$search=list(sigma=rep(2^c(-15,-5,3),times=3),C=rep(2^c(-5,0,15),each=3))
    # print(s)
    # M=fit(Species~.,iris,model="ksvm",search=s,fdebug=TRUE)
    # print(M@mpar)
    # # search for best kernel (only works for kpar="automatic"):
    # s=list(smethod="grid",search=list(kernel=c("rbfdot","laplacedot","polydot","vanilladot")),
    #        convex=0,metric="AUC",method=c("kfold",3,12345))
    # print(s)
    # M=fit(Species~.,iris,model="ksvm",search=s,fdebug=TRUE)
    # print(M@mpar)
    # # search for best parameters of "rbfdot" or "laplacedot" (which use same kpar):
    # s$search=list(kernel=c("rbfdot","laplacedot"),sigma=2^seq(-15,3,5))
    # print(s)
    # M=fit(Species~.,iris,model="ksvm",search=s,fdebug=TRUE)
    # print(M@mpar)
    # 
    # ### randomForest
    # # search for mtry and ntree
    # s=list(smethod="grid",search=list(mtry=c(1,2,3),ntree=c(100,200,500)),
    #             convex=0,metric="AUC",method=c("kfold",3,12345))
    # print(s)
    # M=fit(Species~.,iris,model="randomForest",search=s,fdebug=TRUE)
    # print(M@mpar)
    # 
    # ### rpart
    # # simpler way to tune cp in 0.01 to 0.9 (10 searches):
    # s=list(search=mparheuristic("rpart",n=10,lower=0.01,upper=0.9),method=c("kfold",3,12345))
    # M=fit(Species~.,iris,model="rpart",search=s,fdebug=TRUE)
    # print(M@mpar)
    # 
    # # same thing but with more lines of code
    # # note: this code can be adapted to tune other rpart parameters,
    # #       while mparheuristic only tunes cp
    # # a vector list needs to be used for the search$search parameter
    # lcp=vector("list",10) # 10 grid values for the complexity cp
    # names(lcp)=rep("cp",10) # same cp name 
    # scp=seq(0.01,0.9,length.out=10) # 10 values from 0.01 to 0.9
    # for(i in 1:10) lcp[[i]]=scp[i] # cycle needed due to [[]] notation
    # s=list(smethod="grid",search=list(control=lcp),
    #             convex=0,metric="AUC",method=c("kfold",3,12345))
    # M=fit(Species~.,iris,model="rpart",search=s,fdebug=TRUE)
    # print(M@mpar)
    # 
    # ### ctree 
    # # simpler way to tune mincriterion in the 0.1 to 0.99 range:
    # mint=c("kfold",3,123) # internal validation method
    # s=list(search=mparheuristic("ctree",n=8,lower=0.1,upper=0.99),method=mint)
    # M=fit(Species~.,iris,model="ctree",search=s,fdebug=TRUE)
    # print(M@mpar)
    # # same thing but with more lines of code
    # # note: this code can be adapted to tune other ctree parameters,
    # #       while mparheuristic only tunes mincriterion
    # # a vector list needs to be used for the search$search parameter
    # lmc=vector("list",9) # 9 grid values for the mincriterion
    # smc=seq(0.1,0.99,length.out=9)
    # for(i in 1:9) lmc[[i]]=party::ctree_control(mincriterion=smc[i]) 
    # s=list(smethod="grid",search=list(controls=lmc),method=mint,convex=0)
    # M=fit(Species~.,iris,model="ctree",search=s,fdebug=TRUE)
    # print(M@mpar)
    # 
    # ### some MLP fitting examples:
    # # simplest use:
    # M=fit(Species~.,iris,model="mlpe")  
    # print(M@mpar)
    # # same thing, with explicit use of mparheuristic:
    # M=fit(Species~.,iris,model="mlpe",search=list(search=mparheuristic("mlpe")))
    # print(M@mpar) # hidden nodes and number of ensemble mlps
    # 
    # # setting some nnet parameters:
    # M=fit(Species~.,iris,model="mlpe",size=3,decay=0.1,maxit=100,rang=0.9) 
    # print(M@mpar) # mlpe hyperparameters
    # # MLP, 5 grid searches (fdebug only adds some verbose output to the console):
    # s=list(search=mparheuristic("mlpe",n=5)) # 5 searches for size
    # print(s) # show search
    # M=fit(Species~.,iris,model="mlpe",search=s,fdebug=TRUE)
    # print(M@mpar)
    # # previous searches used a random holdout (seed=NULL), now a fixed seed (123) is used:
    # s=list(smethod="grid",search=mparheuristic("mlpe",n=5),convex=0,metric="AUC",
    #             method=c("holdout",2/3,123))
    # print(s)
    # M=fit(Species~.,iris,model="mlpe",search=s,fdebug=TRUE)
    # print(M@mpar)
    # # faster and greedy grid search:
    # s$convex=1;s$search=list(size=0:9)
    # print(s)
    # M=fit(Species~.,iris,model="mlpe",search=s,fdebug=TRUE)
    # print(M@mpar)
    # # 2 level grid with total of 5 searches 
    # #  note of caution: some "2L" ranges may lead to non integer (e.g. 1.3) values at
    # #  the 2nd level search. And some R functions crash if non integer values are used for
    # #  integer parameters.
    # s$smethod="2L";s$convex=0;s$search=list(size=c(4,8,12))
    # print(s)
    # M=fit(Species~.,iris,model="mlpe",search=s,fdebug=TRUE)
    # print(M@mpar)
    # ## End(Not run)
    
    ### example of an error (warning) generated using fit:
    ## Not run: 
    # data(iris)
    # # size needs to be a positive integer, thus 0.1 leads to an error:
    # M=fit(Species~.,iris,model="mlp",size=0.1)  
    # print(M@object)
    # ## End(Not run)
    
    ### exploration of some rminer regression models:
    ## Not run: 
    # data(sa_ssin)
    # H=holdout(sa_ssin$y,ratio=2/3,seed=12345)
    # models=c("mr","ctree","mars","cubist","rvm")
    # for(m in models)
    #  { cat("model:",m,"\n") 
    #    M=fit(y~.,sa_ssin[H$tr,],model=m)
    #    P=predict(M,sa_ssin[H$ts,])
    #    print(mmetric(sa_ssin$y[H$ts],P,"MAE"))
    #  }
    # ## End(Not run)
    
    ### regression example with hyperparameter selection:
    ## Not run: 
    # data(sa_ssin)
    # # some SVM experiments:
    # # default SVM:
    # M=fit(y~.,data=sa_ssin,model="svm")
    # print(M@mpar)
    # # SVM with (Cherkassky and Ma, 2004) heuristics to set C and epsilon:
    # M=fit(y~.,data=sa_ssin,model="svm",C=NA,epsilon=NA)
    # print(M@mpar)
    # # SVM with Uniform Design set sigma, C and epsilon:
    # M=fit(y~.,data=sa_ssin,model="ksvm",search="UD",fdebug=TRUE)
    # print(M@mpar)
    # 
    # # sensitivity analysis feature selection
    # M=fit(y~.,data=sa_ssin,model="ksvm",search=list(search=mparheuristic("ksvm",n=5)),feature="sabs") 
    # print(M@mpar)
    # print(M@attributes) # selected attributes (1, 2 and 3 are the relevant inputs)
    # 
    # # example that shows how transform works:
    # M=fit(y~.,data=sa_ssin,model="mr") # linear regression
    # P=predict(M,data.frame(x1=-1000,x2=0,x3=0,x4=0,y=NA)) # P should be negative
    # print(P)
    # M=fit(y~.,data=sa_ssin,model="mr",transform="positive")
    # P=predict(M,data.frame(x1=-1000,x2=0,x3=0,x4=0,y=NA)) # P is not negative
    # print(P)
    # ## End(Not run)
    
    ### pure classification example with a generic R model ###
    ## Not run: 
    # ### nnet is adopted here but virtually ANY fitting function/package could be used:
    # 
    # # since the default nnet prediction is to provide probabilities, there is
    # # a need to create this "wrapping" function:
    # predictprob=function(object,newdata)
    # { predict(object,newdata,type="class") }
    # # list with a fit and predict function:
    # # nnet::nnet (package::function)
    # model=list(fit=nnet::nnet,predict=predictprob,name="nnet")
    # data(iris)
    # # note that size is not a fit parameter and it is sent directly to nnet:
    # M=fit(Species~.,iris,model=model,size=3,task="class") 
    # P=predict(M,iris)
    # print(P)
    # ## End(Not run) 
    
    
