train can be used to tune models by picking the complexity parameters that are associated with the optimal resampling statistics. For a particular model, a grid of tuning parameters (if any) is created and the model is trained on slightly different data for each candidate combination of tuning parameters. Across the resampled data sets, performance on the held-out samples is calculated, and the mean and standard deviation are summarized for each combination. The combination with the optimal resampling statistic is chosen, and the entire training set is used to fit the final model (a short example of this process follows the table below).

A variety of models are currently available. The table below enumerates the models and the values of the method argument, as well as the complexity parameters used by train.
Model method Value Package Tuning Parameter(s)
Generalized linear model glm stats none
glmStepAIC MASS none
Generalized additive model gam mgcv select, method
gamLoess gam span, degree
gamSpline gam df
Recursive partitioning rpart rpart cp
rpart2 rpart maxdepth
ctree party mincriterion
ctree2 party maxdepth
Boosted trees gbm gbm interaction.depth, n.trees, shrinkage
blackboost mboost maxdepth, mstop
ada ada maxdepth, iter, nu
bstTree bst maxdepth, mstop, nu
Boosted regression models glmboost mboost mstop
gamboost mboost mstop
logitBoost caTools nIter
bstLs bst mstop, nu
bstSm bst mstop, nu
Random forests rf randomForest mtry
parRF randomForest, foreach mtry
cforest party mtry
Boruta Boruta mtry
Bagging treebag ipred None
bag caret vars
logicBag logicFS ntrees, nleaves
Other Trees nodeHarvest nodeHarvest maxinter, mode
partDSA partDSA cut.off.growth, MPD
Logic Regression logreg LogicReg ntrees, treesize
Elastic net (glm) glmnet glmnet alpha, lambda
Neural networks nnet nnet decay, size
neuralnet neuralnet layer1, layer2, layer3
pcaNNet caret decay, size
avNNet caret decay, size, bag
Projection pursuit regression ppr stats nterms
Principal component regression pcr pls ncomp
Independent component regression icr caret n.comp
Partial least squares pls pls, caret ncomp
simpls pls, caret ncomp
widekernelpls pls, caret ncomp
Sparse partial least squares spls spls, caret K, eta, kappa
Support vector machines svmLinear kernlab C
svmRadial kernlab sigma, C
svmRadialCost kernlab C
svmPoly kernlab scale, degree, C
Relevance vector machines rvmLinear kernlab none
rvmRadial kernlab sigma
rvmPoly kernlab scale, degree
Least squares support vector machines lssvmRadial kernlab sigma
Gaussian processes gaussprLinear kernlab none
gaussprRadial kernlab sigma
gaussprPoly kernlab scale, degree
Linear least squares lm stats None
lmStepAIC MASS None
leapForward leaps nvmax
leapBackward leaps nvmax
leapSeq leaps nvmax
Robust linear regression rlm MASS None
Multivariate adaptive regression splines earth earth degree, nprune
gcvEarth earth degree
Bagged MARS bagEarth caret, earth degree, nprune
Rule Based Regression M5Rules RWeka pruned, smoothed
M5 RWeka pruned, smoothed, rules
cubist Cubist committees, neighbors
Penalized linear models penalized penalized lambda1, lambda2
ridge elasticnet lambda
enet elasticnet lambda, fraction
lars lars fraction
lars2 lars steps
enet elasticnet fraction
foba foba lambda, k
Supervised principal components superpc superpc n.components, threshold
Quantile regression forests qrf quantregForest mtry
Quantile regression neural networks qrnn qrnn n.hidden, penalty, bag
Linear discriminant analysis lda MASS None
Linda rrcov None
Quadratic discriminant analysis qda MASS None
QdaCov rrcov None
Stabilized linear discriminant analysis slda ipred None
Heteroscedastic discriminant analysis hda hda newdim, lambda, gamma
Stepwise discriminant analysis stepLDA klaR maxvar, direction
stepQDA klaR maxvar, direction
Stepwise diagonal discriminant analysis sddaLDA SDDA None
sddaQDA SDDA None
Shrinkage discriminant analysis sda sda diagonal
Sparse linear discriminant analysis sparseLDA sparseLDA NumVars, lambda
Regularized discriminant analysis rda klaR lambda, gamma
Mixture discriminant analysis mda mda subclasses
Sparse mixture discriminant analysis smda sparseLDA NumVars, R, lambda
Penalized discriminant analysis pda mda lambda
pda2 mda df
High dimensional discriminant analysis hdda HDclassif model, threshold
Flexible discriminant analysis (MARS) fda mda, earth degree, nprune
Robust Regularized Linear Discriminant Analysis rrlda rrlda lambda, alpha
Bagged FDA bagFDA caret, earth degree, nprune
Logistic/multinomial regression multinom nnet decay
Penalized logistic regression plr stepPlr lambda, cp
Rule-based classification J48 RWeka C
OneR RWeka None
PART RWeka threshold, pruned
JRip RWeka NumOpt
Logic Forests logforest LogicForest None
Bayesian multinomial probit model vbmpRadial vbmp estimateTheta
k nearest neighbors knn3 caret k
Nearest shrunken centroids pam pamr threshold
scrda rda alpha, delta
Naive Bayes nb klaR usekernel, fL
Generalized partial least squares gpls gpls K.prov
Learned vector quantization lvq class size, k
ROC Curves rocc rocc xgenes
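As an illustration of the tuning process described above, here is a minimal sketch of a call to train; the data set, model, and resampling settings are arbitrary choices for demonstration rather than recommendations.

library(caret)

set.seed(1)
plsFit <- train(Species ~ ., data = iris,
                method = "pls",        # ncomp is the tuning parameter (see table)
                tuneLength = 3,        # evaluate 3 candidate values of ncomp
                trControl = trainControl(method = "cv", number = 10))

plsFit$results     # resampled performance for each candidate value
plsFit$bestTune    # the value of ncomp used to fit the final model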
By default, the function createGrid is used to define the candidate values of the tuning parameters. The user can also specify their own. To do this, a data frame is created with a column for each tuning parameter in the model. The column names must be the same as those listed in the table above, with a leading dot. For example, ncomp would have the column heading .ncomp. This data frame can then be passed to train via the tuneGrid argument.
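For example, a user-defined grid for a partial least squares model might look like the sketch below; the candidate ncomp values are arbitrary choices for illustration.

library(caret)

## one column per tuning parameter, named with a leading dot
plsGrid <- data.frame(.ncomp = 1:4)

set.seed(1)
plsFit2 <- train(Species ~ ., data = iris,
                 method = "pls",
                 tuneGrid = plsGrid,
                 trControl = trainControl(method = "cv", number = 10))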
In some cases, models require control arguments. These can be passed to the underlying fitting function via the three dots (...) argument of train. Note that some models can also specify tuning parameters in their control objects; if specified there, those values are superseded by the ones given in the tuning grid.
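As a sketch of this mechanism, the call below passes an argument through the three dots to the underlying fitting function: here ntree is handed to randomForest, while mtry remains the parameter tuned by train (the specific values are illustrative).

library(caret)

set.seed(1)
rfFit <- train(Species ~ ., data = iris,
               method = "rf",
               tuneLength = 2,
               ntree = 250,     # passed via ... to randomForest()
               trControl = trainControl(method = "cv", number = 5))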
The vignette entitled "caret Manual -- Model Building" has more details and examples related to this function.
train can be used with "explicit parallelism", where different resamples (e.g. cross-validation groups) and models can be split up and run on multiple machines or processors. By default, train will use a single processor on the host machine. As of version 4.99 of this package, the framework used for parallel processing is the foreach package. To run the resamples in parallel, the code for train does not change; prior to the call to train, a parallel backend is registered with foreach (see the examples below).
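For example, one possible backend is the doParallel package; the sketch below registers two workers before calling train. The backend choice and worker count are illustrative assumptions here (doMC, doMPI, and other foreach backends are registered in the same way).

library(caret)
library(doParallel)

cl <- makeCluster(2)       # two worker processes
registerDoParallel(cl)     # register the backend with foreach

set.seed(1)
fit <- train(Species ~ ., data = iris,
             method = "rf",
             trControl = trainControl(method = "cv", number = 10))

stopCluster(cl)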