trainControl: Control parameters for train

Description

Control the computational nuances of the train function

Usage

trainControl(method = "boot", 
             number = ifelse(method %in% c("cv", "repeatedcv"), 10, 25),
             repeats = ifelse(method %in% c("cv", "repeatedcv"), 1, number),
             p = 0.75, 
             initialWindow = NULL,
             horizon = 1,
             fixedWindow = TRUE,
             verboseIter = FALSE,
             returnData = TRUE,
             returnResamp = "final",
             savePredictions = FALSE,
             classProbs = FALSE,
             summaryFunction = defaultSummary,
             selectionFunction = "best",
             custom = NULL,
             preProcOptions = list(thresh = 0.95, ICAcomp = 3, k = 5),
             index = NULL,
             indexOut = NULL,
             timingSamps = 0,
             predictionBounds = rep(FALSE, 2),
             seeds = NA,
             allowParallel = TRUE)

Arguments

method

The resampling method: boot, boot632, cv, repeatedcv, LOOCV, LGOCV (for repeated training/test splits), or oob (only for random forest, bagged trees, bagge

number

Either the number of folds or number of resampling iterations

repeats

For repeated k-fold cross-validation only: the number of complete sets of folds to compute

verboseIter

A logical for printing a training log.

returnData

A logical for saving the data

returnResamp

A character string indicating how much of the resampled summary metrics should be saved. Values can be "final", "all" or "none".

savePredictions

a logical to save the hold-out predictions for each resample

p

For leave-group-out cross-validation: the training percentage

initialWindow, horizon, fixedWindow

possible arguments to createTimeSlices
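
To illustrate how these three arguments interact, here is a minimal base R sketch of rolling-origin slicing. The helper rollingSlices is hypothetical and written only for illustration; it is not caret's implementation of createTimeSlices, although it is meant to follow the same idea.

```r
## Hypothetical helper (not part of caret): build rolling-origin
## train/test row indices for a series of length n.
rollingSlices <- function(n, initialWindow, horizon = 1, fixedWindow = TRUE) {
  stopifnot(initialWindow >= 1, horizon >= 1, initialWindow + horizon <= n)
  ## Last training row of each slice; the test rows must still fit within n
  stops <- seq(initialWindow, n - horizon)
  train <- lapply(stops, function(s) {
    ## Fixed window: always initialWindow rows. Otherwise grow from row 1.
    start <- if (fixedWindow) s - initialWindow + 1 else 1
    seq(start, s)
  })
  ## The horizon rows immediately after each training window
  test <- lapply(stops, function(s) seq(s + 1, s + horizon))
  list(train = train, test = test)
}

slices <- rollingSlices(n = 10, initialWindow = 5, horizon = 2)
slices$train[[1]]  # rows 1 to 5
slices$test[[1]]   # rows 6 and 7
```

With fixedWindow = FALSE the training window keeps its first row and grows with each slice instead of sliding forward.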

classProbs

a logical; should class probabilities be computed for classification models (along with predicted values) in each resample?

summaryFunction

a function to compute performance metrics across resamples. The arguments to the function should be the same as those in defaultSummary.
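
As a sketch of what such a function can look like: defaultSummary takes the arguments data, lev and model, where data holds the observed and predicted outcomes in columns named obs and pred. The function maeSummary below is a hypothetical regression example, not part of caret.

```r
## Hypothetical summary function (not part of caret): mean absolute error
## and root mean squared error. Arguments mirror defaultSummary(data, lev, model).
maeSummary <- function(data, lev = NULL, model = NULL) {
  err <- data$pred - data$obs
  c(MAE = mean(abs(err)), RMSE = sqrt(mean(err^2)))
}

## Toy hold-out results in the layout the function expects
toy <- data.frame(obs = c(1, 2, 3, 4), pred = c(1.5, 2, 2.5, 5))
maeSummary(toy)

## Assumed usage with caret loaded:
## ctrl <- trainControl(summaryFunction = maeSummary)
```

The function returns a named numeric vector, so each name can serve as a metric label in the resampling results.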

custom

an optional list of functions that can be used to fit custom models. See the details below and worked examples at http://caret.r-forge.r-project.org/. This is an "experimental" version for testing. Please send emails to the maintainer for su

selectionFunction

the function used to select the optimal tuning parameter. This can be a name of the function or the function itself. See best for details and other options.

preProcOptions

A list of options to pass to preProcess. The type of pre-processing (e.g. center, scaling etc) is passed in via the preProc option in train.

index

a list with elements for each resampling iteration. Each list element is the sample rows used for training at that iteration.

indexOut

a list (the same length as index) that dictates which samples are held out for each resample. If NULL, then the unique set of samples not contained in index is used.
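
The structure of index and indexOut can be sketched in base R as follows. The object names and sizes (trainRows, heldOut, 20 rows, 3 resamples) are assumptions chosen for illustration only.

```r
## Build three resamples by hand for a data set with 20 rows (illustrative sizes)
set.seed(42)
nRows <- 20
trainRows <- lapply(1:3, function(i) sort(sample(nRows, size = 15)))
names(trainRows) <- paste0("Resample", 1:3)

## Matching hold-out rows: everything not used for training in that resample.
## Per the description, this is also what is used when indexOut is NULL.
heldOut <- lapply(trainRows, function(rows) setdiff(seq_len(nRows), rows))

## Assumed usage with caret loaded:
## ctrl <- trainControl(index = trainRows, indexOut = heldOut)
```

Supplying both lists explicitly is mainly useful when the hold-out rows should differ from the simple complement, for example to leave a gap between training and test rows.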

timingSamps

the number of training set samples that will be used to measure the time for predicting samples (zero indicates that the prediction time should not be estimated).

predictionBounds

a logical or numeric vector of length 2 (regression only). If logical, the predictions can be constrained to be within the limit of the training set outcomes. For example, a value of c(TRUE, FALSE) would only constrain the lower end of predic

seeds

an optional set of integers that will be used to set the seed at each resampling iteration. This is useful when the models are run in parallel. A value of NA will stop the seed from being set within the worker processes while a value of

allowParallel

if a parallel backend is loaded and available, should the function use it?

Value

An echo of the parameters specified

Details

For custom modeling functions, several functions can be specified using the custom argument:

parameters

a data frame or function of tuning parameters

model

a function that trains the model

prediction

a function that predicts new samples (either numbers or character/factor vectors)

probability

an optional function for classification models that returns a matrix or data frame of class probabilities (in columns)

sort

a function that sorts the tuning parameters by complexity

Examples

## Do 5 repeats of 10-Fold CV for the iris data. We will fit
## a KNN model that evaluates 12 values of k and set the seed
## at each iteration.

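library(caret)

## One integer vector of seeds per resample (10 folds x 5 repeats = 50),
## each with at least as many seeds as tuning parameter values (12 here),
## plus a single seed for the final model: 51 list elements in total.
set.seed(123)
seeds <- vector(mode = "list", length = 51)
for(i in 1:50) seeds[[i]] <- sample.int(1000, 22)

## For the last model:
seeds[[51]] <- sample.int(1000, 1)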

ctrl <- trainControl(method = "repeatedcv", 
                     repeats = 5,
                     seeds = seeds)

set.seed(1)
mod <- train(Species ~ ., data = iris, 
             method = "knn", 
             tuneLength = 12,
             trControl = ctrl)
