gafs and safs functions

Many of these options are the same as those described for trainControl. More extensive documentation and examples can be found on the caret website at http://topepo.github.io/caret/GA.html#syntax and http://topepo.github.io/caret/SA.html#syntax.
The functions component contains the information about how the model should be fit and summarized. It also contains the elements needed for the GA and SA modules (e.g. cross-over, etc).
The elements of functions that are the same for GAs and SAs are:
- fit, with arguments x, y, lev, last, and ..., is used to fit the classification or regression model
- pred, with arguments object and x, predicts new samples
- fitness_intern, with arguments object, x, y, maximize, and p, summarizes performance for the internal estimates of fitness
- fitness_extern, with arguments data, lev, and model, summarizes performance using the externally held-out samples
- selectIter, with arguments x, metric, and maximize, determines the best search iteration for feature selection.
The elements of functions specific to genetic algorithms are:
- initial, with arguments vars, popSize and ..., creates an initial population.
- selection, with arguments population, fitness, r, q, and ..., conducts selection of individuals.
- crossover, with arguments population, fitness, parents and ..., controls genetic reproduction.
- mutation, with arguments population, parent and ..., adds mutations.
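As a rough illustration of the GA-specific elements, the sketch below writes an initial and a mutation function in base R. Only the argument names come from the list above. The function names toy_initial and toy_mutation, the 0/1 matrix encoding of the population, and the single bit-flip rule are assumptions made for this example and are not taken from caret's own implementations.

```r
# Hypothetical GA helpers. Argument names follow the list above;
# the encoding and the update rules are assumptions for illustration.

# Create popSize random individuals over vars candidate predictors.
# Assumption: an individual is a 0/1 row, where 1 means "predictor selected".
toy_initial <- function(vars, popSize, ...) {
  matrix(rbinom(vars * popSize, size = 1, prob = 0.5),
         nrow = popSize, ncol = vars)
}

# Flip one randomly chosen bit of the individual stored in row `parent`.
toy_mutation <- function(population, parent, ...) {
  child <- as.vector(population[parent, ])
  j <- sample.int(length(child), size = 1)
  child[j] <- 1 - child[j]
  child
}

set.seed(1)
pop <- toy_initial(vars = 8, popSize = 4)
mutant <- toy_mutation(pop, parent = 2)
```

Here mutant differs from the second row of pop in exactly one position. A real replacement would have to match whatever representation the search code expects, so the pages linked in this section should be checked before plugging in functions like these.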
The elements of functions specific to simulated annealing are:
- initial, with arguments vars, prob, and ..., creates the initial subset.
- perturb, with arguments x, vars, and number, makes incremental changes to the subsets.
- prob, with arguments old, new, and iteration, computes the acceptance probabilities.
The pages http://topepo.github.io/caret/GA.html and http://topepo.github.io/caret/SA.html have more details about each of these functions.
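To make the simulated annealing elements more concrete, here is a minimal base R sketch of a perturb and a prob function. The argument names are the ones listed for these elements. The names toy_perturb and toy_prob, the representation of a subset as a vector of selected positions, the "smaller fitness is better" convention, and the exponential acceptance rule are all assumptions for this example and may differ from what caret does.

```r
# Hypothetical SA helpers. Argument names follow the documented elements;
# everything else is an assumption for illustration.

# Toggle `number` randomly chosen predictors in or out of the subset.
# Assumption: x holds the positions of the currently selected predictors
# out of `vars` candidates.
toy_perturb <- function(x, vars, number = 1) {
  selected <- seq_len(vars) %in% x          # logical membership vector
  flip <- sample.int(vars, size = number)   # positions to toggle
  selected[flip] <- !selected[flip]
  which(selected)
}

# Acceptance probability for a candidate subset.
# Assumptions: smaller fitness is better and `old` is non-zero.
# Improvements are always accepted; worse candidates are accepted with a
# probability that shrinks as the iteration count grows.
toy_prob <- function(old, new, iteration = 1) {
  if (new < old) return(1)
  exp(-(new - old) / abs(old) * iteration)
}

set.seed(3)
current <- c(1, 3, 5)
candidate <- toy_perturb(current, vars = 6, number = 1)
p_accept <- toy_prob(old = 0.30, new = 0.33, iteration = 10)
accept <- runif(1) <= p_accept
```

With these inputs the candidate is 10% worse than the current subset at iteration 10, so p_accept is exp(-1), roughly 0.37. The same candidate at a later iteration would be accepted less often, which is the usual cooling behavior of simulated annealing.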
holdout can be used to hold out samples for computing the internal fitness value. Note that this is independent of the external resampling step. Suppose 10-fold CV is being used. Within a resampling iteration, holdout can be used to sample an additional proportion of the 90% resampled data to use for estimating fitness. This may not be a good idea unless you have a very large training set and want to avoid an internal resampling procedure to estimate fitness.
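The sample accounting in the 10-fold CV example can be sketched in base R. The training set size (n = 1000) and the proportion (holdout = 0.2) are hypothetical values chosen for illustration, and the simple random split is an assumption; caret's internal sampling may be done differently.

```r
# Sketch of how many rows land where in one external resample when a
# holdout proportion is used. n and holdout are hypothetical values.
set.seed(2)
n <- 1000
holdout <- 0.2

# Assign every row to one of 10 folds (100 rows per fold here).
fold <- sample(rep(1:10, length.out = n))

# First external resample: fold 1 is held out externally,
# the other 90% of the rows are available to the search.
external    <- which(fold == 1)
in_resample <- which(fold != 1)

# Hold back a further proportion of the 90% for the internal fitness.
n_hold   <- floor(holdout * length(in_resample))
internal <- sample(in_resample, n_hold)
fit_rows <- setdiff(in_resample, internal)

sizes <- c(fit = length(fit_rows),
           internal = length(internal),
           external = length(external))
sizes
# fit = 720, internal = 180, external = 100
```

Only 720 of the 1000 rows are left for fitting the models in this resample, which is why the approach is mainly attractive when the training set is very large.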
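The search algorithms can be parallelized in several places:
1. across the external resamples (controlled by the allowParallel options)
2. across the fitness calculations within a generation of a GA (see genParallel)
3. within the inner resampling or model fitting, depending on the function used (see, for example, trainControl)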
It is probably best to pick one of these areas for parallelization, and the first is likely to produce the largest decrease in run-time since it is the least likely to incur multiple re-starting of the worker processes. Keep in mind that if multiple levels of parallelization occur, this can affect the number of workers and the amount of memory required exponentially.
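A control object that parallelizes only one of these areas might be set up as in the sketch below. It uses the argument names from the usage of gafsControl and safsControl together with the rfGA and rfSA function sets. The chosen values (10-fold CV, genParallel = FALSE) are assumptions for illustration, the snippet assumes the caret package is installed, and a parallel backend has to be registered separately for either flag to have any effect.

```r
# Guarded so that the snippet does nothing when caret is not installed.
if (requireNamespace("caret", quietly = TRUE)) {
  # GA: parallelize over the external resamples only,
  # not over the individuals within a generation.
  ga_ctrl <- caret::gafsControl(
    functions = caret::rfGA,
    method = "cv",
    number = 10,
    allowParallel = TRUE,
    genParallel = FALSE
  )

  # SA: there is no genParallel argument, so allowParallel is the
  # only switch at this level.
  sa_ctrl <- caret::safsControl(
    functions = caret::rfSA,
    method = "cv",
    number = 10,
    allowParallel = TRUE
  )
}
```

Keeping genParallel at FALSE while allowParallel is TRUE follows the advice above of picking a single area, so the number of workers stays bounded by the number of external resamples.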
gafsControl(functions = NULL, method = "repeatedcv", metric = NULL,
  maximize = NULL, number = ifelse(grepl("cv", method), 10, 25),
  repeats = ifelse(grepl("cv", method), 1, 5), verbose = FALSE,
  returnResamp = "final", p = 0.75, index = NULL, indexOut = NULL,
  seeds = NULL, holdout = 0, genParallel = FALSE, allowParallel = TRUE)

safsControl(functions = NULL, method = "repeatedcv", metric = NULL,
  maximize = NULL, number = ifelse(grepl("cv", method), 10, 25),
  repeats = ifelse(grepl("cv", method), 1, 5), verbose = FALSE,
  returnResamp = "final", p = 0.75, index = NULL, indexOut = NULL,
  seeds = NULL, holdout = 0, improve = Inf, allowParallel = TRUE)
method: [...] boot, boot632, cv, repeatedcv, LOOCV, LGOCV (for repeated training/test splits)
metric: [...] "internal" and "external". See gafs and/or safs for explanations of the difference.
maximize: [...] metric argument, this vector should have names "internal" and "external".
indexOut: [...] index) that dictates which samples are held out for each resample. If NULL, then the unique set of samples not contained in index is used.
holdout: [...] x and y to calculate the internal fitness values
genParallel: [...] gafs use it to parallelize the fitness calculations within a generation within a resample?
improve: [...] safs reverts back to the previous optimal subset

gafs, safs, caretGA, rfGA, treebagGA, caretSA, rfSA, treebagSA