rfeControl(functions = NULL, rerank = FALSE, method = "boot", saveDetails = FALSE, number = ifelse(method %in% c("cv", "repeatedcv"), 10, 25), repeats = ifelse(method %in% c("cv", "repeatedcv"), 1, number), verbose = FALSE, returnResamp = "final", p = 0.75, index = NULL, indexOut = NULL, timingSamps = 0, seeds = NA, allowParallel = TRUE)
method
The external resampling method: boot, cv, LOOCV or LGOCV (for repeated training/test splits).

indexOut
A list (the same length as index) that dictates which samples are held out for each resample. If NULL, then the unique set of samples not contained in index is used.

seeds
A value of NA will stop the seed from being set within the worker processes, while a value of NULL will set the seeds using a random set of integers. Alternatively, a list can be used. The list should have B+1 elements, where B is the number of resamples. The first B elements of the list should be vectors of integers of length P, where P is the number of subsets being evaluated (including the full set). The last element of the list only needs to be a single integer (for the final model). See the Examples section below.

Backwards selection requires functions to be specified for some operations.
The fit function builds the model based on the current data set. The arguments for the function must be:

x
the current training set of predictor data with the appropriate subset of variables

y
the current outcome data (either a numeric or factor vector)

first
a single logical value for whether the current predictor set has all possible variables

last
similar to first, but TRUE when the last model is fit with the final subset size and predictors

...
optional arguments to pass to the fit function in the call to rfe
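
As an illustration of the signature described above, a fit function for linear regression could be sketched as follows. The name lmFit and the toy data are hypothetical, and this sketch ignores the first and last flags; the fit function shipped in lmFuncs may differ in detail.

```r
# Hypothetical fit function following the documented argument list.
# x: predictors for the current subset, y: outcome vector,
# first/last: logical flags supplied by the caller (unused in this sketch),
# ...: extra arguments forwarded to lm().
lmFit <- function(x, y, first, last, ...) {
  dat <- as.data.frame(x)
  dat$.outcome <- y
  lm(.outcome ~ ., data = dat, ...)
}

# Toy data, for illustration only
set.seed(1)
x <- data.frame(a = rnorm(20), b = rnorm(20))
y <- 2 * x$a + rnorm(20, sd = 0.1)

mod <- lmFit(x, y, first = TRUE, last = FALSE)
```

The returned object is an ordinary lm fit, so it can be handed to a prediction function such as the one described next.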
The pred function returns a vector of predictions (numeric or factors) from the current model. The arguments are:

object
the model generated by the fit function

x
the current set of predictors for the held-back samples
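
A matching pred function for a linear model could be sketched like this. The name lmPred and the toy data are hypothetical; the model is fit directly with lm here only to keep the example self-contained.

```r
# Hypothetical pred function: object is the fitted model,
# x holds the predictors for the held-back samples.
lmPred <- function(object, x) {
  unname(predict(object, newdata = as.data.frame(x)))
}

# Toy training data and model, for illustration only
set.seed(1)
train <- data.frame(a = rnorm(20), b = rnorm(20))
train$y <- 2 * train$a + rnorm(20, sd = 0.1)
mod <- lm(y ~ ., data = train)

# Five held-back samples with the same predictor columns
heldOut <- data.frame(a = rnorm(5), b = rnorm(5))
preds <- lmPred(mod, heldOut)
```

The result is a plain numeric vector with one prediction per held-back row.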
The rank function is used to return the predictors in the order of the most important to the least important. Inputs are:

object
the model generated by the fit function

x
the current set of predictors for the training samples

y
the current training outcomes

The function should return a data frame with a column called var that has the current variable names. The first row should be the most important predictor, and so on. Other columns can be included in the output and will be returned in the final rfe object.

The selectSize function determines the optimal number of predictors based on the resampling output. Inputs for the function are:

x
a matrix with columns for the performance metrics and the number of variables, called "Variables"

metric
a character string of the performance measure to optimize (e.g. "RMSE", "Rsquared", "Accuracy" or "Kappa")

maximize
a single logical for whether the metric should be maximized

This function should return an integer corresponding to the optimal subset size. caret comes with two example functions for this purpose: pickSizeBest and pickSizeTolerance.

After the optimal subset size is determined, the selectVar function will be used to calculate the best rankings for each variable across all the resampling iterations. Inputs for the function are:

y
a list of variable importances for each resampling iteration and each subset size (generated by the user-defined rank function). In the example, for each of the cross-validation groups the output of the rank function is saved for each of the subset sizes (including the original subset). If the rankings are not recomputed at each iteration, the values will be the same within each cross-validation iteration.

size
the integer returned by the selectSize function

The function should return the predictor names (a character vector of length size) in the order of most important to least important.

Examples of these functions are included in the package: lmFuncs, rfFuncs, treebagFuncs and nbFuncs.
Model details about these functions, including examples, are at http://topepo.github.io/caret/featureselection.html.
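
To make the rank, selectSize and selectVar contracts described above more concrete, here is a minimal base R sketch. The names lmRank, bestSize and avgVars are hypothetical, the Overall column is an extra importance column chosen for illustration, and the shape assumed for the selectVar input (a list of data frames with var and Overall columns) is an assumption; the package's own functions may behave differently.

```r
# Hypothetical rank function: orders predictors by absolute t-statistic
# and returns a data frame whose 'var' column holds the variable names.
lmRank <- function(object, x, y) {
  coefs <- summary(object)$coefficients
  coefs <- coefs[rownames(coefs) != "(Intercept)", , drop = FALSE]
  out <- data.frame(var = rownames(coefs),
                    Overall = abs(coefs[, "t value"]),
                    stringsAsFactors = FALSE)
  out <- out[order(out$Overall, decreasing = TRUE), ]
  rownames(out) <- NULL
  out
}

# Hypothetical selectSize function: subset size with the best metric value.
bestSize <- function(x, metric, maximize) {
  best <- if (maximize) which.max(x[, metric]) else which.min(x[, metric])
  x[best, "Variables"]
}

# Hypothetical selectVar function: averages importance across resamples
# and keeps the top 'size' names (assumes 'var' and 'Overall' columns).
avgVars <- function(y, size) {
  imp <- if (is.data.frame(y)) y else do.call(rbind, y)
  avg <- tapply(imp$Overall, imp$var, mean, na.rm = TRUE)
  names(avg)[order(avg, decreasing = TRUE)][seq_len(size)]
}

# Toy inputs, for illustration only
set.seed(1)
train <- data.frame(a = rnorm(50), b = rnorm(50), c = rnorm(50))
y <- 3 * train$a + 0.5 * train$b + rnorm(50, sd = 0.1)
mod <- lm(y ~ ., data = cbind(train, y = y))
ranking <- lmRank(mod, train, y)

perf <- data.frame(Variables = c(1, 2, 3), RMSE = c(0.9, 0.4, 0.5))
chosenSize <- bestSize(perf, metric = "RMSE", maximize = FALSE)

imps <- list(
  data.frame(var = c("a", "b", "c"), Overall = c(10, 5, 1)),
  data.frame(var = c("a", "b", "c"), Overall = c(8, 6, 2))
)
chosenVars <- avgVars(imps, size = chosenSize)
```

Functions along these lines would be collected in a list, together with fit and pred functions, and supplied through the functions argument of rfeControl.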
See also: rfe, lmFuncs, rfFuncs, treebagFuncs, nbFuncs, pickSizeBest, pickSizeTolerance
## Not run:
library(caret)
data(BloodBrain)  # provides bbbDescr and logBBB

subsetSizes <- c(2, 4, 6, 8)

# One vector of seeds per resample (one integer per subset size, plus one
# for the full set), then a single integer for the final model
set.seed(123)
seeds <- vector(mode = "list", length = 51)
for(i in 1:50) seeds[[i]] <- sample.int(1000, length(subsetSizes) + 1)
seeds[[51]] <- sample.int(1000, 1)

set.seed(1)
rfMod <- rfe(bbbDescr, logBBB,
             sizes = subsetSizes,
             rfeControl = rfeControl(functions = rfFuncs,
                                     seeds = seeds,
                                     number = 50))
## End(Not run)