A wrapper method of backward feature selection in which a given model is fit to nested subsets of the most important predictor variables in order to select the subset whose resampled predictive performance is optimal.
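The general idea of backward elimination over nested subsets can be sketched with base R alone. This is an illustrative toy, not the package's implementation: the helper names cv_rmse and importance, the use of absolute t statistics from lm as the importance measure, the 5-fold cross-validated RMSE, the mtcars data, and the subset sizes are all assumptions chosen for the sketch.

```r
# Toy backward feature elimination in base R (not MachineShop code).
set.seed(123)
data <- mtcars
response <- "mpg"
predictors <- setdiff(names(data), response)

# Hypothetical helper: k-fold cross-validated RMSE of a linear model.
cv_rmse <- function(vars, data, response, folds = 5) {
  fold_id <- sample(rep(seq_len(folds), length.out = nrow(data)))
  sq_err <- numeric(nrow(data))
  for (k in seq_len(folds)) {
    test <- fold_id == k
    fit <- lm(reformulate(vars, response), data = data[!test, ])
    sq_err[test] <- (data[[response]][test] - predict(fit, data[test, ]))^2
  }
  sqrt(mean(sq_err))
}

# Hypothetical helper: stand-in importance from absolute t statistics.
importance <- function(vars, data, response) {
  fit <- lm(reformulate(vars, response), data = data)
  tstat <- coef(summary(fit))[, "t value"]
  abs(tstat[vars])
}

sizes <- c(10, 7, 5, 2)  # assumed subset sizes, largest first
vars <- predictors
terms <- vector("list", length(sizes))
results <- data.frame(size = sizes, rmse = NA_real_)

for (i in seq_along(sizes)) {
  # Recompute importance on the current subset and keep the top variables,
  # so that each subset is nested within the previous one.
  imp <- importance(vars, data, response)
  vars <- names(sort(imp, decreasing = TRUE))[seq_len(sizes[i])]
  terms[[i]] <- vars
  results$rmse[i] <- cv_rmse(vars, data, response)
}

# Flag the subset with the best (lowest) resampled RMSE.
best <- which.min(results$rmse)
results$selected <- seq_along(sizes) == best
print(results)
print(terms[[best]])
```

The loop mirrors the description above: rank predictors, drop the least important, refit, and compare resampled performance across the nested subsets.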
rfe(...)
# S3 method for formula
rfe(formula, data, model, ...)
# S3 method for matrix
rfe(x, y, model, ...)
# S3 method for ModelFrame
rfe(input, model, ...)
# S3 method for recipe
rfe(input, model, ...)
# S3 method for ModelSpecification
rfe(
  object,
  select = NULL,
  control = MachineShop::settings("control"),
  props = 4,
  sizes = integer(),
  random = FALSE,
  recompute = TRUE,
  optimize = c("global", "local"),
  samples = c(rfe = 1, varimp = 1),
  metrics = NULL,
  stat = c(resample = MachineShop::settings("stat.Resample"),
           permute = MachineShop::settings("stat.TrainingParams")),
  progress = FALSE,
  ...
)
# S3 method for MLModel
rfe(model, ...)
# S3 method for MLModelFunction
rfe(model, ...)
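The methods listed above suggest several equivalent ways of supplying the same model and variables. The following is a sketch only, assuming the MachineShop package is installed and that its LMModel, ModelFrame, and ModelSpecification constructors and the ICHomes dataset are available; LMModel is chosen here merely to avoid extra package dependencies.

```r
# Sketch of alternative call forms; runs only if MachineShop is installed.
if (requireNamespace("MachineShop", quietly = TRUE)) {
  library(MachineShop)
  set.seed(123)

  # Formula method: variables given by a formula and data frame.
  res_formula <- rfe(sale_amount ~ ., data = ICHomes, model = LMModel)

  # Model given first, followed by the variable specification.
  res_model_first <- rfe(LMModel, sale_amount ~ ., data = ICHomes)

  # ModelFrame method: variables bundled in an input object.
  mf <- ModelFrame(sale_amount ~ ., data = ICHomes)
  res_mf <- rfe(mf, model = LMModel)

  # ModelSpecification method: input and model bundled together.
  modelspec <- ModelSpecification(
    sale_amount ~ ., data = ICHomes, model = LMModel
  )
  res_spec <- rfe(modelspec)
}
```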
TrainingStep class object containing a summary of the numbers of predictor variables retained (size), their names (terms), logical indicators for the optimal model selected (selected), and associated performance metrics (metrics).
...: arguments passed from the generic function to its methods, from the MLModel and MLModelFunction methods to first arguments of others, and from others to the ModelSpecification method. The first argument of each fit method is positional and, as such, must be given first in calls to them.
formula, data: formula defining the model predictor and response variables and a data frame containing them.
model: model function, function name, or object; or another object that can be coerced to a model. A model can be given first followed by any of the variable specifications.
x, y: matrix and object containing predictor and response variables.
input: input object defining and containing the model predictor and response variables.
object: model input or specification.
select: expression indicating predictor variables that can be eliminated (see subset for syntax) [default: all].
control: control function, function name, or object defining the resampling method to be employed.
props: numeric vector of the proportions of most important predictor variables to retain in fitted models or an integer number of equally spaced proportions to generate automatically; ignored if sizes are given.
sizes: integer vector of the set sizes of most important predictor variables to retain.
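As a rough illustration of how an integer props could translate into subset sizes, the base R sketch below expands it into equally spaced proportions and scales them by the number of predictors. The expansion rule, the rounding, and the variable num_predictors are assumptions made for illustration; the package's internal computation may differ.

```r
# Hypothetical number of candidate predictor variables.
num_predictors <- 12

# Integer props: number of equally spaced proportions to generate.
props <- 4

# Assumed expansion: 4 becomes 0.25, 0.50, 0.75, 1.00.
if (length(props) == 1 && props >= 1) {
  props <- seq_len(props) / props
}

# Assumed conversion of proportions to subset sizes (at least one variable).
sizes <- unique(pmax(1, round(props * num_predictors)))
print(sizes)
```

Supplying sizes directly would bypass this step, consistent with props being ignored when sizes are given.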
random: logical indicating whether to eliminate variables at random with probabilities proportional to their importance.
recompute: logical indicating whether to recompute variable importance after eliminating each set of variables.
optimize: character string specifying a search through all props to identify the globally optimal model ("global") or a search that stops after identifying the first locally optimal model ("local").
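The difference between the two search strategies can be illustrated on a made-up sequence of performance values. In this base R sketch the sizes and RMSE values are invented, lower RMSE is taken as better, and "locally optimal" is assumed to mean the first subset whose successor performs worse; the package may define the stopping rule differently.

```r
# Invented subset sizes (largest first) and resampled RMSE values.
sizes <- c(10, 8, 6, 4, 2)
rmse <- c(3.0, 2.5, 2.7, 2.1, 2.9)

# Global search: evaluate every subset and take the overall best.
global_size <- sizes[which.min(rmse)]

# Local search (assumed rule): stop at the first subset whose
# successor has worse performance.
worse_next <- diff(rmse) > 0
local_idx <- if (any(worse_next)) which(worse_next)[1] else length(rmse)
local_size <- sizes[local_idx]

print(c(global = global_size, local = local_size))
```

Here the global search selects the subset of size 4, whereas the local search stops early at size 8 and never evaluates the smaller subsets, trading optimality for less computation.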
samples: numeric vector or list giving the number of permutation samples for each of the rfe and varimp algorithms. One or both of the values may be specified as named arguments or in the order in which their defaults appear. Larger numbers of samples decrease variability in estimated model performances and variable importances at the expense of increased computation time. Samples are more expensive computationally for rfe than for varimp.
metrics: metric function, function name, or vector of these with which to calculate performance. If not specified, default metrics defined in the performance functions are used.
stat: functions or character strings naming functions to compute summary statistics on resampled metric values and permuted samples. One or both of the values may be specified as named arguments or in the order in which their defaults appear.
progress: logical indicating whether to display iterative progress during elimination.
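Several of the arguments described above can be combined in a single call. The sketch below assumes the MachineShop package is installed and that LMModel, CVControl, and the "rmse" and "r2" metric names are available; the specific values (5 folds, three proportions, 5 varimp samples) are arbitrary choices for illustration rather than recommendations.

```r
# Sketch of a customized call; runs only if MachineShop is installed.
if (requireNamespace("MachineShop", quietly = TRUE)) {
  library(MachineShop)
  set.seed(123)

  res_custom <- rfe(
    sale_amount ~ ., data = ICHomes, model = LMModel,
    control = CVControl(folds = 5),    # 5-fold cross-validation
    props = c(0.25, 0.5, 1),           # proportions of predictors to retain
    optimize = "local",                # stop at first locally optimal model
    samples = c(rfe = 1, varimp = 5),  # more permutation samples for varimp
    metrics = c("rmse", "r2"),         # metrics named by character strings
    progress = TRUE                    # show progress during elimination
  )
  print(summary(res_custom))
}
```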
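## Requires prior installation of suggested package gbm to run
library(MachineShop)

# Run recursive feature elimination with a GBM model and print the result
(res <- rfe(sale_amount ~ ., data = ICHomes, model = GBMModel))
# Summarize subset sizes, retained terms, the selected model, and metrics
summary(res)
# Summarize the resampled performance metrics
summary(performance(res))
# Display the results as a line plot
plot(res, type = "line")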