Calculate measures of relative importance for model predictor variables.
varimp(
object,
method = c("permute", "model"),
scale = TRUE,
sort = c("decreasing", "increasing", "asis"),
...
)
Returns a VariableImportance class object.

object: model fit result.
method: character string specifying the calculation of variable importance as permutation-based ("permute") or model-specific ("model"). If model-specific importance is specified but not defined, the permutation-based method is used instead with its default values (below). Permutation-based variable importance is defined as the relative change in model predictive performance between datasets with and without permuted values for the associated variable (Fisher et al. 2019).
scale: logical value or vector indicating whether importance values are scaled to a maximum of 100.
sort: character string specifying the sort order of importance values to be "decreasing", "increasing", or as predictors appear in the model formula ("asis").
...: arguments passed to model-specific or permutation-based variable importance functions. These include the following arguments and default values for method = "permute".
select = NULL: expression indicating predictor variables for which to compute variable importance (see subset for syntax) [default: all].
samples = 1: number of times to permute the values of each variable. Larger numbers of samples decrease variability in the estimates at the expense of increased computation time.
prop = numeric(): proportion of observations to sample without replacement at each round of variable permutations [default: all]. Subsampling of observations can decrease computation time.
size = integer(): number of observations to sample at each round of permutations [default: all].
times = numeric(): numeric vector of follow-up times at which to predict survival probabilities, or NULL for predicted survival means.
metric = NULL: metric function or function name with which to calculate performance. If not specified, the first applicable default metric from the performance functions is used.
compare = c("-", "/"): character specifying the relative change to compute in comparing model predictive performances between datasets with and without permuted values. The choices are difference ("-") and ratio ("/").
stats = MachineShop::settings("stat.TrainingParams"): function, function name, or vector of these with which to compute summary statistics on the set of variable importance values from the permuted datasets.
na.rm = TRUE: logical indicating whether to exclude missing variable importance values from the calculation of summary statistics.
progress = TRUE: logical indicating whether to display iterative progress during computation.
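As an illustration of how these permutation arguments combine in a call, the sketch below uses only argument names documented above; model_fit is a hypothetical fitted model object, and the values shown are example choices, not defaults:

```r
## Illustrative only: assumes the MachineShop package is loaded and
## model_fit is a fitted model object accepted by varimp().
vi <- varimp(
  model_fit,
  method = "permute",  # permutation-based importance
  samples = 25,        # more permutations for stabler estimates
  prop = 0.5,          # subsample half the observations each round
  compare = "/",       # ratio rather than difference comparison
  na.rm = TRUE         # drop missing values from summary statistics
)
plot(vi)
```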
The varimp function supports calculation of variable importance with the permutation-based method of Fisher et al. (2019) or with model-based methods where defined. Permutation-based importance is the default and has the advantages of being available for any model, any performance metric defined for the associated response variable type, and any predictor variable in the original training dataset. Conversely, model-specific importance is not defined for some models and falls back to the permutation method in such cases; is generally limited to metrics implemented in the source packages of models; and may be computed on derived, rather than original, predictor variables. These disadvantages can make comparisons of model-specific importance across different classes of models infeasible. A downside of the permutation-based approach is increased computation time. To counter this, the permutation algorithm can be run in parallel simply by loading a parallel backend for the foreach package's %dopar% function, such as doParallel or doSNOW.
Permutation variable importance is interpreted as the contribution of a predictor variable to the predictive performance of a model as measured by the performance metric used in the calculation. Importance of a predictor is conditional on and, with the default scaling, relative to the values of all other predictors in the analysis.
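The permutation procedure described above can be sketched in a few lines of base R. This is an illustrative re-implementation with lm() and RMSE, not MachineShop's internal code; the data, model, and metric are assumptions chosen for the example:

```r
## Minimal sketch of permutation-based variable importance, assuming a
## linear model and RMSE as the performance metric (illustrative only).
set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 2 * dat$x1 + 0.5 * dat$x2 + rnorm(n)

fit <- lm(y ~ ., data = dat)
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
base_perf <- rmse(dat$y, predict(fit, dat))

## Difference ("-") comparison: performance on the permuted dataset
## minus performance on the original dataset, for each predictor.
vi <- sapply(c("x1", "x2", "x3"), function(var) {
  permuted <- dat
  permuted[[var]] <- sample(permuted[[var]])
  rmse(dat$y, predict(fit, permuted)) - base_perf
})

## Scale to a maximum of 100, as with scale = TRUE.
vi <- 100 * vi / max(vi)
round(sort(vi, decreasing = TRUE), 1)
```

With this simulated data, x1 (the strongest predictor) receives the largest importance, mirroring the interpretation given above.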
Fisher, A., Rudin, C., & Dominici, F. (2019). All models are wrong, but many are useful: Learning a variable's importance by studying an entire class of prediction models simultaneously. Journal of Machine Learning Research, 20, 1-81.
## Requires prior installation of suggested package gbm to run
## Survival response example
library(survival)
gbm_fit <- fit(Surv(time, status) ~ ., data = veteran, model = GBMModel)
(vi <- varimp(gbm_fit))
plot(vi)