For some learners it is possible to calculate a feature importance measure. getFeatureImportance extracts those values from trained models. See below for a list of supported learners.
getFeatureImportance(object, ...)
(FeatureImportance) An object containing a data.frame of the variable importances and further information.
object
(WrappedModel) Wrapped model, result of train().
...
(any) Additional parameters, which are passed to the underlying function that generates the importance values.
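A minimal usage sketch, assuming the mlr package and rpart (one of the supported learners listed below) are installed. It uses the iris.task example task bundled with mlr. The exact layout of the returned data.frame may differ between mlr versions, so the sketch only prints it.

```r
library(mlr)

# Train a classification tree on the iris task that ships with mlr.
lrn = makeLearner("classif.rpart")
mod = train(lrn, iris.task)

# Extract the importance values from the trained (wrapped) model.
imp = getFeatureImportance(mod)

# The importance values are stored in the data.frame imp$res.
print(imp$res)
```

Any further arguments given after the model are forwarded to the learner-specific importance function described in the list of supported learners.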
boosting
Measure that accounts for the gain in Gini index given by a feature in a tree and the weight of that tree.
cforest
Permutation importance based on the 'mean decrease in accuracy' principle in randomForest. If auc = TRUE (only for binary classification), the area under the curve is used as the measure. The algorithm used for the survival learner is 'extremely slow and experimental; use at your own risk'. See party::varimp() for details and further parameters.
gbm
Estimation of relative influence for each feature. See gbm::relative.influence() for details and further parameters.
h2o
Relative feature importances as returned by h2o::h2o.varimp().
randomForest
With type = 2 (the default), the 'MeanDecreaseGini' is measured, which is based on the Gini impurity index used to compute the node splits. Alternatively, set type to 1 to obtain the mean decrease in accuracy calculated on OOB data. Note that in this case the learner's parameter importance needs to be set to TRUE to be able to compute feature importance values. See randomForest::importance() for details.
RRF
This is identical to randomForest.
ranger
Supports both measures mentioned above for the randomForest learner. Note that you need to explicitly set the learner's parameter importance to be able to compute feature importance measures. See ranger::importance() and ranger::ranger() for details.
rpart
Sum of the decrease in impurity for each of the surrogate variables at each node.
xgboost
The value indicates the relative contribution of the corresponding feature to the model, calculated from each feature's contribution to each tree in the model. The exact computation of the importance in xgboost is undocumented.
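The randomForest entry above can be sketched as follows, assuming mlr and randomForest are installed. The learner is created with importance = TRUE so that type = 1 (mean decrease in accuracy) is available; type is passed through ... to the underlying importance function. The seed and ntree = 50L are arbitrary choices made for this illustration only.

```r
library(mlr)

set.seed(1)  # arbitrary seed, only to make the sketch reproducible

# importance = TRUE has to be set at training time for type = 1 to work.
lrn = makeLearner("classif.randomForest", importance = TRUE, ntree = 50L)
mod = train(lrn, iris.task)

# Default (type = 2): mean decrease in Gini impurity.
imp.gini = getFeatureImportance(mod)

# type = 1: mean decrease in accuracy, calculated on OOB data.
imp.acc = getFeatureImportance(mod, type = 1)

print(imp.gini$res)
print(imp.acc$res)
```

For ranger, the analogous step should be to choose the measure when creating the learner, for example makeLearner("classif.ranger", importance = "permutation") or importance = "impurity"; check ranger::ranger() for the values supported by your installed version.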