mlr (version 2.19.0)

getFeatureImportance: Calculates feature importance values for trained models.

Description

For some learners it is possible to calculate a feature importance measure. getFeatureImportance extracts those values from trained models. See below for a list of supported learners.

Usage

getFeatureImportance(object, ...)

Value

(FeatureImportance) An object containing a data.frame of the variable importances and further information.

Arguments

object

(WrappedModel)
Wrapped model, result of train().

...

(any)
Additional parameters passed on to the underlying importance-generating function of the learner.
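
For example, a minimal sketch of typical usage (assuming the built-in iris.task; the $res slot holds the importance data.frame described under Value):

library(mlr)

# Train a learner that supports feature importance (rpart does).
lrn <- makeLearner("classif.rpart")
mod <- train(lrn, iris.task)

# Extract importance values from the fitted model.
imp <- getFeatureImportance(mod)
imp$res  # data.frame of per-feature importance values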

Details

  • boosting
    A measure based on the gain in the Gini index contributed by a feature in each tree, weighted by the weight of that tree.

  • cforest
    Permutation-based version of the 'mean decrease in accuracy' measure used in randomForest. If auc = TRUE (only for binary classification), the area under the curve is used as the accuracy measure. The algorithm used for the survival learner is 'extremely slow and experimental; use at your own risk'. See party::varimp() for details and further parameters.

  • gbm
    Estimation of relative influence for each feature. See gbm::relative.influence() for details and further parameters.

  • h2o
    Relative feature importances as returned by h2o::h2o.varimp().

  • randomForest
    For type = 2 (the default) the 'MeanDecreaseGini' is measured, which is based on the Gini impurity index used for the calculation of the nodes. Alternatively, you can set type to 1; the measure is then the mean decrease in accuracy computed on OOB data. Note that in this case the learner's parameter importance needs to be set at training time for the importance values to be computable (see the sketch after this list). See randomForest::importance() for details.

  • RRF
    This is identical to randomForest.

  • randomForestSRC
    Feature importance can be calculated for various measures. By default the Breiman-Cutler permutation method is used. See randomForestSRC::vimp() for details.

  • ranger
    Supports both measures described above for the randomForest learner. Note that you need to explicitly set the learner's parameter importance at training time to be able to compute feature importance values (see the sketch after this list). See ranger::importance() and ranger::ranger() for details.

  • rpart
    Sum of the decrease in impurity for each of the surrogate variables at each node.

  • xgboost
    The value gives the relative contribution of the corresponding feature to the model, computed by aggregating each feature's contribution over all trees in the model. The exact computation of the importance measure in xgboost is undocumented.
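
As noted for the randomForest and ranger learners, permutation-based measures must be requested when the model is trained, via the learner's importance parameter. A minimal sketch, assuming the built-in iris.task (the parameter values are those of the underlying packages and may differ between versions):

library(mlr)

# randomForest: importance = TRUE enables the OOB permutation measure;
# type = 1 is then passed on to randomForest::importance().
lrn.rf <- makeLearner("classif.randomForest", importance = TRUE)
mod.rf <- train(lrn.rf, iris.task)
getFeatureImportance(mod.rf, type = 1)

# ranger: the importance mode must likewise be chosen at training time.
lrn.rng <- makeLearner("classif.ranger", importance = "permutation")
mod.rng <- train(lrn.rng, iris.task)
getFeatureImportance(mod.rng)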