generateFeatureImportanceData: Generate feature importance.

Description

Estimate how important individual features or groups of features are by contrasting prediction performances. For method “permutation.importance”, the values of a feature (or a group of features) are permuted, and the performance of the predictions on the permuted data is compared with the performance of the predictions on the unmutated data.
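The permutation procedure described above can be sketched in base R without mlr. This is a minimal illustration, not mlr's implementation. It assumes a linear model fitted to mtcars, mean squared error as the performance measure, and the documented defaults contrast = function(x, y) x - y and aggregation = mean. Passing the permuted performance as x and the unpermuted performance as y is an assumption of this sketch, and perm_importance and mse are hypothetical helpers.

```r
# Minimal base-R sketch of permutation importance (not mlr's implementation).
# Assumptions: lm on mtcars, MSE as the measure, contrast(permuted, unpermuted).
set.seed(1)
fit = lm(mpg ~ wt + hp + qsec, data = mtcars)

# Hypothetical helper: mean squared error of the model's predictions on data.
mse = function(model, data) mean((data$mpg - predict(model, newdata = data))^2)

# Hypothetical helper: permutation importance of a single feature.
perm_importance = function(model, data, feature, nmc = 50L,
                           contrast = function(x, y) x - y,
                           aggregation = mean) {
  base_perf = mse(model, data)  # performance on the unmutated data
  diffs = vapply(seq_len(nmc), function(i) {
    permuted = data
    permuted[[feature]] = sample(permuted[[feature]])  # shuffle one column
    contrast(mse(model, permuted), base_perf)
  }, numeric(1))
  aggregation(diffs)  # combine the nmc contrasts into one number
}

imp = sapply(c("wt", "hp", "qsec"), perm_importance, model = fit, data = mtcars)
print(imp)
```

With a loss such as MSE, a larger positive value means that shuffling the feature degraded the predictions more.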

Usage

generateFeatureImportanceData(task, method = "permutation.importance",
  learner, features = getTaskFeatureNames(task), interaction = FALSE,
  measure, contrast = function(x, y) x - y, aggregation = mean,
  nmc = 50L, replace = TRUE, local = FALSE, show.info = FALSE)

Arguments

task

(Task) The task.

method

(character(1)) The method used to compute the feature importance. The only method available is “permutation.importance”. Default is “permutation.importance”.

learner

(Learner | character(1)) The learner. If you pass a string, the learner is created via makeLearner.

features

(character) The features to compute the importance of. The default is all of the features contained in the Task.

interaction

(logical(1)) Whether to compute the importance of the features given in features jointly. For method = "permutation.importance", this means that the values of all these features are permuted together and the resulting performance is contrasted with the performance on the unpermuted data. The default is FALSE.
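A joint permutation can be pictured as applying one shuffled row order to every feature in the group, so the grouped features stay aligned with each other while their link to the target is broken. The base R sketch below uses the same stand-ins as a plain permutation example would (a linear model on mtcars and MSE); joint_perm_importance is a hypothetical helper, not part of mlr.

```r
# Base-R sketch of joint (grouped) permutation importance; not mlr's code.
# Assumptions: lm on mtcars and MSE as the performance measure.
set.seed(2)
fit = lm(mpg ~ wt + hp + qsec, data = mtcars)
mse = function(model, data) mean((data$mpg - predict(model, newdata = data))^2)

# Hypothetical helper: importance of a group of features permuted together.
joint_perm_importance = function(model, data, features, nmc = 50L) {
  base_perf = mse(model, data)
  diffs = vapply(seq_len(nmc), function(i) {
    idx = sample(nrow(data))                  # one row permutation per iteration...
    permuted = data
    permuted[features] = data[idx, features]  # ...shared by all grouped features
    mse(model, permuted) - base_perf
  }, numeric(1))
  mean(diffs)
}

group = c("wt", "hp")
joint = joint_perm_importance(fit, mtcars, group)
names(joint) = paste(group, collapse = ":")  # colon-separated label, "wt:hp"
print(joint)
```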

measure

(Measure) Performance measure. If missing, the default measure is used (see getDefaultMeasure).

contrast

(function) A difference function that takes two numeric vectors x and y and returns a numeric vector of the same length. It is used to compare the performance on permuted data with the performance on unpermuted data. The default is the element-wise difference x - y.
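To illustrate the signature, the sketch below compares the default difference with a ratio contrast on made-up performance values (illustrative numbers, not results). ratio_contrast is a hypothetical alternative; like the default, it takes two numeric vectors and returns one of the same length.

```r
# Default contrast: element-wise difference of two numeric vectors.
default_contrast = function(x, y) x - y
# Hypothetical alternative: ratio, i.e. change relative to the baseline.
ratio_contrast = function(x, y) x / y

# Illustrative performance values (made up, not results from mlr).
perm_perf = c(0.30, 0.25, 0.40)
base_perf = c(0.20, 0.20, 0.20)

d = default_contrast(perm_perf, base_perf)
r = ratio_contrast(perm_perf, base_perf)
print(d)
print(r)
```

Either function could be passed as the contrast argument, for example contrast = ratio_contrast.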

aggregation

(function) A function that aggregates the contrasts across Monte-Carlo iterations. It must take a numeric vector and return a numeric vector of length 1. The default is mean.
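Any function that reduces a numeric vector to a single number can serve as the aggregation. The illustrative per-iteration contrasts below (made up, not real results) include one outlier, which shows that median, a possible alternative to the default mean, is less sensitive to extreme iterations.

```r
# Illustrative per-iteration contrasts (made up); the last value is an outlier.
diffs = c(0.10, 0.12, 0.09, 0.11, 0.80)

agg_mean = mean(diffs)      # the documented default
agg_median = median(diffs)  # a possible outlier-robust alternative

# Both reduce the vector to a single number, as the aggregation argument requires.
stopifnot(length(agg_mean) == 1, length(agg_median) == 1)
print(c(mean = agg_mean, median = agg_median))
```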

nmc

(integer(1)) The number of Monte-Carlo iterations to use in computing the feature importance. If nmc == -1 and method = "permutation.importance" then all permutations of the features are used. The default is 50.
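Exhaustive enumeration is practical only for very small data sets, because n observations have n! orderings. The recursive base R function below, all_perms (a hypothetical helper that does not necessarily reflect how mlr implements nmc = -1), lists every permutation of 1..n as the rows of a matrix. Each row could serve as one row order for permuting a feature.

```r
# Hypothetical helper: all permutations of 1..n, one per row of the result.
all_perms = function(n) {
  if (n == 1L) return(matrix(1L, nrow = 1, ncol = 1))
  prev = all_perms(n - 1L)  # all orderings of n - 1 elements
  blocks = lapply(seq_len(n), function(first) {
    rest = setdiff(seq_len(n), first)
    # Relabel each (n - 1)-permutation onto the values not used as `first`.
    cbind(first, matrix(rest[as.vector(prev)], nrow = nrow(prev)))
  })
  unname(do.call(rbind, blocks))
}

p = all_perms(4)
print(nrow(p))
```

Even factorial(10) equals 3628800, so for all but tiny data sets a finite nmc is the practical choice.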

replace

(logical(1)) Whether to sample the feature values with replacement (TRUE) or without replacement (FALSE). The default is TRUE.
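The difference between the two settings can be seen with base R's sample: without replacement it returns a true permutation of the values, while with replacement it draws a bootstrap-style sample in which values may repeat or be missing. The vector x is illustrative.

```r
set.seed(3)
x = c(10, 20, 30, 40, 50)  # illustrative feature values

perm_draw = sample(x)                  # replace = FALSE: a true permutation
boot_draw = sample(x, replace = TRUE)  # replace = TRUE: values may repeat or be absent
print(perm_draw)
print(boot_draw)
```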

local

(logical(1)) Whether to compute the per-observation importance. The default is FALSE.
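Per-observation importance keeps one loss value per row instead of summarizing over the whole data set first. The base R sketch below assumes squared error as the per-row loss and a linear model on mtcars; local_importance and sq_err are hypothetical helpers, and averaging each row across iterations mirrors the default aggregation = mean.

```r
# Base-R sketch of per-observation permutation importance; not mlr's code.
set.seed(4)
fit = lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical helper: squared error for every row (one loss per observation).
sq_err = function(model, data) (data$mpg - predict(model, newdata = data))^2

# Hypothetical helper: per-row importance of one feature.
local_importance = function(model, data, feature, nmc = 50L) {
  base_loss = sq_err(model, data)
  diffs = vapply(seq_len(nmc), function(i) {
    permuted = data
    permuted[[feature]] = sample(permuted[[feature]])
    sq_err(model, permuted) - base_loss  # one difference per observation
  }, numeric(nrow(data)))               # result: rows x iterations matrix
  rowMeans(diffs)                       # aggregate across iterations, per row
}

loc = local_importance(fit, mtcars, "wt")
print(head(loc))
```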

show.info

(logical(1)) Whether progress output (feature name, time elapsed) should be displayed. The default is FALSE.

Value

(FeatureImportance) A named list that contains the computed feature importance and the input arguments.

Object members:

res

(data.frame) Has a column for each feature or combination of features (colon-separated) for which the importance is computed. A row corresponds to the importance of the feature specified in the column for the target.

interaction

(logical(1)) Whether or not the importance of the features was computed jointly rather than individually.

measure

(Measure) The measure used to compute performance.

contrast

(function) The function used to compare the performance of predictions.

aggregation

(function) The function which is used to aggregate the contrast between the performance of predictions across Monte-Carlo iterations.

replace

(logical(1)) Whether or not, when method = "permutation.importance", the feature values are sampled with replacement.

nmc

(integer(1)) The number of Monte-Carlo iterations used to compute the feature importance. When nmc == -1 and method = "permutation.importance" all permutations are used.

local

(logical(1)) Whether observation-specific importance is computed for the features.

Examples

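library(mlr)
# Learner that predicts class probabilities; it is passed unfitted,
# and generateFeatureImportanceData trains it on the task.
lrn = makeLearner("classif.rpart", predict.type = "prob")
# Permutation importance of Petal.Width with 10 Monte-Carlo iterations,
# computed per observation (local = TRUE).
imp = generateFeatureImportanceData(iris.task, "permutation.importance",
  lrn, "Petal.Width", nmc = 10L, local = TRUE)
# The computed importance is stored in the res member.
imp$res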