Estimate how important individual features or groups of features are by contrasting prediction performance. For method "permutation.importance", compute the change in performance caused by permuting the values of a feature (or a group of features), and compare it to the performance of predictions made on the unpermuted data.
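The comparison described above can be sketched in base R. This is a minimal illustration, not mlr's implementation. It assumes a fitted model with a predict() method, a user-supplied performance function perf(truth, response), and a hypothetical helper named permutation_importance(). It also assumes that contrast(x, y) receives the permuted performance as x and the unpermuted performance as y.

```r
# Sketch only: base-R permutation importance, not mlr's implementation.
# permutation_importance() is a hypothetical helper; `perf` is any
# performance function of (truth, response). Assumed convention:
# contrast(x, y) gets permuted performance as x, unpermuted as y.
permutation_importance = function(model, data, target, feature, perf,
                                  nmc = 50L, replace = TRUE,
                                  contrast = function(x, y) x - y,
                                  aggregation = mean) {
  truth = data[[target]]
  n = nrow(data)
  # Performance of predictions on the unpermuted data
  base = perf(truth, predict(model, newdata = data))
  # One contrast per Monte Carlo iteration
  diffs = vapply(seq_len(nmc), function(i) {
    shuffled = data
    # Resample this feature's values, breaking its association with the target
    shuffled[[feature]] = data[[feature]][sample.int(n, replace = replace)]
    contrast(perf(truth, predict(model, newdata = shuffled)), base)
  }, numeric(1))
  aggregation(diffs)
}

# Toy data: y depends on x1 only
set.seed(1)
df = data.frame(x1 = rnorm(200), x2 = rnorm(200))
df$y = 3 * df$x1 + rnorm(200, sd = 0.1)
fit = lm(y ~ x1 + x2, data = df)
mse = function(truth, response) mean((truth - response)^2)
imp_x1 = permutation_importance(fit, df, "y", "x1", mse, nmc = 10L)
imp_x2 = permutation_importance(fit, df, "y", "x2", mse, nmc = 10L)
```

The performance measure here is an error (mean squared error), so a large positive contrast means permuting the feature hurt the predictions. In this toy setup, imp_x1 should be far larger than imp_x2, because the fitted coefficient on x2 is close to zero.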
Usage:
generateFeatureImportanceData(
  task,
  method = "permutation.importance",
  learner,
  features = getTaskFeatureNames(task),
  interaction = FALSE,
  measure,
  contrast = function(x, y) x - y,
  aggregation = mean,
  nmc = 50L,
  replace = TRUE,
  local = FALSE,
  show.info = FALSE
)
Value: (FeatureImportance). A named list that contains the computed feature importance and the input arguments.
Object members:
res (data.frame)
Has one column for each feature or combination of features (colon-separated) for which the importance is computed.
A row corresponds to the importance of the feature specified in the column for the target.
interaction (logical(1))
Whether the importance of the features was computed jointly rather than individually.
measure (Measure)
The measure used to compute performance.
contrast (function)
The function used to compare the performance of predictions.
aggregation (function)
The function used to aggregate the contrast between prediction performances across Monte Carlo iterations.
replace (logical(1))
Whether, when method = "permutation.importance", the feature values are sampled with replacement.
nmc (integer(1))
The number of Monte Carlo iterations used to compute the feature importance.
When nmc == -1 and method = "permutation.importance", all permutations are used.
local (logical(1))
Whether observation-specific importance is computed for the features.
Arguments:
task (Task)
The task.
method (character(1))
The method used to compute the feature importance.
The only method available is "permutation.importance", which is also the default.
learner (Learner | character(1))
The learner.
If you pass a string, the learner is created via makeLearner.
features (character)
The features to compute the importance of.
The default is all of the features contained in the Task.
interaction (logical(1))
Whether to compute the joint importance of all features listed in the features argument.
For method = "permutation.importance", this means permuting the values of all of these features together and contrasting the resulting performance with the performance on the unpermuted features.
The default is FALSE.
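Joint permutation can be illustrated in base R. This is a sketch under one assumption: every feature in the group is reordered with a single shared row index, so values within a row stay together while the group's link to the target is broken. The helper permute_jointly() is hypothetical and is not part of mlr.

```r
# Hypothetical helper: permute a group of features jointly.
# One shared row index is used for every feature in the group,
# so each row keeps its combination of feature values.
permute_jointly = function(data, features, replace = TRUE) {
  idx = sample.int(nrow(data), replace = replace)
  for (f in features) data[[f]] = data[[f]][idx]
  data
}

set.seed(2)
df = data.frame(x1 = rnorm(10))
df$x2 = 2 * df$x1
shuffled = permute_jointly(df, c("x1", "x2"), replace = FALSE)
# The within-row relationship x2 = 2 * x1 survives the joint permutation
all.equal(shuffled$x2, 2 * shuffled$x1)  # TRUE
# Column label in the colon-separated style used for feature groups
paste(c("x1", "x2"), collapse = ":")     # "x1:x2"
```

Permuting each feature separately would instead destroy the relationship between x1 and x2 as well as their relationship to the target.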
measure (Measure)
Performance measure.
Default is the first measure used in the benchmark experiment.
contrast (function)
A difference function that takes two numeric vectors and returns a numeric vector of the same length.
The default is the element-wise difference between the vectors.
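A contrast function only has to map two equal-length numeric vectors to one numeric vector of that length. The sketch below compares the default difference with a hypothetical ratio contrast. The performance values are made-up illustrative numbers, not results from any model.

```r
# The default contrast: element-wise difference
default_contrast = function(x, y) x - y
# Hypothetical alternative: relative change instead of absolute difference
ratio_contrast = function(x, y) x / y

perm_perf = c(0.30, 0.25, 0.35)  # illustrative errors on permuted data
orig_perf = c(0.10, 0.10, 0.10)  # illustrative errors on unpermuted data
default_contrast(perm_perf, orig_perf)  # 0.20 0.15 0.25
ratio_contrast(perm_perf, orig_perf)    # 3.0 2.5 3.5
```

A ratio expresses importance relative to the baseline error, which can help when comparing tasks whose error scales differ. It is undefined if the unpermuted performance is zero.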
aggregation (function)
A function that aggregates the differences.
This function must take a numeric vector and return a numeric vector of length 1.
The default is mean.
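The aggregation step turns one contrast per Monte Carlo iteration into a single number. The sketch below compares the default mean with median on illustrative, made-up contrasts. It also uses a hypothetical checker, is_valid_aggregation(), to show the length-1 requirement.

```r
diffs = c(0.20, 0.15, 0.25, 0.90)  # illustrative contrasts from 4 iterations
mean(diffs)    # 0.375 (the default aggregation)
median(diffs)  # 0.225, less sensitive to the outlying 0.90

# Hypothetical checker: an aggregation must return one numeric value
is_valid_aggregation = function(f, x) {
  out = f(x)
  is.numeric(out) && length(out) == 1L
}
is_valid_aggregation(median, diffs)  # TRUE
is_valid_aggregation(range, diffs)   # FALSE: range() returns two values
```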
nmc (integer(1))
The number of Monte Carlo iterations to use in computing the feature importance.
If nmc == -1 and method = "permutation.importance", all permutations of the features are used.
The default is 50.
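Enumerating every permutation is only feasible for very small data. The sketch below assumes that "all permutations" means every ordering of a feature's values across observations, and it counts those orderings. It does not claim to show how mlr enumerates permutations internally. all_permutations() is a hypothetical helper.

```r
# Hypothetical helper: enumerate every ordering of a vector recursively.
# There are factorial(length(v)) orderings, so keep v tiny.
all_permutations = function(v) {
  if (length(v) <= 1L) return(list(v))
  out = list()
  for (i in seq_along(v)) {
    # Fix v[i] in front, then permute the remaining elements
    for (rest in all_permutations(v[-i])) {
      out[[length(out) + 1L]] = c(v[i], rest)
    }
  }
  out
}

perms = all_permutations(c(10, 20, 30))
length(perms)  # 6, i.e. factorial(3)
factorial(10)  # 3628800 orderings for just 10 observations
```

The factorial growth is why a fixed number of random Monte Carlo permutations (nmc = 50 by default) is the practical choice.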
replace (logical(1))
Whether to sample the feature values with or without replacement.
The default is TRUE.
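The two sampling modes can be contrasted directly with base R's sample(). Sampling without replacement produces a true permutation, with the same values in a new order. Sampling with replacement is a bootstrap-style draw that may repeat some values and omit others.

```r
set.seed(3)
x = 1:10
with_repl = sample(x, replace = TRUE)      # duplicates possible
without_repl = sample(x, replace = FALSE)  # a true permutation of x
identical(sort(without_repl), x)  # TRUE: same values, new order
all(with_repl %in% x)             # TRUE: every draw comes from x
```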
local (logical(1))
Whether to compute the per-observation importance.
The default is FALSE.
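Per-observation importance can be sketched by keeping one loss value per row instead of one summary per iteration. This is an illustration under stated assumptions: the loss is per-row squared error, the contrast is permuted loss minus unpermuted loss, and the aggregation across iterations is the mean. mlr's exact per-observation computation may differ, and local_importance() is a hypothetical helper.

```r
# Hypothetical helper: per-observation ("local") permutation importance.
# Assumes squared-error loss per row; returns one value per observation.
local_importance = function(model, data, target, feature,
                            nmc = 20L, replace = TRUE) {
  truth = data[[target]]
  n = nrow(data)
  base_loss = (truth - predict(model, newdata = data))^2
  # n x nmc matrix: one column of per-row contrasts per iteration
  diffs = replicate(nmc, {
    shuffled = data
    shuffled[[feature]] = data[[feature]][sample.int(n, replace = replace)]
    (truth - predict(model, newdata = shuffled))^2 - base_loss
  })
  # Aggregate across iterations separately for each observation
  unname(rowMeans(diffs))
}

set.seed(4)
df = data.frame(x1 = rnorm(100), x2 = rnorm(100))
df$y = 3 * df$x1 + rnorm(100, sd = 0.1)
fit = lm(y ~ x1 + x2, data = df)
loc = data.frame(x1 = local_importance(fit, df, "y", "x1"),
                 x2 = local_importance(fit, df, "y", "x2"))
head(loc)  # one row per observation, one column per feature
```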
show.info (logical(1))
Whether progress output (feature name, time elapsed) should be displayed.
The default is FALSE.
Other generate_plot_data: generateCalibrationData(), generateCritDifferencesData(), generateFilterValuesData(), generateLearningCurveData(), generatePartialDependenceData(), generateThreshVsPerfData(), plotFilterValues()
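Examples:
library(mlr)
# Decision tree learner that predicts class probabilities
lrn = makeLearner("classif.rpart", predict.type = "prob")
fit = train(lrn, iris.task)
# Per-observation permutation importance of Petal.Width, 10 Monte Carlo iterations
imp = generateFeatureImportanceData(iris.task, "permutation.importance",
  lrn, "Petal.Width", nmc = 10L, local = TRUE)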