generatePartialDependenceData(obj, input, features, interaction = FALSE, derivative = FALSE, individual = FALSE, center = NULL, fun = mean, bounds = c(qnorm(0.025), qnorm(0.975)), resample = "none", fmin, fmax, gridsize = 10L, ...)
obj [WrappedModel]
Result of train.

input [data.frame | Task]
Input data.

features [character]
A vector of feature names contained in the training data.
If not specified, all features in the input will be used.

interaction [logical(1)]
Whether the features should be interacted or not. If TRUE, then the Cartesian product of the
prediction grid for each feature is taken, and the partial dependence at each unique combination of
values of the features is estimated. Note that if the length of features is greater than two,
plotPartialDependence and plotPartialDependenceGGVIS cannot be used.
If FALSE, each feature is considered separately. In this case, features can be much longer than two.
Default is FALSE.
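To illustrate what interaction = TRUE computes, here is a minimal base-R sketch, not mlr's implementation. It assumes an lm fit on the built-in mtcars data as a stand-in for a trained WrappedModel, builds a 3-point grid for each of two features, takes their Cartesian product with expand.grid, and averages the predictions at each combination. The helper pd_at is hypothetical.

```r
# Stand-in model (assumption): a linear model on the built-in mtcars data
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Evenly spaced 3-point prediction grids for two features
grid_wt <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 3)
grid_hp <- seq(min(mtcars$hp), max(mtcars$hp), length.out = 3)

# interaction = TRUE: Cartesian product of the grids (3 x 3 = 9 combinations)
combos <- expand.grid(wt = grid_wt, hp = grid_hp)

# Partial dependence at one combination: set both features to the grid
# values for every observation, predict, and average over observations
pd_at <- function(wt, hp) {
  d <- mtcars
  d$wt <- wt
  d$hp <- hp
  mean(predict(fit, newdata = d))
}
combos$pd <- mapply(pd_at, combos$wt, combos$hp)
print(combos)
```

With interaction = FALSE, each feature would instead get its own 3-point curve, i.e. 6 averaged values in total rather than 9.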
derivative [logical(1)]
Whether or not the partial derivative of the learned function with respect to the features should be
estimated. If TRUE, interaction must be FALSE. The partial derivative of individual
observations may also be estimated. Note that computation time increases because the learned prediction
function is evaluated at gridsize points times the number of points required to estimate the partial
derivative. Additional arguments may be passed to grad (for regression or survival tasks) or
jacobian (for classification tasks). Note that functions which are not smooth may result in estimated
derivatives of 0 (at points where the function does not change within +/- epsilon) or estimates
trending towards +/- infinity (at discontinuities).
Default is FALSE.
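A minimal sketch of what derivative = TRUE estimates, and of the caveat about non-smooth functions. It uses a plain central finite difference as a simpler stand-in for grad (the numDeriv function, which by default uses Richardson extrapolation), with an lm fit on mtcars as an assumed model. The helpers central_diff, pd_wt and step_fn are hypothetical.

```r
# Central finite difference: (f(x + eps) - f(x - eps)) / (2 * eps)
central_diff <- function(f, x, eps = 1e-4) {
  (f(x + eps) - f(x - eps)) / (2 * eps)
}

# Stand-in model (assumption) and its partial dependence function for wt
fit <- lm(mpg ~ wt + hp, data = mtcars)
pd_wt <- function(x) {
  d <- mtcars
  d$wt <- x
  mean(predict(fit, newdata = d))
}

# Derivative of the partial dependence at each of 10 grid points;
# each estimate costs 2 extra evaluations of the prediction function
grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 10)
deriv <- sapply(grid, function(x) central_diff(pd_wt, x))
print(deriv)  # constant for a linear model: the wt coefficient

# Non-smooth function (like a tree's step): the estimate is 0 away from
# the jump and blows up (about 1 / (2 * eps)) at the jump itself
step_fn <- function(x) as.numeric(x > 0.5)
flat <- central_diff(step_fn, 0.2)
jump <- central_diff(step_fn, 0.5)
```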
individual [logical(1)]
Whether to compute the individual conditional expectation (ICE) curves rather than the aggregated
curve, i.e., rather than aggregating (using fun) the partial dependences of features, return the
partial dependences of all observations in input across all values of the features.
The algorithm was developed by Goldstein, Kapelner, Bleich, and Pitkin (2015).
Default is FALSE.
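A base-R sketch of the ICE idea, not mlr's code: it assumes an lm with a wt:hp interaction on mtcars, so that the per-observation curves are not parallel, and keeps one row per observation. Averaging the rows reproduces the aggregated partial dependence that fun = mean would give.

```r
# Stand-in model (assumption); the wt:hp term makes ICE curves non-parallel
fit <- lm(mpg ~ wt * hp, data = mtcars)
grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 5)

# ICE matrix: row i is observation i's prediction as wt moves along the grid
ice <- sapply(grid, function(x) {
  d <- mtcars
  d$wt <- x
  predict(fit, newdata = d)
})
dim(ice)  # 32 rows (observations) x 5 columns (grid points)

# Aggregating the ICE curves with the mean gives the partial dependence curve
pd <- colMeans(ice)
```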
center [list]
A named list containing fixed values of the features. The individual partial dependence computed at
these values is subtracted from each individual partial dependence made across the prediction grid
created for the features, centering the individual partial dependence lines to make them more
interpretable.
This argument is ignored if individual != TRUE.
Default is NULL.
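Centering can be sketched by subtracting each observation's prediction at the fixed center value from its whole ICE curve. This is a base-R sketch under the same assumed lm/mtcars setup, not mlr's code; center = list(wt = min(mtcars$wt)) mirrors the named-list form of the argument.

```r
# Stand-in model (assumption) and ICE matrix, one row per observation
fit <- lm(mpg ~ wt * hp, data = mtcars)
grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 5)
ice <- sapply(grid, function(x) {
  d <- mtcars
  d$wt <- x
  predict(fit, newdata = d)
})

# Named list of fixed feature values, in the same form as the center argument
center <- list(wt = min(mtcars$wt))

# Each observation's prediction at the center values
d <- mtcars
for (nm in names(center)) d[[nm]] <- center[[nm]]
ref <- predict(fit, newdata = d)

# ref has one value per row of ice and is recycled down each column, so
# each row is shifted by its own reference prediction. The first grid
# point equals the center value here, so column 1 becomes all zeros.
ice_centered <- ice - ref
```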
fun [function]
For regression, a function that accepts a numeric vector and returns either a single number,
such as a measure of location like the mean, or three numbers, which give a lower bound,
a measure of location, and an upper bound. Note that if three numbers are returned, they must be
in this order. For classification with predict.type = "prob", the function must accept
a numeric matrix with the number of columns equal to the number of class levels of the target.
For classification with predict.type = "response" (the default), the function must accept
a character vector and output a numeric vector with length equal to the number of classes in the
target feature.
The default is the mean, unless obj is classification with predict.type = "response",
in which case the default is the proportion of observations predicted to be in each class.

bounds [numeric(2)]
The values (lower, upper) by which the estimated standard error is multiplied to estimate the bounds
of a confidence region for a partial dependence. Ignored if the learner's predict.type is not "se".
Default is the 2.5% and 97.5% quantiles (approximately -1.96 and 1.96) of the standard Gaussian
distribution.

resample [character(1)]
Defines how the prediction grid for each feature is created. If "bootstrap", then
values are sampled with replacement from the training data. If "subsample", then
values are sampled without replacement from the training data. If "none", an evenly spaced
grid between either the empirical minimum and maximum, or the minimum and maximum defined by
fmin and fmax, is created.
Default is "none".

fmin [numeric]
The minimum value that each element of features can take.
This argument is only applicable if resample = "none" and when the empirical minimum is higher
than the theoretical minimum for a given feature. This only applies to numeric features, and
NA should be inserted into the vector if the corresponding feature is a factor.
Default is the empirical minimum of each numeric feature and NA for factor features.

fmax [numeric]
The maximum value that each element of features can take.
This argument is only applicable if resample = "none" and when the empirical maximum is lower
than the theoretical maximum for a given feature. This only applies to numeric features, and
NA should be inserted into the vector if the corresponding feature is a factor.
Default is the empirical maximum of each numeric feature and NA for factor features.

gridsize [integer(1)]
The length of the prediction grid created for each feature.
If resample = "bootstrap" or resample = "subsample", then this defines
the number of (possibly non-unique) values resampled. If resample = "none", it defines the
length of the evenly spaced grid created.

...
Additional arguments passed to predict.

Value
[PartialDependenceData]. A named list, which contains the partial dependence,
input data, target, features, task description, and other arguments controlling the type of
partial dependences made.

References
Friedman, Jerome. Greedy Function Approximation: A Gradient Boosting Machine. The Annals of Statistics. Vol. 29. No. 5 (2001): 1189-1232.
See Also
generateCalibrationData, generateCritDifferencesData, generateFilterValuesData,
generateFunctionalANOVAData, generateLearningCurveData, generateThreshVsPerfData,
getFilterValues
Other partial_dependence: plotPartialDependenceGGVIS, plotPartialDependence
Examples
library(mlr)

# Regression (regr.svm requires the e1071 package): partial dependence on lstat
lrn = makeLearner("regr.svm")
fit = train(lrn, bh.task)
pd = generatePartialDependenceData(fit, bh.task, "lstat")
plotPartialDependence(pd, data = getTaskData(bh.task))

# Classification with class probabilities: partial dependence on Petal.Width
lrn = makeLearner("classif.rpart", predict.type = "prob")
fit = train(lrn, iris.task)
pd = generatePartialDependenceData(fit, iris.task, "Petal.Width")
plotPartialDependence(pd, data = getTaskData(iris.task))
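The resample, fmin, fmax and gridsize arguments, the three-number form of fun, and the bounds multipliers can also be illustrated in base R. This is a sketch, not mlr code: make_grid and fun3 are hypothetical helpers, the 10%/90% quantiles in fun3 are an arbitrary choice of lower and upper bound, and applying bounds as estimate + bounds * se is an assumed reading of the bounds description.

```r
# Hypothetical helper mirroring the three resample strategies
make_grid <- function(x, resample = "none", gridsize = 10L, fmin = NA, fmax = NA) {
  if (resample == "bootstrap") return(sample(x, gridsize, replace = TRUE))
  if (resample == "subsample") return(sample(x, gridsize, replace = FALSE))
  # "none": evenly spaced grid; fmin/fmax override the empirical range
  lower <- if (is.na(fmin)) min(x) else fmin
  upper <- if (is.na(fmax)) max(x) else fmax
  seq(lower, upper, length.out = gridsize)
}

g_emp <- make_grid(mtcars$wt, gridsize = 5)                      # empirical range
g_fix <- make_grid(mtcars$wt, gridsize = 5, fmin = 0, fmax = 6)  # 0, 1.5, 3, 4.5, 6
set.seed(1)
g_boot <- make_grid(mtcars$wt, "bootstrap", gridsize = 5)        # may repeat values

# A fun returning (lower bound, location, upper bound), in that order,
# applied here to the fitted values of an assumed stand-in model
fun3 <- function(p) {
  c(lower = quantile(p, 0.1, names = FALSE),
    mean = mean(p),
    upper = quantile(p, 0.9, names = FALSE))
}
preds <- fitted(lm(mpg ~ wt, data = mtcars))
agg <- fun3(preds)

# Default bounds: standard normal 2.5% and 97.5% quantiles
bounds <- c(qnorm(0.025), qnorm(0.975))  # about -1.96 and 1.96
interval <- 20 + bounds * 0.5            # hypothetical estimate 20, se 0.5
```

In this sketch, resample = "subsample" requires gridsize to be at most the number of observations, because sample() without replacement stops with an error otherwise.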