
iml (version 0.6.0)

Partial: Partial Dependence and Individual Conditional Expectation

Description

Partial computes and plots (individual) partial dependence functions of prediction models.

Format

R6Class object.

Usage

pd = Partial$new(predictor, feature, ice = TRUE, aggregation = "pdp",
    grid.size = 20, center.at = NULL, run = TRUE)

plot(pd)
pd$results
print(pd)
pd$set.feature(2)
pd$center(1)

Arguments

For Partial$new():

predictor:

(Predictor) The object (created with Predictor$new()) holding the machine learning model and the data.

feature:

(`character(1)` | `character(2)` | `numeric(1)` | `numeric(2)`) The feature name or index for which to compute the partial dependencies.

ice:

(`logical(1)`) Should individual conditional expectation (ICE) curves be calculated? Ignored in the case of two features.

aggregation:

(`character(1)`) Either "pdp" for the aggregated partial dependence curve or "none" to show only the individual conditional expectation curves.

center.at:

(`numeric(1)`) Value at which the plot should be centered. Ignored in the case of two features.

grid.size:

(`numeric(1)` | `numeric(2)`) The size of the grid for evaluating the predictions.

run:

(`logical(1)`) Should the Interpretation method be run?

Fields

feature.name:

(`character(1)` | `character(2)`) The names of the features for which the partial dependence was computed.

feature.type:

(`character(1)` | `character(2)`) The detected types of the features, either "categorical" or "numerical".

grid.size:

(`numeric(1)` | `numeric(2)`) The size of the grid.

center.at:

(`numeric(1)` | `character(1)`) The value for the centering of the plot. Numeric for numeric features, and the level name for factors.

n.features:

(`numeric(1)`) The number of features (either 1 or 2).

predictor:

(Predictor) The prediction model that was analysed.

results:

(data.frame) A data.frame with the grid of the feature(s) of interest and the corresponding predicted \(\hat{y}\). Can be used for creating custom partial dependence plots.
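For example, a custom plot can be built directly from the results data.frame. The sketch below assumes a Partial object pd computed for the single numeric feature "crim" (as in the Examples below) and that the columns `.type` and `.y.hat` hold the curve type and the predicted values; these column names are assumptions, so inspect colnames(pd$results) first:

if (require("ggplot2")) {
  pd.dat = pd$results
  # keep only the aggregated curve; .type and .y.hat are assumed column names
  pdp.only = subset(pd.dat, .type == "pdp")
  ggplot(pdp.only, aes(x = crim, y = .y.hat)) +
    geom_line() +
    labs(x = "crim", y = "predicted outcome")
}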

Methods

center()

method to set the value at which the ICE curves are centered. See examples.

set.feature()

method to get/set the feature(s) (by index or name) for which to compute the partial dependence. See examples for usage.

plot()

method to plot the partial dependence function. See plot.Partial.

run()

[internal] method to run the interpretability method. Use obj$run(force = TRUE) to force a rerun; see also the sketch after this list.

clone()

[internal] method to clone the R6 object.

initialize()

[internal] method to initialize the R6 object.
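A minimal sketch of how these methods are typically chained on an existing Partial object (here called pd, as in the Usage section; the feature name and centering value refer to the Boston example in the Examples and are arbitrary choices):

pd$set.feature("lstat")  # switch to another feature; the results are recomputed
pd$center(10)            # center the ICE curves at lstat = 10 (arbitrary value)
pd$run(force = TRUE)     # force a rerun of the computation
plot(pd)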

Details

The partial dependence plot calculates and plots the dependence of f(X) on one or two features. It is the aggregate of all individual conditional expectation (ICE) curves, each of which describes how the prediction for a single observation changes as the feature value changes.
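For a feature (or pair of features) \(x_S\) with complement features \(x_C\), the partial dependence function is estimated as the average of the ICE curves over the n observations in the data:

\[\hat{f}_S(x_S) = \frac{1}{n} \sum_{i=1}^{n} \hat{f}\left(x_S, x_C^{(i)}\right)\]

where \(x_C^{(i)}\) denotes the observed values of the remaining features for observation \(i\).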

To learn more about partial dependence plots, read the Interpretable Machine Learning book: https://christophm.github.io/interpretable-ml-book/pdp.html

For individual conditional expectation curves, see: https://christophm.github.io/interpretable-ml-book/ice.html

References

Friedman, J.H. 2001. "Greedy Function Approximation: A Gradient Boosting Machine." Annals of Statistics 29: 1189-1232.

Goldstein, A., Kapelner, A., Bleich, J., and Pitkin, E. 2013. "Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation," 1-22. https://doi.org/10.1080/10618600.2014.907095

See Also

plot.Partial

Examples

# NOT RUN {
# We train a random forest on the Boston dataset:
if (require("randomForest")) {
data("Boston", package  = "MASS")
rf = randomForest(medv ~ ., data = Boston, ntree = 50)
mod = Predictor$new(rf, data = Boston)

# Compute the partial dependence for the first feature
pdp.obj = Partial$new(mod, feature = "crim")

# Plot the results directly
plot(pdp.obj)

# Since the result is a ggplot object, you can extend it: 
if (require("ggplot2")) {
 plot(pdp.obj) + theme_bw()
}

# If you want to do your own thing, just extract the data: 
pdp.dat = pdp.obj$results
head(pdp.dat)

# You can reuse the pdp object for other features: 
pdp.obj$set.feature("lstat")
plot(pdp.obj)

# Only plotting the aggregated partial dependence:  
pdp.obj = Partial$new(mod, feature = "crim", ice = FALSE)
pdp.obj$plot() 

# Only plotting the individual conditional expectation:  
pdp.obj = Partial$new(mod, feature = "crim", aggregation = "none")
pdp.obj$plot() 
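
# You can also center the ICE curves at a chosen feature value
# (a minimal sketch; centering at the minimum of crim is an arbitrary choice):
pdp.obj = Partial$new(mod, feature = "crim")
pdp.obj$center(min(Boston$crim))
pdp.obj$plot()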
  
# Partial dependence plots support up to two features: 
pdp.obj = Partial$new(mod, feature = c("crim", "lstat"))  
plot(pdp.obj)


# Partial dependence plots also work with multiclass classification
rf = randomForest(Species ~ ., data = iris, ntree=50)
mod = Predictor$new(rf, data = iris, type = "prob")

# For some models we have to specify additional arguments for the predict function
plot(Partial$new(mod, feature = "Petal.Width"))

# Partial dependence plots support up to two features: 
pdp.obj = Partial$new(mod, feature = c("Sepal.Length", "Petal.Length"))
pdp.obj$plot()   

# For multiclass classification models, you can choose to only show one class:
mod = Predictor$new(rf, data = iris, type = "prob", class = 1)
plot(Partial$new(mod, feature = "Sepal.Length"))
}
# }
