vint: Interaction effects

Description

Quantify the strength of two-way interaction effects using a simple feature importance ranking measure (FIRM) approach. For details, see Greenwell et al. (2018).

Usage

vint(
  object,
  feature_names,
  progress = "none",
  parallel = FALSE,
  paropts = NULL,
  ...
)

Arguments

object: A fitted model object (e.g., a "randomForest" object).
feature_names: Character vector giving the names of the two features of interest.
progress: Character string giving the name of the progress bar to use while constructing the interaction statistics. See create_progress_bar for details. Default is "none".
parallel: Logical indicating whether or not to run partial in parallel using a backend provided by the foreach package. Default is FALSE.
paropts: List containing additional options to be passed on to foreach when parallel = TRUE.
...: Additional optional arguments to be passed on to partial.

Details

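This function quantifies the strength of the interaction between features $X_1$ and $X_2$ using the two-way partial dependence of the target $Y$ on $X_1$ and $X_2$. For each unique value of $X_1$, the standard deviation of the partial dependence values across $X_2$ measures the importance of $X_2$ along that slice. If $X_1$ and $X_2$ do not interact, these slice-wise importance values are all equal, so their standard deviation is zero. The same statistic is computed with the roles of $X_1$ and $X_2$ reversed, and the two values are averaged. See Greenwell et al. (2018) for details and examples.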
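The statistic described above can be sketched in a few lines of base R. This is a minimal illustration, not the vip implementation: the helper `firm_interaction()` is hypothetical, and the two toy grids (an additive surface and a product surface evaluated on a regular grid) stand in for the two-way partial dependence that vint() obtains from partial().

```r
# Hypothetical helper: interaction statistic from a two-way partial dependence
# grid. `pd` is assumed to be a data frame with one column per feature plus a
# `yhat` column holding the partial dependence values.
firm_interaction <- function(pd, x1, x2) {
  # For each value of x1, the sd of yhat across x2 is the importance of x2
  # along that slice; the sd of these values measures how much it varies.
  i12 <- stats::sd(tapply(pd$yhat, INDEX = pd[[x1]], FUN = stats::sd))
  # Same statistic with the roles of x1 and x2 swapped
  i21 <- stats::sd(tapply(pd$yhat, INDEX = pd[[x2]], FUN = stats::sd))
  mean(c(i12, i21))
}

# Toy surfaces on a 20 x 20 grid, standing in for partial dependence output
grid <- expand.grid(x1 = seq(0, 1, length.out = 20),
                    x2 = seq(0, 1, length.out = 20))
additive <- transform(grid, yhat = x1 + x2)  # no interaction
product  <- transform(grid, yhat = x1 * x2)  # pure interaction

firm_interaction(additive, "x1", "x2")  # approximately 0 (floating-point error only)
firm_interaction(product, "x1", "x2")   # about 0.097
```

For the additive surface every slice has the same spread, so the statistic vanishes; for the product surface the spread along $X_2$ grows linearly with $X_1$, which yields a positive value.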

References

Greenwell, B. M., Boehmke, B. C., and McCarthy, A. J. (2018). A Simple and Effective Model-Based Variable Importance Measure. arXiv preprint arXiv:1805.04755.

Examples

if (FALSE) {
#
# The Friedman 1 benchmark problem
#

# Load required packages
library(gbm)
library(ggplot2)
library(vip)  # provides vint() and gen_friedman()

# Simulate training data
trn <- gen_friedman(500, seed = 101)  # ?vip::gen_friedman

#
# NOTE: The only interaction that actually occurs in the model from which
# these data are generated is between x1 and x2!
#

# Fit a GBM to the training data
set.seed(102)  # for reproducibility
fit <- gbm(y ~ ., data = trn, distribution = "gaussian", n.trees = 1000,
           interaction.depth = 2, shrinkage = 0.01, bag.fraction = 0.8,
           cv.folds = 5)
best_iter <- gbm.perf(fit, plot.it = FALSE, method = "cv")

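# Quantify relative interaction strength for all 45 pairs of features
all_pairs <- combn(paste0("x", 1:10), m = 2)  # one column per pair
res <- NULL
for (i in seq_len(ncol(all_pairs))) {
  interact <- vint(fit, feature_names = all_pairs[, i], n.trees = best_iter)
  res <- rbind(res, interact)
}
res <- res[order(res$Interaction, decreasing = TRUE), ]  # strongest first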

# Plot top 20 results
top_20 <- res[1L:20L, ]
ggplot(top_20, aes(x = reorder(Variables, Interaction), y = Interaction)) +
  geom_col() +
  coord_flip() +
  xlab("") +
  ylab("Interaction strength")
}
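Because the data-generating function is known, the NOTE in the example can be checked without fitting a model. The sketch below assumes gen_friedman() simulates the standard Friedman 1 function, $10\sin(\pi x_1 x_2) + 20(x_3 - 0.5)^2 + 10x_4 + 5x_5$ plus Gaussian noise, with x6 to x10 unused. It evaluates the noiseless function on a grid over each pair of features and applies the standard-deviation-of-standard-deviations statistic described in Details. The helpers `friedman1()` and `true_interaction()` are hypothetical. Holding the remaining features at 0.5, instead of averaging over them as partial dependence would, is a simplification; for this function it does not change which pairs have a non-zero statistic.

```r
# Noiseless Friedman 1 function (assumed to match gen_friedman() minus noise)
friedman1 <- function(X) {
  10 * sin(pi * X[, 1] * X[, 2]) + 20 * (X[, 3] - 0.5)^2 +
    10 * X[, 4] + 5 * X[, 5]
}

# Hypothetical helper: evaluate f on a grid over (x_j, x_k), holding the other
# features at 0.5, then average the two sd-of-sd statistics
true_interaction <- function(j, k, p = 10, n_grid = 20) {
  g <- expand.grid(a = seq(0, 1, length.out = n_grid),
                   b = seq(0, 1, length.out = n_grid))
  X <- matrix(0.5, nrow = nrow(g), ncol = p)
  X[, j] <- g$a
  X[, k] <- g$b
  yhat <- friedman1(X)
  mean(c(stats::sd(tapply(yhat, g$a, stats::sd)),
         stats::sd(tapply(yhat, g$b, stats::sd))))
}

idx <- combn(10, m = 2)  # all 45 pairs of feature indices
truth <- data.frame(
  Variables = paste0("x", idx[1, ], "*x", idx[2, ]),
  Interaction = apply(idx, 2, function(jk) true_interaction(jk[1], jk[2]))
)
truth <- truth[order(truth$Interaction, decreasing = TRUE), ]
head(truth, 3)  # only x1*x2 should be clearly above zero
```

A well-fitted GBM should therefore rank x1 and x2 first in the plot above, with the remaining pairs close to zero.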