vip (version 0.3.2)

vint: Interaction effects

Description

Quantify the strength of two-way interaction effects using a simple feature importance ranking measure (FIRM) approach. For details, see Greenwell et al. (2018).

Usage

vint(
  object,
  feature_names,
  progress = "none",
  parallel = FALSE,
  paropts = NULL,
  ...
)

Arguments

object

A fitted model object (e.g., a "randomForest" object).

feature_names

Character vector of length two giving the names of the two features of interest (e.g., c("x.1", "x.2")).

progress

Character string giving the name of the progress bar to use while constructing the interaction statistics. See create_progress_bar for details. Default is "none".

parallel

Logical indicating whether or not to run partial in parallel using a backend provided by the foreach package. Default is FALSE.

paropts

List containing additional options to be passed on to foreach when parallel = TRUE.

...

Additional optional arguments to be passed on to partial.

Details

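This function quantifies the strength of the interaction between features $X_1$ and $X_2$ by measuring how much the variability of the joint partial dependence of the target $Y$ on $X_1$ and $X_2$ changes from one slice to the next, where a slice holds one of the two features fixed. If the two features do not interact, every slice has the same spread and the statistic is close to zero. See Greenwell et al. (2018) for details and examples.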
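The slice-based idea can be sketched in base R on a synthetic grid. This is an illustration under stated assumptions and is not taken from the vip source. The helper name firm_interaction is hypothetical, the partial dependence values are made up rather than computed from a fitted model, and the choice to average the two directions is one reading of the FIRM approach in Greenwell et al. (2018).

```r
# Hypothetical helper (not part of vip): a FIRM-style interaction score
# computed from partial dependence values on a full two-way grid.
# Assumption: `pd` is a data frame with one row per grid point, holding the
# two feature columns and a column of partial dependence values.
firm_interaction <- function(pd, x1, x2, yhat = "yhat") {
  # For each fixed value of x1, the spread of the partial dependence over x2
  sd_within_x1 <- tapply(pd[[yhat]], INDEX = pd[[x1]], FUN = sd)
  # For each fixed value of x2, the spread of the partial dependence over x1
  sd_within_x2 <- tapply(pd[[yhat]], INDEX = pd[[x2]], FUN = sd)
  # How much those spreads vary across slices, averaged over both directions
  mean(c(sd(sd_within_x1), sd(sd_within_x2)))
}

# Synthetic grid standing in for the output of a partial dependence routine
grid <- expand.grid(x1 = seq(0, 1, length.out = 11),
                    x2 = seq(0, 1, length.out = 11))

# Purely additive surface: every slice has the same spread
additive <- transform(grid, yhat = x1 + x2)

# Multiplicative surface: the spread of a slice depends on the other feature
product <- transform(grid, yhat = x1 * x2)

score_additive <- firm_interaction(additive, "x1", "x2")
score_product  <- firm_interaction(product, "x1", "x2")
```

On these toy surfaces the additive score is zero up to floating-point error, while the product score is clearly positive, which is the behaviour the description above leads one to expect.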

References

Greenwell, B. M., Boehmke, B. C., and McCarthy, A. J.: A Simple and Effective Model-Based Variable Importance Measure. arXiv preprint arXiv:1805.04755 (2018).

Examples

if (FALSE) {
#
# The Friedman 1 benchmark problem
#

# Load required packages
library(gbm)
library(ggplot2)
library(vip)  # provides vint() and gen_friedman()

# Simulate training data
trn <- gen_friedman(500, seed = 101)  # ?vip::gen_friedman

#
# NOTE: The only interaction that actually occurs in the model from which
# these data are generated is between x.1 and x.2!
#

# Fit a GBM to the training data
set.seed(102)  # for reproducibility
fit <- gbm(y ~ ., data = trn, distribution = "gaussian", n.trees = 1000,
           interaction.depth = 2, shrinkage = 0.01, bag.fraction = 0.8,
           cv.folds = 5)
best_iter <- gbm.perf(fit, plot.it = FALSE, method = "cv")

# Quantify relative interaction strength
all_pairs <- combn(paste0("x.", 1:10), m = 2)
res <- NULL
for (i in seq_len(ncol(all_pairs))) {  # one column of all_pairs per feature pair
  interact <- vint(fit, feature_names = all_pairs[, i], n.trees = best_iter)
  res <- rbind(res, interact)
}

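# Plot top 20 results (sorted by decreasing interaction strength)
res <- res[order(res$Interaction, decreasing = TRUE), ]
top_20 <- res[1L:20L, ]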
ggplot(top_20, aes(x = reorder(Variables, Interaction), y = Interaction)) +
  geom_col() +
  coord_flip() +
  xlab("") +
  ylab("Interaction strength")
}