predict_contributions.H2OModel: Predict feature contributions - SHAP values on an H2O Model (only DRF, GBM, XGBoost models and equivalent imported MOJOs).

Description

The default implementation returns an H2OFrame of shape (#rows, #features + 1): there is one feature contribution column for each input feature, and the last column is the model bias (the same value for each row). The sum of the feature contributions and the bias term is equal to the raw prediction of the model. The raw prediction of a tree-based model is the sum of the predictions of the individual trees before the inverse link function is applied to get the actual prediction. For the Gaussian distribution, the sum of the contributions (including the bias term) is equal to the model prediction.
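
As a rough illustration of the additivity described above, the following base R sketch uses made-up numbers rather than output from a real model. The feature names, the values, and the choice of a logit link (as a binomial model would use) are all assumptions for the example; the bias column is named BiasTerm here only to mirror the "bias last" layout.

```r
# Hypothetical contributions for a single row: one value per feature,
# with the bias term last. These numbers are invented for illustration.
contrib <- c(AGE = 0.40, PSA = -0.15, GLEASON = 0.55, BiasTerm = -0.30)

# The raw prediction is the sum of the feature contributions and the bias.
raw_prediction <- sum(contrib)

# Assuming a logit link, the inverse link (logistic function) maps the raw
# prediction to a probability. With an identity link (Gaussian), the raw
# prediction would already be the model prediction.
probability <- plogis(raw_prediction)

print(raw_prediction)
print(probability)
```

With a real model, the same comparison could be made by summing the columns of the returned frame row by row and comparing the result with the model's prediction on the link scale.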

Usage

predict_contributions.H2OModel(
  object,
  newdata,
  output_format = c("compact", "original"),
  top_n = 0,
  bottom_n = 0,
  compare_abs = FALSE,
  background_frame = NULL,
  output_space = FALSE,
  output_per_reference = FALSE,
  ...
)
h2o.predict_contributions(
  object,
  newdata,
  output_format = c("compact", "original"),
  top_n = 0,
  bottom_n = 0,
  compare_abs = FALSE,
  background_frame = NULL,
  output_space = FALSE,
  output_per_reference = FALSE,
  ...
)

Value

Returns an H2OFrame containing the feature contributions for each input row.

Arguments

object: a fitted H2OModel object for which prediction is desired
newdata: An H2OFrame object in which to look for variables with which to predict.
output_format: Specify how to output feature contributions in XGBoost. By default, XGBoost outputs contributions for one-hot encoded features; specifying the compact output format produces a per-feature contribution. Defaults to original.
top_n: Return only the #top_n highest contributions + bias. If top_n<0, sort all SHAP values in descending order. If top_n<0 && bottom_n<0, sort all SHAP values in descending order.
bottom_n: Return only the #bottom_n lowest contributions + bias. If top_n and bottom_n are defined together, return an array of #top_n + #bottom_n + bias. If bottom_n<0, sort all SHAP values in ascending order. If top_n<0 && bottom_n<0, sort all SHAP values in descending order.
compare_abs: If TRUE, compare the absolute values of the contributions.
background_frame: Optional frame that is used as the source of baselines for the baseline SHAP (when output_per_reference == TRUE) or for the marginal SHAP (when output_per_reference == FALSE).
output_space: If TRUE, linearly scale the contributions so that they sum up to the prediction. NOTE: This will result only in approximate SHAP values even if the model supports exact SHAP calculation. NOTE: This will not have any effect if the estimator doesn't use a link function.
output_per_reference: If TRUE, return baseline SHAP, i.e., contribution for each data point for each reference from the background_frame. If FALSE, return TreeSHAP if no background_frame is provided, or marginal SHAP if background frame is provided. Can be used only with background_frame.
...: additional arguments to pass on.

Details

Note: Multinomial classification models are currently not supported.

Examples

if (FALSE) {
library(h2o)
h2o.init()
prostate_path <- system.file("extdata", "prostate.csv", package = "h2o")
prostate <- h2o.uploadFile(path = prostate_path)
prostate_gbm <- h2o.gbm(3:9, "AGE", prostate)
h2o.predict(prostate_gbm, prostate)
# Compute SHAP
h2o.predict_contributions(prostate_gbm, prostate)
# Compute SHAP and pick the top two highest
h2o.predict_contributions(prostate_gbm, prostate, top_n=2)
# Compute SHAP and pick the two lowest
h2o.predict_contributions(prostate_gbm, prostate, bottom_n=2)
# Compute SHAP and pick the top two highest regardless of the sign
h2o.predict_contributions(prostate_gbm, prostate, top_n=2, compare_abs=TRUE)
# Compute SHAP and pick the two lowest regardless of the sign
h2o.predict_contributions(prostate_gbm, prostate, bottom_n=2, compare_abs=TRUE)
# Compute SHAP values and show them all in descending order
h2o.predict_contributions(prostate_gbm, prostate, top_n=-1)
# Compute SHAP and pick the two highest and the two lowest
h2o.predict_contributions(prostate_gbm, prostate, top_n=2, bottom_n=2)

# Compute Marginal SHAP, this enables looking at the contributions against different
# baselines, e.g., older people in the following example
h2o.predict_contributions(prostate_gbm, prostate, background_frame=prostate[prostate$AGE > 75, ])
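
# Sketch, not part of the original examples: baseline SHAP against the same
# group of older people. Per the argument descriptions, output_per_reference=TRUE
# returns a contribution for each data point for each reference row in the
# background frame, so the result can be much larger than the input.
# The variable name `background` is chosen here for illustration only.
background <- prostate[prostate$AGE > 75, ]
h2o.predict_contributions(prostate_gbm, prostate, background_frame=background,
                          output_per_reference=TRUE)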
}