h2o.partialPlot: Partial Dependence Plots

Description

A partial dependence plot gives a graphical depiction of the marginal effect of a variable on the response. The effect of a variable is measured as the change in the mean response. Note: Unlike randomForest's partialPlot, this function returns the mean response (probabilities) when plotting partial dependence, rather than the mean of the log class probability.
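
To make the idea of "change in the mean response" concrete, the following is a minimal base R sketch of the usual partial dependence computation: fix one feature at each grid value for every row, score the model, and average the predictions. It is an illustration under stated assumptions, not H2O's implementation. The helper name partial_dependence, the evenly spaced grid, and the mtcars model are all hypothetical choices made for this example.

```r
# Manual partial dependence for one numeric feature, using base R only.
# `partial_dependence` is a hypothetical helper, not part of the h2o package.
partial_dependence <- function(model, data, col, nbins = 20) {
  # Assumption: an evenly spaced grid over the observed range of the feature.
  grid <- seq(min(data[[col]]), max(data[[col]]), length.out = nbins)
  rows <- lapply(grid, function(v) {
    modified <- data
    modified[[col]] <- v  # force every row to the same grid value
    # Predicted probabilities (response scale), not log-odds.
    preds <- predict(model, newdata = modified, type = "response")
    data.frame(value = v,
               mean_response = mean(preds),
               stddev_response = sd(preds))
  })
  out <- do.call(rbind, rows)
  names(out)[1] <- col
  out
}

# Toy binary classifier on a built-in dataset (not the prostate data).
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial())
pdp_wt <- partial_dependence(fit, mtcars, "wt", nbins = 10)
print(pdp_wt)
```

Each row of pdp_wt holds one grid value of wt together with the mean and standard deviation of the predicted probabilities when all rows are forced to that value.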

Usage

h2o.partialPlot(
  object,
  data,
  cols,
  destination_key,
  nbins = 20,
  plot = TRUE,
  plot_stddev = TRUE,
  weight_column = -1,
  include_na = FALSE,
  user_splits = NULL,
  col_pairs_2dpdp = NULL,
  save_to = NULL,
  row_index = -1,
  targets = NULL
)

Value

Plot and list of calculated mean response tables for each feature requested.

Arguments

object: An H2OModel object.
data: An H2OFrame object used for scoring and constructing the plot.
cols: Feature(s) for which partial dependence will be calculated.
destination_key: A key reference to the created partial dependence tables in H2O.
nbins: Number of bins used. For categorical columns, make sure the number of bins exceeds the level count. If you enable include_na (add_missing_NA), the returned length will be nbins + 1.
plot: A logical specifying whether to plot partial dependence table.
plot_stddev: A logical specifying whether to add std err to partial dependence plot.
weight_column: A string denoting which column of data should be used as the weight column.
include_na: A logical specifying whether missing values should be included in the feature values.
user_splits: A two-level nested list containing user defined split points for pdp plots for each column. If there are two columns using user defined split points, there should be two lists in the nested list. Inside each list, the first element is the column name followed by values defined by the user.
col_pairs_2dpdp: A two-level nested list such as col_pairs_2dpdp = list(c("col1_name", "col2_name"), c("col1_name", "col3_name"), ...), where 2D partial dependence plots will be generated for the col1_name, col2_name pair, for the col1_name, col3_name pair, and for any other pairs specified in the nested list.
save_to: Fully qualified prefix of the image files the resulting plots should be saved to, e.g. '/home/user/pdp'. Plots for each feature are saved separately in PNG format; each file receives a suffix equal to the corresponding feature name, e.g. `/home/user/pdp_AGE.png`. If the files already exist, they will be overwritten. Files are only saved if plot = TRUE (default).
row_index: Row for which partial dependence will be calculated instead of the whole input frame.
targets: Target classes for multinomial model.
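
The nested-list arguments described above can be easier to read in code. The sketch below builds a user_splits list and a col_pairs_2dpdp list following the documented shapes. The split values are arbitrary, the inner c(...) vectors mirror the col_pairs_2dpdp example (an assumption for user_splits), and the call itself is wrapped in if (FALSE) because it needs a running H2O cluster plus the prostate_gbm model and prostate frame from the Examples section.

```r
# user_splits: one inner entry per column; the first element is the column
# name, followed by the user-defined split points (values chosen arbitrarily).
# Note that c() coerces the numbers to character strings here.
user_splits <- list(
  c("AGE", 45, 55, 65, 75)
)

# col_pairs_2dpdp: one inner entry per pair of columns for a 2D plot.
col_pairs <- list(
  c("AGE", "RACE")
)

if (FALSE) {
  # Assumes prostate_gbm and prostate exist as in the Examples section.
  h2o.partialPlot(object = prostate_gbm,
                  data = prostate,
                  cols = "AGE",
                  user_splits = user_splits,
                  col_pairs_2dpdp = col_pairs,
                  plot = FALSE)
}
```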

Examples

if (FALSE) {
library(h2o)
h2o.init()

# Load the prostate dataset shipped with the h2o package
prostate_path <- system.file("extdata", "prostate.csv", package = "h2o")
prostate <- h2o.uploadFile(path = prostate_path)

# Treat the response and RACE as categorical
prostate[, "CAPSULE"] <- as.factor(prostate[, "CAPSULE"])
prostate[, "RACE"] <- as.factor(prostate[, "RACE"])

# Binomial GBM on two features
prostate_gbm <- h2o.gbm(x = c("AGE", "RACE"),
                        y = "CAPSULE",
                        training_frame = prostate,
                        ntrees = 10,
                        max_depth = 5,
                        learn_rate = 0.1)

# Partial dependence for both features
h2o.partialPlot(object = prostate_gbm, data = prostate, cols = c("AGE", "RACE"))

# Multinomial model on iris
iris_hex <- as.h2o(iris)
iris_gbm <- h2o.gbm(x = c(1:4), y = 5, training_frame = iris_hex)

# one target class
h2o.partialPlot(object = iris_gbm, data = iris_hex, cols = "Petal.Length",
                targets = c("setosa"))
# three target classes
h2o.partialPlot(object = iris_gbm, data = iris_hex, cols = "Petal.Length",
                targets = c("setosa", "virginica", "versicolor"))
}