Learn R Programming

h2o (version 3.40.0.4)

h2o.infogram_train_subset_models: Train models over subsets selected using infogram

Description

Train models over subsets selected using infogram

Usage

h2o.infogram_train_subset_models(
  ig,
  model_fun,
  training_frame,
  test_frame,
  y,
  protected_columns,
  reference,
  favorable_class,
  feature_selection_metrics = c("safety_index"),
  metric = "euclidean",
  air_metric = "selectedRatio",
  alpha = 0.05,
  ...
)

Value

frame containing aggregations of intersectional fairness across the models

Arguments

ig

Infogram object trained with the same protected columns

model_fun

Function that creates models. This can be something like h2o.automl, h2o.gbm, etc.

training_frame

Training frame

test_frame

Test frame

y

Response column

protected_columns

Protected columns

reference

List of values corresponding to a reference for each protected columns. If set to NULL, it will use the biggest group as the reference.

favorable_class

Positive/favorable outcome class of the response.

feature_selection_metrics

One or more columns from the infogram@admissible_score.

metric

Metric supported by stats::dist which is used to sort the features.

air_metric

Metric used for Adverse Impact Ratio calculation. Defaults to ``selectedRatio``.

alpha

The alpha level is the probability of rejecting the null hypothesis that the protected group and the reference came from the same population when the null hypothesis is true.

...

Parameters that are passed to the model_fun.

Examples

Run this code
if (FALSE) {
library(h2o)
h2o.connect()
data <- h2o.importFile(paste0("https://s3.amazonaws.com/h2o-public-test-data/smalldata/",
                              "admissibleml_test/taiwan_credit_card_uci.csv"))
x <- c('LIMIT_BAL', 'AGE', 'PAY_0', 'PAY_2', 'PAY_3', 'PAY_4', 'PAY_5', 'PAY_6', 'BILL_AMT1',
       'BILL_AMT2', 'BILL_AMT3', 'BILL_AMT4', 'BILL_AMT5', 'BILL_AMT6', 'PAY_AMT1', 'PAY_AMT2',
       'PAY_AMT3', 'PAY_AMT4', 'PAY_AMT5', 'PAY_AMT6')
y <- "default payment next month"
protected_columns <- c('SEX', 'EDUCATION')

for (col in c(y, protected_columns))
  data[[col]] <- as.factor(data[[col]])

splits <- h2o.splitFrame(data, 0.8)
train <- splits[[1]]
test <- splits[[2]]
reference <- c(SEX = "1", EDUCATION = "2")  # university educated man
favorable_class <- "0" # no default next month

ig <- h2o.infogram(x, y, train, protected_columns = protected_columns)
print(ig@admissible_score)
plot(ig)

infogram_models <- h2o.infogram_train_subset_models(ig, h2o.gbm, train, test, y,
                                                    protected_columns, reference,
                                                    favorable_class)

pf <- h2o.pareto_front(infogram_models, x_metric = "air_min",
                       y_metric = "AUC", optimum = "top right")
plot(pf)
pf@pareto_front
}

Run the code above in your browser using DataLab