model_utility: Estimate model utility

Description

Compute the utility of a model score on a classification data set. For each threshold of interest, we compute the utility of the classification rule that selects all items with a model score greater than or equal to the threshold. The user specifies the outcome (a binary classification target), a numeric model score, and the utility value (positive, negative, or zero) of each case: true positive, false positive, true negative, and false negative. The result is a table of model thresholds, each with the total value of using the model score with that threshold as a classification rule. NA marks a threshold where no rows are selected.
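
The rule described above can be worked by hand for a single threshold. The following is a minimal base R sketch, not the package's implementation: the helper name utility_at_threshold and its argument names are hypothetical, and it only assumes the "score greater than or equal to threshold" rule stated in the description.

```r
# Hypothetical helper (not part of sigr): total utility of the rule
# "select every row whose score is >= threshold".
utility_at_threshold <- function(score, outcome, threshold,
                                 tp_value, fp_value, tn_value, fn_value) {
  selected <- score >= threshold
  sum(
    ifelse(selected & outcome, tp_value, 0),    # true positives
    ifelse(selected & !outcome, fp_value, 0),   # false positives
    ifelse(!selected & !outcome, tn_value, 0),  # true negatives
    ifelse(!selected & outcome, fn_value, 0)    # false negatives
  )
}

score <- c(0, 0.5, 0.5, 0.5)
outcome <- c(FALSE, TRUE, FALSE, FALSE)

total <- utility_at_threshold(
  score, outcome, threshold = 0.5,
  tp_value = 95, fp_value = -5, tn_value = 0.001, fn_value = -0.01
)
# 1 true positive (95) + 2 false positives (-10) + 1 true negative (0.001)
print(total)  # 85.001
```

The value arguments may be single numbers, as here, or per-row vectors of the same length as score.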

Usage

model_utility(
  d,
  model_name,
  outcome_name,
  ...,
  outcome_target = TRUE,
  true_positive_value_column_name = "true_positive_value",
  false_positive_value_column_name = "false_positive_value",
  true_negative_value_column_name = "true_negative_value",
  false_negative_value_column_name = "false_negative_value"
)

Value

A data.frame with one row per threshold; the examples below use its threshold and total_value columns.

Arguments

d: A data.frame containing all data and outcome values.
model_name: Name of the column containing model predictions.
outcome_name: Name of the column containing the truth values.
...: Not used; forces later arguments to be specified by name.
outcome_target: Outcome value considered to be the positive (TRUE) class.
true_positive_value_column_name: Column name of per-row values of true positive cases. Only used on positive instances.
false_positive_value_column_name: Column name of per-row values of false positive cases. Only used on negative instances.
true_negative_value_column_name: Column name of per-row values of true negative cases. Only used on negative instances.
false_negative_value_column_name: Column name of per-row values of false negative cases. Only used on positive instances.
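
As an illustration of the column-name arguments, the sketch below stores per-row values under non-default names and passes those names explicitly. The data, the column names (score, converted, tp_value, and so on), and the use of a character outcome with outcome_target = "yes" are assumptions made for this example; the call itself uses only the arguments listed in Usage, and it is skipped if sigr is not installed.

```r
# Illustrative data: values differ by row, and utility columns use
# non-default names (all names here are made up for the example).
d <- data.frame(
  score = c(0.9, 0.7, 0.4, 0.2),
  converted = c("yes", "no", "yes", "no"),
  tp_value = c(120, 0, 45, 0),    # per-row revenue; only read on positive rows
  fp_value = -5,                  # cost of acting on a negative row
  tn_value = 0,
  fn_value = c(-12, 0, -4.5, 0),  # per-row loss; only read on positive rows
  stringsAsFactors = FALSE
)

if (requireNamespace("sigr", quietly = TRUE)) {
  values <- sigr::model_utility(
    d, "score", "converted",
    outcome_target = "yes",
    true_positive_value_column_name = "tp_value",
    false_positive_value_column_name = "fp_value",
    true_negative_value_column_name = "tn_value",
    false_negative_value_column_name = "fn_value"
  )
  print(values[, c("threshold", "total_value")])
}
```

Because of the ... argument, everything after outcome_name must be passed by name, as shown.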

Details

A worked example can be found here: https://github.com/WinVector/sigr/blob/main/extras/UtilityExample.md.

Examples

d <- data.frame(
  predicted_probability = c(0, 0.5, 0.5, 0.5),
  made_purchase = c(FALSE, TRUE, FALSE, FALSE),
  false_positive_value = -5,    # acting on any predicted positive costs $5
  true_positive_value = 95,     # revenue on a true positive is $100 minus action cost
  true_negative_value = 0.001,  # true negatives have no value in our application
                                # but just give ourselves a small reward for being right
  false_negative_value = -0.01  # adding a small notional tax for false negatives,
                                # don't want our competitor getting these accounts.
  )

values <- model_utility(d, 'predicted_probability', 'made_purchase')
best_strategy <- values[values$total_value >= max(values$total_value), ][1, ]
t(best_strategy)



# a bigger example (random data, so results vary from run to run)

d <- data.frame(
  predicted_probability = stats::runif(100),
  made_purchase = sample(c(FALSE, TRUE), replace = TRUE, size = 100),
  false_positive_value = -5,    # acting on any predicted positive costs $5
  true_positive_value = 95,     # revenue on a true positive is $100 minus action cost
  true_negative_value = 0.001,  # true negatives have no value in our application
                                # but just give ourselves a small reward for being right
  false_negative_value = -0.01  # adding a small notional tax for false negatives,
                                # don't want our competitor getting these accounts.
)

values <- model_utility(d, 'predicted_probability', 'made_purchase')

# plot the estimated total utility as a function of threshold
plot(values$threshold, values$total_value)

best_strategy <- values[values$total_value >= max(values$total_value), ][1, ]
t(best_strategy)


# an example without utility columns

d <- data.frame(
  predicted_probability = c(0, 0.5, 0.5, 0.5),
  made_purchase = c(FALSE, TRUE, FALSE, FALSE))
model_utility(d, 'predicted_probability', 'made_purchase')
