maximize_boot_metric: Optimize a metric function in binary classification after bootstrapping

Description

Given a metric function in metric_func, these functions draw boot_cut bootstrap samples of the data and, in each sample, select the cutpoint that maximizes or minimizes the metric. The returned optimal cutpoint is the result of applying summary_func, e.g. the mean, to the optimal cutpoints from all bootstrap samples. The metric function should accept the following inputs:

tp: vector of true positive counts
fp: vector of false positive counts
tn: vector of true negative counts
fn: vector of false negative counts
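As an illustration of the expected interface, a user-defined metric function might look like the sketch below. The name balanced_accuracy_sketch is hypothetical. Returning a one-column matrix with a column name, and accepting ... so that extra arguments are ignored, are assumptions about what cutpointr accepts rather than something stated on this page.

```r
# Hypothetical custom metric: balanced accuracy, i.e. the mean of
# sensitivity and specificity. Vectorized over the count vectors.
balanced_accuracy_sketch <- function(tp, fp, tn, fn, ...) {
  sens <- tp / (tp + fn)  # sensitivity (true positive rate)
  spec <- tn / (tn + fp)  # specificity (true negative rate)
  res <- matrix((sens + spec) / 2, ncol = 1)
  # Assumption: a column name is used to label the metric in the output
  colnames(res) <- "balanced_accuracy"
  res
}

# Two candidate cutpoints, each described by its confusion-matrix counts
balanced_accuracy_sketch(tp = c(8, 5), fp = c(2, 0),
                         tn = c(8, 10), fn = c(2, 5))
```

Such a function could then be supplied in the same place as accuracy in the Examples.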

Usage

maximize_boot_metric(
  data,
  x,
  class,
  metric_func = youden,
  pos_class = NULL,
  neg_class = NULL,
  direction,
  summary_func = mean,
  boot_cut = 50,
  boot_stratify,
  inf_rm = TRUE,
  tol_metric,
  use_midpoints,
  ...
)
minimize_boot_metric(
  data,
  x,
  class,
  metric_func = youden,
  pos_class = NULL,
  neg_class = NULL,
  direction,
  summary_func = mean,
  boot_cut = 50,
  boot_stratify,
  inf_rm = TRUE,
  tol_metric,
  use_midpoints,
  ...
)

Value

A tibble with the column optimal_cutpoint.

Arguments

data: A data frame or tibble in which the columns that are given in x and class can be found.
x: (character) The variable name to be used for classification, e.g. predictions or test values.
class: (character) The variable name indicating class membership.
metric_func: (function) A function that computes a single-number metric to be maximized or minimized. See Description.
pos_class: The value of class that indicates the positive class.
neg_class: The value of class that indicates the negative class.
direction: (character) Use ">=" or "<=" to select whether an x value >= or <= the cutoff predicts the positive class.
summary_func: (function) After obtaining the bootstrapped optimal cutpoints this function, e.g. mean or median, is applied to arrive at a single cutpoint.
boot_cut: (numeric) Number of bootstrap repetitions over which the optimal cutpoint is summarized by summary_func (the mean by default).
boot_stratify: (logical) If the bootstrap is stratified, bootstrap samples are drawn in both classes and then combined, keeping the number of positives and negatives constant in every resample.
inf_rm: (logical) Whether to remove infinite cutpoints before calculating the summary.
tol_metric: All cutpoints that lead to a metric value in the interval [m_max - tol_metric, m_max + tol_metric] are passed to summary_func, where m_max is the maximum achievable metric value. This can be used to return multiple decent cutpoints and to avoid floating-point problems.
use_midpoints: (logical) If TRUE (default FALSE), the returned optimal cutpoint will be the mean of the optimal cutpoint and the next highest observation (for direction = ">=") or the next lowest observation (for direction = "<="), which avoids biasing the optimal cutpoint.
...: To capture further arguments that are always passed to the method function by cutpointr. The cutpointr function passes data, x, class, metric_func, direction, pos_class and neg_class to the method function.
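
The effect of tol_metric can be illustrated with a small base R sketch. The cutpoints and metric values below are made up, and the code only mirrors the interval rule described above for a single sample; it is not taken from cutpointr's implementation.

```r
# Hypothetical candidate cutpoints and their metric values
cutpoints     <- c(1, 2, 3, 4, 5)
metric_values <- c(0.60, 0.795, 0.80, 0.797, 0.50)
tol_metric    <- 0.01

m_max <- max(metric_values)
# Keep every cutpoint whose metric lies in
# [m_max - tol_metric, m_max + tol_metric]
keep <- abs(metric_values - m_max) <= tol_metric
near_optimal <- cutpoints[keep]

# mean() stands in for summary_func here
summarized <- mean(near_optimal)
```

With a tolerance of 0 only the single best cutpoint would be kept, so small floating-point differences between otherwise equivalent cutpoints could decide the result.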

Details

The inputs tp, fp, tn and fn are obtained by using all unique values in x, Inf, and -Inf as candidate cutpoints for classifying the variable in class. The reported metric represents the usual in-sample performance of the determined cutpoint.
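
The overall procedure can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not cutpointr's internal code: all function names ending in _sketch are hypothetical, only direction ">=" and maximization are handled, the bootstrap is unstratified, and every resample is assumed to contain both classes.

```r
# Youden index: sensitivity + specificity - 1
youden_sketch <- function(tp, fp, tn, fn) {
  tp / (tp + fn) + tn / (tn + fp) - 1
}

# Cutpoint that maximizes metric_func in one sample (direction ">=")
best_cutpoint_sketch <- function(x, is_pos, metric_func) {
  # Candidates: all unique values of x plus Inf and -Inf
  cands <- c(unique(x), Inf, -Inf)
  m <- vapply(cands, function(cp) {
    pred_pos <- x >= cp
    metric_func(tp = sum(pred_pos & is_pos),
                fp = sum(pred_pos & !is_pos),
                tn = sum(!pred_pos & !is_pos),
                fn = sum(!pred_pos & is_pos))
  }, numeric(1))
  cands[which.max(m)]
}

# Bootstrap the data boot_cut times and summarize the optimal cutpoints
boot_cutpoint_sketch <- function(x, is_pos, metric_func = youden_sketch,
                                 summary_func = mean, boot_cut = 50,
                                 inf_rm = TRUE) {
  cuts <- replicate(boot_cut, {
    idx <- sample.int(length(x), replace = TRUE)  # resample with replacement
    best_cutpoint_sketch(x[idx], is_pos[idx], metric_func)
  })
  if (inf_rm) cuts <- cuts[is.finite(cuts)]  # drop infinite cutpoints
  summary_func(cuts)
}

# Simulated data: negatives centered at 0, positives centered at 2
set.seed(1)
x <- c(rnorm(50, mean = 0), rnorm(50, mean = 2))
is_pos <- rep(c(FALSE, TRUE), each = 50)
sketch_result <- boot_cutpoint_sketch(x, is_pos, boot_cut = 20)
sketch_result
```

Passing summary_func = median instead of the default would make the summarized cutpoint less sensitive to outlying bootstrap cutpoints.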

Examples

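library(cutpointr)

# Maximize accuracy; summarize the optimal cutpoints of 30 bootstrap samples
set.seed(100)
cutpointr(suicide, dsi, suicide, method = maximize_boot_metric,
          metric = accuracy, boot_cut = 30)

# Minimize the absolute difference between sensitivity and specificity
set.seed(100)
cutpointr(suicide, dsi, suicide, method = minimize_boot_metric,
          metric = abs_d_sens_spec, boot_cut = 30)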