Given a function for computing a metric in metric_func, these functions bootstrap the data boot_cut times and maximize or minimize the metric by selecting an optimal cutpoint. The returned optimal cutpoint is the result of applying summary_func, e.g. the mean, to all optimal cutpoints that were determined in the bootstrap samples.
The metric function should accept the following inputs:
tp: vector of number of true positives
fp: vector of number of false positives
tn: vector of number of true negatives
fn: vector of number of false negatives
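As an illustration of the inputs listed above, here is a minimal sketch of a user-defined metric function in base R. The name my_youden, the made-up counts, and the choice to return a plain numeric vector (one value per candidate cutpoint) are assumptions for this example, not something this page specifies.

```r
# Hypothetical metric function; name and return shape are assumptions
# made for this sketch.
my_youden <- function(tp, fp, tn, fn, ...) {
  sens <- tp / (tp + fn)  # sensitivity at each candidate cutpoint
  spec <- tn / (tn + fp)  # specificity at each candidate cutpoint
  sens + spec - 1         # one metric value per candidate cutpoint
}

# Made-up confusion-matrix counts for two candidate cutpoints
tp <- c(10, 5)
fp <- c(2, 0)
tn <- c(8, 10)
fn <- c(0, 5)
res_metric <- my_youden(tp, fp, tn, fn)
```

Because the inputs are vectors, the function is evaluated for all candidate cutpoints at once rather than in a loop.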
maximize_boot_metric(
  data,
  x,
  class,
  metric_func = youden,
  pos_class = NULL,
  neg_class = NULL,
  direction,
  summary_func = mean,
  boot_cut = 50,
  boot_stratify,
  inf_rm = TRUE,
  tol_metric,
  use_midpoints,
  ...
)

minimize_boot_metric(
  data,
  x,
  class,
  metric_func = youden,
  pos_class = NULL,
  neg_class = NULL,
  direction,
  summary_func = mean,
  boot_cut = 50,
  boot_stratify,
  inf_rm = TRUE,
  tol_metric,
  use_midpoints,
  ...
)
data: A data frame or tibble in which the columns that are given in x and class can be found.
x: (character) The variable name to be used for classification, e.g. predictions or test values.
class: (character) The variable name indicating class membership.
metric_func: (function) A function that computes a single-number metric to be maximized or minimized. See description.
pos_class: The value of class that indicates the positive class.
neg_class: The value of class that indicates the negative class.
direction: (character) Use ">=" or "<=" to select whether an x value >= or <= the cutoff predicts the positive class.
summary_func: (function) After the bootstrapped optimal cutpoints have been obtained, this function, e.g. mean or median, is applied to arrive at a single cutpoint.
boot_cut: (numeric) Number of bootstrap repetitions over which the optimal cutpoints are summarized by summary_func (the mean by default).
boot_stratify: (logical) If the bootstrap is stratified, bootstrap samples are drawn in both classes and then combined, keeping the number of positives and negatives constant in every resample.
inf_rm: (logical) Whether to remove infinite cutpoints before calculating the summary.
tol_metric: All cutpoints that lead to a metric value in the interval [m_max - tol_metric, m_max + tol_metric], where m_max is the maximum achievable metric value, will be passed to summary_func. This can be used to return multiple decent cutpoints and to avoid floating-point problems.
use_midpoints: (logical) If TRUE (default FALSE), the returned optimal cutpoint will be the mean of the optimal cutpoint and the next highest observation (for direction = ">=") or the next lowest observation (for direction = "<="), which avoids biasing the optimal cutpoint.
...: Captures further arguments that are always passed to the method function by cutpointr. The cutpointr function passes data, x, class, metric_func, direction, pos_class and neg_class to the method function.
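The tolerance interval described for tol_metric can be sketched in base R as follows. The cutpoints, metric values and tolerance are made up for illustration, and the code only mirrors the interval rule stated above; it is not the package's internal implementation.

```r
# Made-up candidate cutpoints and their metric values
cutpoints  <- c(1, 2, 3, 4, 5)
metric     <- c(0.60, 0.795, 0.80, 0.798, 0.50)
tol_metric <- 0.01  # illustrative tolerance

m_max <- max(metric)  # maximum achievable metric value

# Keep every cutpoint whose metric lies in
# [m_max - tol_metric, m_max + tol_metric]
in_tol <- metric >= m_max - tol_metric & metric <= m_max + tol_metric
near_optimal <- cutpoints[in_tol]

# These near-optimal cutpoints would then be handed to the summary function
summarized <- mean(near_optimal)
```

With a tolerance of zero only exact ties with m_max would be kept, which is where floating-point differences between nearly equal metric values can matter.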
A tibble with the column optimal_cutpoint.
The inputs to the metric function (tp, fp, tn and fn) are obtained by using all unique values in x, Inf, and -Inf as possible cutpoints for classifying the variable in class. The reported metric represents the usual in-sample performance of the determined cutpoint.
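The overall procedure (candidate cutpoints from the unique values of x plus Inf and -Inf, an optimal cutpoint per bootstrap sample, then a summary) can be approximated in base R as shown below. This is a simplified sketch under stated assumptions, not the package's implementation: the data are simulated, best_cutpoint is a hypothetical helper, the direction is fixed to ">=", the resampling is unstratified, and ties are resolved by taking the first maximum.

```r
set.seed(1)
# Simulated data: higher x tends to indicate the positive class
x     <- c(rnorm(50, mean = 0), rnorm(50, mean = 2))
class <- rep(c("neg", "pos"), each = 50)

# Hypothetical helper: cutpoint maximizing the Youden index, direction ">="
best_cutpoint <- function(x, class) {
  candidates <- c(-Inf, sort(unique(x)), Inf)
  youden <- vapply(candidates, function(cp) {
    pred_pos <- x >= cp
    tp <- sum(pred_pos & class == "pos")
    fp <- sum(pred_pos & class == "neg")
    fn <- sum(!pred_pos & class == "pos")
    tn <- sum(!pred_pos & class == "neg")
    tp / (tp + fn) + tn / (tn + fp) - 1
  }, numeric(1))
  candidates[which.max(youden)]  # first maximum in case of ties
}

boot_cut <- 30  # number of bootstrap repetitions
boot_cutpoints <- vapply(seq_len(boot_cut), function(i) {
  idx <- sample(length(x), replace = TRUE)  # unstratified resample
  best_cutpoint(x[idx], class[idx])
}, numeric(1))

# Analogous to inf_rm = TRUE and summary_func = mean
boot_cutpoints   <- boot_cutpoints[is.finite(boot_cutpoints)]
optimal_cutpoint <- mean(boot_cutpoints)
```

Replacing mean with median in the last line corresponds to choosing a different summary_func.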
Other method functions: maximize_gam_metric(), maximize_loess_metric(), maximize_metric(), maximize_spline_metric(), oc_manual(), oc_mean(), oc_median(), oc_youden_kernel(), oc_youden_normal()
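# NOT RUN {
library(cutpointr)  # provides cutpointr() and the suicide data set

# Maximize accuracy over 30 bootstrap samples
set.seed(100)
cutpointr(suicide, dsi, suicide, method = maximize_boot_metric,
          metric = accuracy, boot_cut = 30)

# Minimize the absolute difference between sensitivity and specificity
set.seed(100)
cutpointr(suicide, dsi, suicide, method = minimize_boot_metric,
          metric = abs_d_sens_spec, boot_cut = 30)
# }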