caret (version 6.0-78)

thresholder: Generate Data to Choose a Probability Threshold

Description

This function uses the resampling results from a train object to generate performance statistics over a set of probability thresholds for two-class problems.

Usage

thresholder(x, threshold, final = TRUE)

Arguments

x

A train object where the value of savePredictions in trainControl was TRUE, "all", or "final". The control option classProbs must also have been TRUE.

threshold

A numeric vector of candidate probability thresholds in [0, 1]. If the class probability corresponding to the first level of the outcome is greater than the threshold, the data point is classified as that level; otherwise it is classified as the second level.
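
The thresholding rule described for this argument can be illustrated with a minimal base R sketch. The probabilities, the level names, and the helper classify() are hypothetical and not part of caret; the sketch takes "greater than" literally, so how a probability exactly equal to the threshold is handled is an assumption.

```r
# Hypothetical class probabilities for the first outcome level ("Class1")
prob_class1 <- c(0.95, 0.70, 0.50, 0.20)
lev <- c("Class1", "Class2")

# Illustrative helper (not a caret function): assign the first level when
# its probability is strictly greater than the threshold, else the second
classify <- function(prob, threshold, lev) {
  factor(ifelse(prob > threshold, lev[1], lev[2]), levels = lev)
}

pred_50 <- classify(prob_class1, 0.50, lev)
pred_80 <- classify(prob_class1, 0.80, lev)
```

Raising the threshold from 0.50 to 0.80 moves the second observation from the first level to the second, which is the trade-off between sensitivity and specificity that thresholder summarizes across resamples.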

final

A logical: should only the final tuning parameters chosen by train be used when savePredictions = 'all'?

Value

A data frame with columns for each of the tuning parameters from the model along with an additional column called prob_threshold for the probability threshold. There are also columns for summary statistics averaged over resamples with column names Sensitivity, Specificity, J, Dist. The last two correspond to Youden's J statistic and the distance to the best possible cutoff (i.e. perfect sensitivity and specificity).
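
As a rough illustration of the last two columns, the sketch below computes Youden's J and the distance to the perfect point from made-up sensitivity and specificity values. It assumes the standard definitions, J = Sensitivity + Specificity - 1 and a Euclidean distance to (1, 1); the exact computation inside caret is not shown on this page, so treat the formulas as an assumption.

```r
# Made-up sensitivity/specificity values at three thresholds
sens <- c(0.60, 0.80, 0.95)
spec <- c(0.95, 0.85, 0.50)

# Youden's J: larger is better, 1 is perfect
youden_j <- sens + spec - 1

# Euclidean distance to perfect sensitivity and specificity (1, 1):
# smaller is better, 0 is perfect
dist_to_best <- sqrt((1 - sens)^2 + (1 - spec)^2)
```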

Examples

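# NOT RUN {
library(caret)    # twoClassSim(), trainControl(), train(), thresholder()
library(ggplot2)  # ggplot(), aes(), geom_point()

set.seed(2444)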
dat <- twoClassSim(500, intercept = -10)
table(dat$Class)

ctrl <- trainControl(method = "cv", 
                     classProbs = TRUE,
                     savePredictions = "all",
                     summaryFunction = twoClassSummary)

set.seed(2863)
mod <- train(Class ~ ., data = dat, 
             method = "rda",
             tuneLength = 4,
             metric = "ROC",
             trControl = ctrl)

resample_stats <- thresholder(mod, 
                              threshold = seq(.5, 1, by = 0.05), 
                              final = TRUE)

ggplot(resample_stats, aes(x = prob_threshold, y = J)) + 
  geom_point()
ggplot(resample_stats, aes(x = prob_threshold, y = Dist)) + 
  geom_point()
ggplot(resample_stats, aes(x = prob_threshold, y = Sensitivity)) + 
  geom_point() + 
  geom_point(aes(y = Specificity), col = "red")
# }
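
Once a data frame such as resample_stats is available, one plausible next step is to pick the threshold that maximizes J or minimizes Dist. The sketch below uses a small hypothetical data frame (toy_stats, with invented values) in place of real thresholder output, so it runs without fitting a model; only the column names prob_threshold, J, and Dist are taken from the Value section.

```r
# Hypothetical stand-in for thresholder() output; values are invented
toy_stats <- data.frame(
  prob_threshold = c(0.50, 0.75, 1.00),
  J              = c(0.40, 0.65, 0.00),
  Dist           = c(0.55, 0.25, 1.00)
)

# Threshold with the largest Youden's J
best_by_j <- toy_stats$prob_threshold[which.max(toy_stats$J)]

# Threshold closest to perfect sensitivity and specificity
best_by_dist <- toy_stats$prob_threshold[which.min(toy_stats$Dist)]
```

The two criteria need not agree on real data; when they differ, the choice depends on how sensitivity and specificity are weighted for the problem at hand.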
