
qwraps2 (version 0.5.2)

confusion_matrix: Confusion Matrices (Contingency Tables)

Description

Construction of confusion matrices, along with accuracy, sensitivity, specificity, and confidence intervals (Wilson's method and, optionally, bootstrapping).

Usage

confusion_matrix(x, ...)

# S3 method for default
confusion_matrix(x, y, positive = NULL, boot = FALSE, boot_samples = 1000L, alpha = 0.05, ...)

# S3 method for formula
confusion_matrix(formula, data = parent.frame(), positive = NULL, boot = FALSE, boot_samples = 1000L, alpha = 0.05, ...)

is.confusion_matrix(x)

# S3 method for confusion_matrix
print(x, ...)

Arguments

x

prediction condition vector: a two-level factor, or a variable that can be coerced to one.

...

not currently used

y

True Condition vector with the same possible values as x.

positive

the level of x and y that denotes the positive outcome. If NULL, the first level of factor(y) is used as the positive level.

boot

boolean, should bootstrapped confidence intervals for the sensitivity and specificity be computed? Defaults to FALSE.

boot_samples

number of bootstrap samples to generate; defaults to 1000L. Ignored if boot == FALSE.

alpha

100(1 - alpha)% confidence intervals for the sensitivity and specificity. Ignored if boot == FALSE.

formula

formula of the form column (known) ~ row (test) for building the confusion matrix, i.e., the true condition on the left-hand side and the predicted condition on the right.

data

a data.frame or environment containing the variables listed in the formula

Value

The sensitivity and specificity functions return numeric values. confusion_matrix returns a list with the following elements (an inspection sketch follows this list):

  • tab the confusion matrix,

  • cells a list of the individual cell counts: true_positives, true_negatives, false_positives, and false_negatives, and

  • stats a matrix of summary statistics and confidence intervals.
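A minimal sketch of inspecting the returned list, reusing the data from Example 1 below; the element names match those used in the examples:

library(qwraps2)

test  <- c(rep(1, 53), rep(0, 47))
truth <- c(rep(1, 20), rep(0, 33), rep(1, 10), rep(0, 37))
con_mat <- confusion_matrix(x = test, y = truth, positive = "1")

con_mat$tab    # the 2-by-2 contingency table
con_mat$cells  # individual cell counts, e.g. con_mat$cells$true_positives
con_mat$stats  # matrix of summary statistics with confidence intervals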

Details

Sensitivity and Specificity: For the sensitivity and specificity functions, the 2-by-2 confusion matrix (contingency table) is expected to be of the form:

                            True Condition
                              +      -
    Predicted Condition +    TP     FP
    Predicted Condition -    FN     TN

where

  • FN: False Negative,

  • FP: False Positive,

  • TN: True Negative, and

  • TP: True Positive.

The statistics returned in the stats element are (a hand-computed check follows this list):

  • accuracy = (TP + TN) / (TP + TN + FP + FN)

  • sensitivity = TP / (TP + FN)

  • specificity = TN / (TN + FP)

  • positive predictive value (PPV) = TP / (TP + FP)

  • negative predictive value (NPV) = TN / (TN + FN)

  • false negative rate (FNR) = 1 - Sensitivity

  • false positive rate (FPR) = 1 - Specificity

  • false discovery rate (FDR) = 1 - PPV

  • false omission rate (FOR) = 1 - NPV

  • F1 score

  • Matthews Correlation Coefficient (MCC) = ((TP * TN) - (FP * FN)) / sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
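As a quick check, the formulas above can be reproduced by hand from the cell counts of Example 1 below (TP = 20, FP = 33, FN = 10, TN = 37). This is only an illustrative sketch, and the F1 line assumes the standard definition (the harmonic mean of PPV and sensitivity):

TP <- 20; FP <- 33; FN <- 10; TN <- 37

(TP + TN) / (TP + TN + FP + FN)  # accuracy
TP / (TP + FN)                   # sensitivity (TPR, recall)
TN / (TN + FP)                   # specificity (TNR)
TP / (TP + FP)                   # PPV (precision)
TN / (TN + FN)                   # NPV
2 * TP / (2 * TP + FP + FN)      # F1 score (assumed standard definition)
((TP * TN) - (FP * FN)) /
  sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))  # MCC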

Synonyms for the statistics:

  • Sensitivity: true positive rate (TPR), recall, hit rate

  • Specificity: true negative rate (TNR), selectivity

  • PPV: precision

  • FNR: miss rate

Sensitivity and PPV can, in some cases, be indeterminate due to division by zero. To address this, the following rule, based on the DICE group (https://github.com/dice-group/gerbil/wiki/Precision,-Recall-and-F1-measure), is used: if TP, FP, and FN are all 0, then PPV, sensitivity, and F1 are defined to be 1; if TP is 0 and FP + FN > 0, then PPV, sensitivity, and F1 are all defined to be 0.
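As an illustration only, the rule for PPV can be written as a small helper; the function name safe_ppv is hypothetical and not part of qwraps2:

## Hypothetical helper illustrating the DICE-group rule described above;
## not part of qwraps2's internal implementation.
safe_ppv <- function(TP, FP, FN) {
  if (TP == 0 && FP == 0 && FN == 0) return(1)  # all three zero: defined as 1
  if (TP == 0 && (FP + FN) > 0) return(0)       # TP zero, FP + FN > 0: defined as 0
  TP / (TP + FP)                                # otherwise the usual ratio
}

safe_ppv(0, 0, 0)     # 1
safe_ppv(0, 3, 2)     # 0
safe_ppv(20, 33, 10)  # 20 / 53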

Examples

################################################################################
## Example 1
test  <- c(rep(1, 53), rep(0, 47))
truth <- c(rep(1, 20), rep(0, 33), rep(1, 10), rep(0, 37))
con_mat <- confusion_matrix(x = test, y = truth, positive = "1")
str(con_mat)

con_mat

con_mat$cells$true_positives  # 20
con_mat$cells$true_negatives  # 37
con_mat$cells$false_positives # 33
con_mat$cells$false_negatives # 10

con_mat_with_boot <- confusion_matrix(test, truth, positive = "1", boot = TRUE)
con_mat_with_boot

# only one value in one of the vectors
a <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0)  # all zeros
b <- c(1,0,1,0,1,0,0,0,0,0,0,0,0,1)  # some zeros and ones

confusion_matrix(a, b)
confusion_matrix(b, a)
confusion_matrix(a, b, positive = 1)
confusion_matrix(b, a, positive = 1)


################################################################################
## Example 2: based on an example from the Wikipedia page:
# https://en.wikipedia.org/wiki/Confusion_matrix

animals <-
  data.frame(Predicted = c(rep("Cat",    5 + 2 +  0),
                           rep("Dog",    3 + 3 +  2),
                           rep("Rabbit", 0 + 1 + 11)),
             Actual    = c(rep(c("Cat", "Dog", "Rabbit"), times = c(5, 2,  0)),
                           rep(c("Cat", "Dog", "Rabbit"), times = c(3, 3,  2)),
                           rep(c("Cat", "Dog", "Rabbit"), times = c(0, 1, 11))),
             stringsAsFactors = FALSE)

table(animals)

cats <- apply(animals, 1:2, function(x) ifelse(x == "Cat", "Cat", "Non-Cat"))

# Default calls, note the difference based on what is set as the 'positive'
# value.
confusion_matrix(cats[, "Predicted"], cats[, "Actual"], positive = "Cat")
confusion_matrix(cats[, "Predicted"], cats[, "Actual"], positive = "Non-Cat")

# Using a Formula
confusion_matrix(formula = I(Actual == "Cat") ~ I(Predicted == "Cat"),
                 data = animals,
                 positive = "TRUE")

confusion_matrix(formula = I(Actual == "Cat") ~ I(Predicted == "Cat"),
                 data = animals,
                 positive = "TRUE",
                 boot = TRUE)

################################################################################
## Example 3
russell <-
  data.frame(Pred  = c(rep(0, 2295), rep(0, 118), rep(1, 1529), rep(1, 229)),
             Truth = c(rep(0, 2295), rep(1, 118), rep(0, 1529), rep(1, 229)))

# The values for Sensitivity, Specificity, PPV, and NPV are dependent on the
# "positive" level.  By default, the first level of y is used.
confusion_matrix(x = russell$Pred, y = russell$Truth, positive = "0")
confusion_matrix(x = russell$Pred, y = russell$Truth, positive = "1")

confusion_matrix(Truth ~ Pred, data = russell, positive = "0")
confusion_matrix(Truth ~ Pred, data = russell, positive = "1")
