Measure to compare true observed labels with predicted labels in multiclass classification tasks.
mcc(truth, response, positive = NULL, ...)
Performance value as numeric(1).
truth: (factor()) True (observed) labels. Must have the same levels and length as response.
response: (factor()) Predicted response labels. Must have the same levels and length as truth.
positive: (character(1)) Name of the positive class in case of binary classification.
...: (any) Additional arguments. Currently ignored.
Type: "classif"
Range: \([-1, 1]\)
Minimize: FALSE
Required prediction: response
In the binary case, the Matthews Correlation Coefficient is defined as $$ \frac{\mathrm{TP} \cdot \mathrm{TN} - \mathrm{FP} \cdot \mathrm{FN}}{\sqrt{(\mathrm{TP} + \mathrm{FP}) (\mathrm{TP} + \mathrm{FN}) (\mathrm{TN} + \mathrm{FP}) (\mathrm{TN} + \mathrm{FN})}}, $$ where \(\mathrm{TP}\), \(\mathrm{FP}\), \(\mathrm{TN}\), \(\mathrm{FN}\) are the number of true positives, false positives, true negatives, and false negatives respectively.
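As a rough illustration of the binary formula, the following sketch computes the coefficient directly from the four confusion counts. The helper name `mcc_binary` and its arguments are hypothetical and not part of the package; the fallback of setting a zero denominator to 1 follows the convention this page describes, and is not taken from the package source.

```r
# Hypothetical helper for illustration only; not the package's implementation.
mcc_binary = function(tp, fp, tn, fn) {
  num = tp * tn - fp * fn
  # as.numeric() avoids integer overflow when the four sums are multiplied
  den = sqrt(as.numeric(tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  # assumed convention: a zero denominator is replaced by 1
  if (den == 0) den = 1
  num / den
}

# 6 true positives, 1 false positive, 2 true negatives, 1 false negative:
# numerator 6 * 2 - 1 * 1 = 11, denominator sqrt(7 * 7 * 3 * 3) = 21
mcc_binary(tp = 6, fp = 1, tn = 2, fn = 1)
```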
In the multi-class case, the Matthews Correlation Coefficient is defined for a multi-class confusion matrix \(C\) with \(K\) classes: $$ \frac{c \cdot s - \sum_k^K p_k \cdot t_k}{\sqrt{(s^2 - \sum_k^K p_k^2) \cdot (s^2 - \sum_k^K t_k^2)}}, $$ where
\(s = \sum_i^K \sum_j^K C_{ij}\): total number of samples
\(c = \sum_k^K C_{kk}\): total number of correctly predicted samples
\(t_k = \sum_i^K C_{ik}\): number of predictions for each class \(k\)
\(p_k = \sum_j^K C_{kj}\): number of true occurrences for each class \(k\).
The above formula is undefined if any of the four sums in the denominator is 0 in the binary case and more generally if either \(s^2 - \sum_k^K p_k^2\) or \(s^2 - \sum_k^K t_k^2\) is equal to 0. The denominator is then set to 1.
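The multi-class definition and the zero-denominator rule above can be sketched in a few lines of base R. The helper `mcc_multiclass` is a hypothetical name used only for this illustration, and the orientation of the confusion matrix (rows for truth, columns for response) is an assumption; since the formula is symmetric in \(p_k\) and \(t_k\), the result does not depend on that choice.

```r
# Hypothetical helper for illustration only; not the package's implementation.
mcc_multiclass = function(truth, response) {
  # Both inputs are assumed to be factors with identical levels,
  # so that the confusion matrix is square (K x K).
  C = table(truth, response)
  s = as.numeric(sum(C))        # total number of samples
  n_correct = sum(diag(C))      # c in the formula: correctly predicted samples
  p_k = rowSums(C)              # marginal counts over rows
  t_k = colSums(C)              # marginal counts over columns
  num = n_correct * s - sum(p_k * t_k)
  den = sqrt((s^2 - sum(p_k^2)) * (s^2 - sum(t_k^2)))
  # assumed convention from the text: a zero denominator is replaced by 1
  if (den == 0) den = 1
  num / den
}

lvls = c("a", "b", "c")
truth = factor(c("a", "a", "b", "b", "c", "c"), levels = lvls)
response = factor(c("a", "b", "b", "b", "c", "a"), levels = lvls)
mcc_multiclass(truth, response)
```

With these inputs, \(s = 6\), \(c = 4\), and the marginals are (2, 2, 2) and (2, 3, 1), so the value is \(12 / \sqrt{528}\), roughly 0.52. A constant prediction makes one denominator factor zero, in which case the sketch returns 0.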
When there are more than two classes, the MCC will no longer range between -1 and +1. Instead, the minimum value will be between -1 and 0 depending on the true distribution. The maximum value is always +1.
https://en.wikipedia.org/wiki/Phi_coefficient
Matthews BW (1975). “Comparison of the predicted and observed secondary structure of T4 phage lysozyme.” Biochimica et Biophysica Acta (BBA) - Protein Structure, 405(2), 442–451. doi:10.1016/0005-2795(75)90109-9.
Other Classification Measures: acc(), bacc(), ce(), logloss(), mauc_aunu(), mbrier(), zero_one()
set.seed(1)
lvls = c("a", "b", "c")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
mcc(truth, response)