G.W. Brier (1950). Verification of forecasts expressed in terms of probability.
Mon. Wea. Rev. 78, 1-3.
K.H. Brodersen, C.S. Ong, K.E. Stephan, J.M. Buhmann (2010). The balanced
accuracy and its posterior distribution. In Pattern Recognition (ICPR),
20th International Conference on, 3121-3124 (IEEE, 2010).
J. Cohen (1960). A coefficient of agreement for nominal scales.
Educational and Psychological Measurement 20, 37-46.
T. Fawcett (2006). An introduction to ROC analysis.
Pattern Recognition Letters 27, 861-874.
T.A. Gerds, T. Cai, M. Schumacher (2008). The performance of risk prediction
models. Biom J 50, 457-479.
D. Hand, R. Till (2001). A simple generalisation of the area under the ROC
curve for multiple class classification problems.
Machine Learning 45, 171-186.
J. Hernandez-Orallo, P.A. Flach, C. Ferri (2011). Brier curves: a new cost-
based visualisation of classifier performance. In L. Getoor and T. Scheffer (eds.)
Proceedings of the 28th International Conference on Machine Learning (ICML-11),
585-592 (ACM, New York, NY, USA).
J. Hernandez-Orallo, P.A. Flach, C. Ferri (2012). A unified view of performance
metrics: Translating threshold choice into expected classification loss.
J. Mach. Learn. Res. 13, 2813-2869.
B.W. Matthews (1975). Comparison of the predicted and observed secondary
structure of T4 phage lysozyme. Biochimica et Biophysica Acta (BBA) -
Protein Structure 405, 442-451.
D.M. Powers (2011). Evaluation: From Precision, Recall and F-Factor to ROC,
Informedness, Markedness and Correlation. Journal of Machine Learning
Technologies 1, 37-63.
N.A. Smits (2010). A note on Youden's J and its cost ratio.
BMC Medical Research Methodology 10, 89.
B. Wallace, I. Dahabreh (2012). Class probability estimates are unreliable for
imbalanced data (and how to fix them). In Data Mining (ICDM), IEEE 12th
International Conference on, 695-704.
W.J. Youden (1950). Index for rating diagnostic tests.
Cancer 3, 32-35.