calibrate: Calibration of probabilities according to the given prior.

Description

Given probability scores predictedProb, as provided for example by a call to predict.CoreModel, and using one of the available methods given by method, the function calibrates the predicted probabilities so that they match the actual probabilities of the binary class 1 provided by correctClass. The computed calibration can then be applied to the scores returned by that model.

Usage

calibrate(correctClass, predictedProb, class1=1, 
          method = c("isoReg","binIsoReg","binning","mdlMerge"), 
          weight=NULL, noBins=10, assumeProbabilities=FALSE)
          
applyCalibration(predictedProb, calibration)

Value

The function calibrate returns a list with two vector components of the same length:

interval: The boundaries of the intervals. Lower boundary 0 is not explicitly included but should be taken into account.
calProb: The calibrated probabilities for each corresponding interval.
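To make the structure of the returned list concrete, the following base R sketch looks up calibrated probabilities in a hand-made list of the same shape. The numbers and the helper lookupCalibrated are hypothetical, and the treatment of boundaries (each entry of interval taken as an inclusive upper boundary) is an assumption for illustration; it is not necessarily how applyCalibration is implemented.

```r
# Hypothetical calibration list with the structure described above;
# the numbers are made up for illustration.
calibration <- list(interval = c(0.3, 0.6, 1.0),
                    calProb  = c(0.1, 0.5, 0.9))

# Assumption: each entry of interval is the upper boundary of its interval,
# the implicit lower boundary of the first interval is 0, and intervals
# are closed on the right.
lookupCalibrated <- function(score, calibration) {
  # number of boundaries strictly below the score, plus one, gives the interval index
  idx <- findInterval(score, calibration$interval, left.open = TRUE) + 1
  # clamp scores that lie above the last boundary into the last interval
  idx <- pmin(idx, length(calibration$calProb))
  calibration$calProb[idx]
}

lookupCalibrated(c(0.05, 0.3, 0.45, 0.95), calibration)
```

Under these assumptions the scores 0.05 and 0.3 fall into the first interval, 0.45 into the second, and 0.95 into the third.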

Arguments

correctClass: A vector of correct class labels for a binary classification problem.
predictedProb: A vector of predicted class 1 (probability) scores. In the calibrate function it should be of the same length as correctClass.
class1: A class value (factor) or an index of the class value to be taken as a class to be calibrated.
method: One of isoReg, binIsoReg, binning, or mdlMerge. See details below.
weight: If specified, should be of the same length as correctClass and gives the weights for all the instances, otherwise a default weight of 1 for each instance is assumed.
noBins: The meaning of this parameter depends on the parameter method; it specifies the desired or initial number of bins. See details below.
assumeProbabilities: If assumeProbabilities=TRUE, the values in predictedProb are expected to be in the [0,1] range, i.e., to be probability estimates. If assumeProbabilities=FALSE, the algorithm can be used as ordinary (isotonic) regression.
calibration: The list resulting from a call to calibrate, which is subsequently applied to probability scores returned by the same model.
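As a rough illustration of how these arguments fit together, the sketch below calls calibrate with explicit weight and noBins values. It assumes the CORElearn package is installed; the scores and labels are synthetic and generated only for this example, and the class labels are chosen as 1 and 2 so that class1=1 is unambiguous.

```r
library(CORElearn)  # assumed to be installed

set.seed(123)
n <- 300
predictedProb <- runif(n)   # synthetic scores for class 1
# synthetic labels: class 1 becomes more likely as the score grows
correctClass <- factor(ifelse(runif(n) < predictedProb^2, 1, 2), levels = c(1, 2))
weight <- rep(1, n)         # explicit unit weights, equivalent to weight=NULL

calibration <- calibrate(correctClass, predictedProb, class1 = 1,
                         method = "binning", weight = weight, noBins = 5,
                         assumeProbabilities = TRUE)
str(calibration)            # inspect the interval and calProb components

# apply the computed calibration back to the same scores
calibrated <- applyCalibration(predictedProb, calibration)
```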

Author

Marko Robnik-Sikonja

Details

Depending on the specified method one of the following calibration methods is executed.

"isoReg" isotonic regression calibration based on the pool-adjacent violators (PAV) algorithm.
"binning" calibration into a pre-specified number of bands given by the noBins parameter, trying to make bins of equal weight.
"binIsoReg" first the binning method is executed, followed by isotonic regression calibration.
"mdlMerge" first intervals are merged by an MDL gain criterion into a prespecified number of intervals, followed by isotonic regression calibration.
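The idea behind isotonic regression calibration can be sketched with base R alone. The example below uses isoreg from the stats package on synthetic data; it is meant only to show the kind of monotone step function that a PAV-style fit produces and is not the CORElearn implementation.

```r
set.seed(42)
n <- 200
score <- runif(n)                  # synthetic, deliberately miscalibrated scores
outcome <- rbinom(n, 1, score^2)   # 1 means the instance belongs to class 1

# isotonic (monotone non-decreasing) fit of outcomes against scores
fit <- isoreg(score, outcome)

# turn the fit into a step function that maps a score to a calibrated value
calFun <- as.stepfun(fit)
calibrated <- calFun(c(0.1, 0.5, 0.9))
calibrated
```

Because the fitted values are averages of 0/1 outcomes over pooled neighbours, they stay within [0,1] and never decrease as the score increases.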

If method="binning", the parameter noBins specifies the desired number of bins, i.e., calibration bands. If method="binIsoReg", noBins specifies the number of initial bins that are formed by binning before isotonic regression is applied. If method="mdlMerge", noBins specifies the number of bins formed after first applying isotonic regression; the most similar bins are merged using the MDL criterion.
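The role of noBins in binning can be illustrated with a small base R sketch. It assumes unit weights, so that bins of equal weight are simply bins with equal instance counts, and it uses quantile-based breaks on synthetic data; the actual bin boundaries chosen by calibrate may differ.

```r
set.seed(7)
n <- 500
score <- runif(n)                 # synthetic scores
outcome <- rbinom(n, 1, score)    # 1 means the instance belongs to class 1
noBins <- 10

# quantile-based breaks give bins with (almost) the same number of instances
breaks <- quantile(score, probs = seq(0, 1, length.out = noBins + 1))
bin <- cut(score, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# calibrated probability of a bin: observed share of class 1 in that bin
calProb <- as.numeric(tapply(outcome, bin, mean))
interval <- as.numeric(breaks[-1])   # upper boundary of each bin

data.frame(interval, calProb)
```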

References

I. Kononenko, M. Kukar: Machine Learning and Data Mining: Introduction to Principles and Algorithms. Horwood, 2007

A. Niculescu-Mizil, R. Caruana: Predicting Good Probabilities With Supervised Learning. Proceedings of the 22nd International Conference on Machine Learning (ICML'05), 2005

Examples


library(CORElearn)

# generate separate data sets for training the model,
#   calibrating the probabilities, and testing
train <- classDataGen(noInst=200)
cal <- classDataGen(noInst=200)
test <- classDataGen(noInst=200)

# build random forests model with default parameters
modelRF <- CoreModel(class~., train, model="rf", maxThreads=1)

# prediction 
predCal <- predict(modelRF, cal, rfPredictClass=FALSE)
predTest <- predict(modelRF, test, rfPredictClass=FALSE)
destroyModels(modelRF) # clean up, model not needed anymore

# calibrate for a chosen class1 and method
class1<-1
calibration <- calibrate(cal$class, predCal$prob[,class1], class1=class1, 
                         method="isoReg",assumeProbabilities=TRUE)

# apply the calibration to the testing set
calibratedProbs <- applyCalibration(predTest$prob[,class1], calibration)
# the calibration of probabilities can be visualized with 
# reliabilityPlot function
