Calculate confusion matrix, gain and RGain measure.
tdmModConfmat(d, colreal, colpred, opts, predProb = NULL)
d: data frame
colreal: name of the column in d that contains the real class
colpred: name of the column in d that contains the predicted class
opts: a list from which we use the elements:
gainmat: the gain matrix for each possible outcome, the same size as cm$mat (see below). gainmat[R1,P2] is the gain associated with a record of real class R1 that we predict as class P2 (gain matrix = -cost matrix).
rgain.type: one of {"rgain" | "meanCA" | "minCA" | "bYouden" | "arROC" | "arLIFT" | "arPRE"}; affects the outputs cm$mat and cm$rgain, see below.
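As an illustration of the opts elements above, here is a minimal sketch of building opts for a two-class problem by negating a cost matrix. The class labels "no"/"yes", the cost values, and the vector rgain.types are hypothetical; only the element names gainmat and rgain.type come from this page.

```r
# Hypothetical cost matrix: rows = real class, columns = predicted class
# (the same layout as cm$mat). Negative costs on the diagonal act as rewards.
costmat <- matrix(c(-1,  1,
                     2, -4),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("no", "yes"), c("no", "yes")))

# Allowed values of rgain.type, as listed above.
rgain.types <- c("rgain", "meanCA", "minCA", "bYouden",
                 "arROC", "arLIFT", "arPRE")

opts <- list(
  gainmat    = -costmat,   # gain matrix = -cost matrix
  rgain.type = "rgain"
)

# Fail early if rgain.type is not one of the documented values.
opts$rgain.type <- match.arg(opts$rgain.type, rgain.types)
```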
predProb: if not NULL, a data frame with as many rows as data frame d, containing the columns (index, true label, predicted label, prediction score). It is only needed if opts$rgain.type is one of the "ar*" types ("arROC", "arLIFT", "arPRE").
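To show what a predProb-like data frame might look like and how an area under the ROC curve can be computed from its score column, here is a sketch. The column names (index, real, pred, score), the helper auc_rank, and the data are hypothetical; it uses the rank-sum (Mann-Whitney) identity for the AUC, which is not necessarily how tdmModConfmat computes "arROC".

```r
# Hypothetical predProb-like data frame; the column names are assumptions.
predProb <- data.frame(
  index = 1:6,
  real  = c("no", "no", "yes", "no", "yes", "yes"),   # true label
  pred  = c("no", "yes", "yes", "no", "no", "yes"),   # predicted label (score > 0.5)
  score = c(0.1, 0.6, 0.8, 0.3, 0.4, 0.9)             # predicted probability of "yes"
)

# AUC via the rank-sum identity: the probability that a randomly chosen
# positive record gets a higher score than a randomly chosen negative one.
auc_rank <- function(labels, scores, positive) {
  pos   <- labels == positive
  r     <- rank(scores)          # average ranks handle tied scores
  n.pos <- sum(pos)
  n.neg <- sum(!pos)
  (sum(r[pos]) - n.pos * (n.pos + 1) / 2) / (n.pos * n.neg)
}

auc_rank(predProb$real, predProb$score, positive = "yes")
```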
Returns cm, a list containing:
mat: matrix with real class levels as rows and predicted class levels as columns. mat[R1,P2] is the number of records with real class R1 predicted as class P2, if opts$rgain.type=="rgain". If opts$rgain.type is "meanCA" or "minCA", this number is instead shown as a percentage of the records with real class R1 (row percentages).
CAUTION: If there are NAs in column colpred, those cases are missing from mat (!), but the class errors are still correct as long as there are no NAs in column colreal.
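A sketch of how a matrix laid out like mat can be built in base R with table(), together with the row-percentage form. The data frame and its column names real/pred are hypothetical; the last record has an NA prediction to show that table() drops it, in line with the CAUTION about NAs in colpred.

```r
lv <- c("no", "yes")

# Hypothetical data: 11 records, the last one has no prediction (NA).
d <- data.frame(
  real = factor(c(rep("no", 8), rep("yes", 3)), levels = lv),
  pred = factor(c("no", "no", "yes", "no", "no", "yes", "no", "no",
                  "no", "yes", NA), levels = lv)
)

# Counts: rows = real class, columns = predicted class.
# table() excludes NA by default, so the last record is missing here.
mat <- table(d$real, d$pred, dnn = c("real", "pred"))

# Row percentages (each entry relative to its real-class row total).
mat.pct <- 100 * prop.table(mat, margin = 1)
```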
cerr: class error rates, a vector of length nlevels(d[,colreal])+1. cerr[X] is the misclassification rate for real class X; cerr["Total"] is the total classification error rate.
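The class error rates can be read off a count matrix as the off-diagonal share of each row. A minimal sketch with a hypothetical 2x2 count matrix, assuming the rates are expressed as fractions in [0,1]:

```r
# Hypothetical confusion matrix (counts): rows = real, columns = predicted.
mat <- matrix(c(6, 2,
                1, 1),
              nrow = 2, byrow = TRUE,
              dimnames = list(real = c("no", "yes"), pred = c("no", "yes")))

# Misclassification rate per real class: 1 - correct / row total.
cerr <- 1 - diag(mat) / rowSums(mat)
names(cerr) <- rownames(mat)

# Total error rate over all records.
cerr["Total"] <- 1 - sum(diag(mat)) / sum(mat)
```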
gain: the total gain, i.e. the sum of the elementwise product opts$gainmat*cm$mat
gain.vector: gain.vector[X] is the gain attributed to real class label X; gain.vector["Total"] is again the total gain.
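Following the definitions of gain and gain.vector above, here is a short sketch with a hypothetical count matrix and gain matrix of the same shape; summing the elementwise product by rows assigns each cell's gain to its real class.

```r
dn <- list(real = c("no", "yes"), pred = c("no", "yes"))

# Hypothetical counts and gains (rows = real class, columns = predicted class).
mat     <- matrix(c(6, 2, 1, 1),   nrow = 2, byrow = TRUE, dimnames = dn)
gainmat <- matrix(c(1, -1, -2, 4), nrow = 2, byrow = TRUE, dimnames = dn)

# Gain contributed by each (real, predicted) cell.
cellgain <- gainmat * mat

gain <- sum(cellgain)                       # total gain
gain.vector <- rowSums(cellgain)            # gain per real class
gain.vector["Total"] <- sum(gain.vector)    # equals gain
```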
gainmax: the maximum achievable gain, assuming perfect prediction
rgain: depending on the value of opts$rgain.type:
"rgain": ratio gain/gainmax in percent,
"meanCA": mean class accuracy in percent (i.e. mean(diag(cm$mat))),
"minCA": minimum class accuracy in percent (i.e. min(diag(cm$mat))),
"bYouden": balanced Youden index, min(sensitivity, specificity),
"arROC": area under the ROC curve (a number in [0,1]),
"arLIFT": area between the lift curve and the horizontal line 1.0,
"arPRE": area under the precision-recall curve (a number in [0,1]).
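A sketch of the count-based rgain variants for a hypothetical two-class problem. The gainmax line assumes that "perfect prediction" means every record is predicted as its real class, so each record earns the diagonal gain of its row; treating "yes" as the positive class for sensitivity and specificity is also an assumption.

```r
dn <- list(real = c("no", "yes"), pred = c("no", "yes"))

# Hypothetical counts and gains (rows = real class, columns = predicted class).
mat     <- matrix(c(6, 2, 1, 1),   nrow = 2, byrow = TRUE, dimnames = dn)
gainmat <- matrix(c(1, -1, -2, 4), nrow = 2, byrow = TRUE, dimnames = dn)

# "rgain": gain relative to the gain of a perfect prediction, in percent.
gain    <- sum(gainmat * mat)
gainmax <- sum(rowSums(mat) * diag(gainmat))   # assumption: all records on the diagonal
rgain   <- 100 * gain / gainmax

# "meanCA" / "minCA": class accuracies from the row-percentage matrix.
mat.pct <- 100 * mat / rowSums(mat)            # divides row i by its row total
meanCA  <- mean(diag(mat.pct))
minCA   <- min(diag(mat.pct))

# "bYouden": min(sensitivity, specificity), with "yes" as the positive class.
sensitivity <- mat["yes", "yes"] / sum(mat["yes", ])
specificity <- mat["no", "no"]   / sum(mat["no", ])
bYouden     <- min(sensitivity, specificity)
```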