list containing:
a tibble with summarized results (called summarized_metrics)
a tibble with random evaluations (random_evaluations)
a tibble with the summarized class level results
(summarized_class_level_results) (Multinomial only)
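As a quick orientation, here is a minimal sketch of extracting these elements, assuming the cvms package and its built-in participant.scores dataset (n = 10 is only for speed):

  library(cvms)

  set.seed(1)
  bsl <- baseline(
    test_data = participant.scores,
    dependent_col = "diagnosis",
    family = "binomial",
    n = 10
  )

  bsl$summarized_metrics
  bsl$random_evaluations
  # With family = "multinomial", the list additionally contains:
  # bsl$summarized_class_level_results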
----------------------------------------------------------------
Gaussian Results
----------------------------------------------------------------
The Summarized Results tibble contains:
Average RMSE, MAE, NRMSE(IQR),
RRSE, RAE, RMSLE.
See the additional metrics (disabled by default) at ?gaussian_metrics.
The Measure column indicates the statistical descriptor used on the evaluations.
The row where Measure == All_rows is the evaluation when the baseline model
is trained on all rows in `train_data`.
The Training Rows column contains the aggregated number of rows used from `train_data`
when fitting the baseline models.
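For instance, a sketch of extracting the All_rows row, assuming a gaussian baseline on the built-in participant.scores dataset (the 70/30 split via groupdata2::partition is purely illustrative):

  library(cvms)
  library(dplyr)

  set.seed(1)
  parts <- groupdata2::partition(participant.scores, p = 0.7, list_out = TRUE)

  bsl <- baseline(
    test_data = parts[[2]],
    train_data = parts[[1]],
    dependent_col = "score",
    family = "gaussian",
    n = 10
  )

  # The evaluation of the baseline model fitted on all of `train_data`
  bsl$summarized_metrics %>%
    filter(Measure == "All_rows")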
....................................................................
The Random Evaluations tibble contains:
The non-aggregated metrics.
A nested tibble with the predictions and targets.
A nested tibble with the coefficients of the baseline models.
Number of training rows used when fitting the baseline model on the training set.
A nested Process information object with information
about the evaluation.
Name of dependent variable.
Name of fixed effect (bias term only).
Random effects structure (if specified).
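A sketch of inspecting the nested columns; the Predictions and Coefficients column names follow cvms' naming at the time of writing, so verify them against your version:

  library(cvms)
  library(tidyr)

  set.seed(1)
  parts <- groupdata2::partition(participant.scores, p = 0.7, list_out = TRUE)
  bsl <- baseline(
    test_data = parts[[2]], train_data = parts[[1]],
    dependent_col = "score", family = "gaussian", n = 10
  )

  # Predictions and targets of the first random evaluation
  bsl$random_evaluations$Predictions[[1]]

  # Coefficients across all random evaluations
  bsl$random_evaluations %>%
    unnest(Coefficients)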
----------------------------------------------------------------
Binomial Results
----------------------------------------------------------------
Based on the generated test set predictions,
a confusion matrix and ROC curve are used to get the following:
ROC:
AUC, Lower CI, and Upper CI
Note that the ROC curve is only computed when AUC is enabled (see the sketch after this list).
Confusion Matrix:
Balanced Accuracy,
Accuracy,
F1,
Sensitivity,
Specificity,
Positive Predictive Value,
Negative Predictive Value,
Kappa,
Detection Rate,
Detection Prevalence,
Prevalence, and
MCC (Matthews correlation coefficient).
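As noted above, disabling AUC also skips the ROC computation. A sketch of toggling metrics through the `metrics` argument (a named list; whether "AUC" can be toggled this way may depend on your cvms version, see ?binomial_metrics):

  library(cvms)

  set.seed(1)
  # Disabling AUC, which also skips the ROC curve
  bsl <- baseline(
    test_data = participant.scores,
    dependent_col = "diagnosis",
    family = "binomial",
    n = 10,
    metrics = list("AUC" = FALSE)
  )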
....................................................................
The Summarized Results tibble contains:
The Measure column indicates the statistical descriptor used on the evaluations.
The row where Measure == All_0 is the evaluation when all predictions are 0.
The row where Measure == All_1 is the evaluation when all predictions are 1.
The aggregated metrics.
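A sketch of extracting the All_0 and All_1 reference rows (same illustrative binomial baseline as above):

  library(cvms)
  library(dplyr)

  set.seed(1)
  bsl <- baseline(
    test_data = participant.scores,
    dependent_col = "diagnosis",
    family = "binomial",
    n = 10
  )

  # The evaluations where all predictions were 0 or 1, respectively
  bsl$summarized_metrics %>%
    filter(Measure %in% c("All_0", "All_1"))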
....................................................................
The Random Evaluations tibble contains:
The non-aggregated metrics.
A nested tibble with the predictions and targets.
A list of ROC curve objects (if computed).
A nested tibble with the confusion matrix.
The Pos_ columns tell you whether a row is a
True Positive (TP), True Negative (TN), False Positive (FP),
or False Negative (FN), depending on which level is the "positive" class,
i.e. the level you wish to predict.
A nested Process information object with information
about the evaluation.
Name of dependent variable.
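A sketch of inspecting one of the nested confusion matrices; the `Confusion Matrix` column name follows cvms' naming at the time of writing:

  library(cvms)

  set.seed(1)
  bsl <- baseline(
    test_data = participant.scores,
    dependent_col = "diagnosis",
    family = "binomial",
    n = 10
  )

  # Confusion matrix of the first random evaluation;
  # the Pos_ columns mark TP/TN/FP/FN per choice of positive class
  bsl$random_evaluations$`Confusion Matrix`[[1]]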
----------------------------------------------------------------
Multinomial Results
----------------------------------------------------------------
Based on the generated test set predictions,
one-vs-all (binomial) evaluations are performed and aggregated
to get the same metrics as in the binomial results
(excluding MCC, AUC, Lower CI, and Upper CI),
with the addition of Overall Accuracy and multiclass
MCC in the summarized results.
It is possible to enable multiclass AUC as well; it is disabled by default,
as it is slow to calculate when there is a large set of classes.
Since we use macro-averaging, Balanced Accuracy is the macro-averaged
metric, not the macro sensitivity that is sometimes used instead.
Note: we also refer to the one-vs-all evaluations as the class level results.
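A sketch of enabling multiclass AUC via the metrics list, assuming the metric is named "AUC" in your cvms version (iris is just an example dataset):

  library(cvms)

  set.seed(1)
  bsl <- baseline(
    test_data = iris,
    dependent_col = "Species",
    family = "multinomial",
    n = 10,
    metrics = list("AUC" = TRUE)  # disabled by default for multinomial
  )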
....................................................................
The Summarized Results tibble contains:
Summary of the random evaluations.
How: First, the one-vs-all binomial evaluations are aggregated by repetition;
then these aggregations are summarized. Besides the
metrics from the binomial evaluations (see Binomial Results above), it
also includes Overall Accuracy and multiclass MCC.
The Measure column indicates the statistical descriptor used on the evaluations.
The Mean, Median, SD, IQR, Max, Min,
NAs, and INFs measures describe the Random Evaluations tibble,
while the CL_Max, CL_Min, CL_NAs, and
CL_INFs describe the Class Level results.
The rows where Measure == All_<<class name>> are the evaluations when all
the observations are predicted to be in that class.
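A sketch of separating the class level descriptors and the All_<<class name>> rows, assuming a multinomial baseline on iris (so e.g. All_setosa):

  library(cvms)
  library(dplyr)

  set.seed(1)
  bsl <- baseline(
    test_data = iris,
    dependent_col = "Species",
    family = "multinomial",
    n = 10
  )

  # Descriptors of the class level results (CL_Max, CL_Min, CL_NAs, CL_INFs)
  bsl$summarized_metrics %>%
    filter(grepl("^CL_", Measure))

  # The All_<<class name>> reference rows, e.g. All_setosa
  bsl$summarized_metrics %>%
    filter(grepl("^All_", Measure))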
....................................................................
The Summarized Class Level Results tibble contains:
The (nested) summarized results for each class, with the same metrics and descriptors as
the Summarized Results tibble. Use tidyr::unnest
on the tibble to inspect the results (as sketched below).
How: The one-vs-all evaluations are summarized by class.
The rows where Measure == All_0 are the evaluations when none of the observations
are predicted to be in that class, while the rows where Measure == All_1 are the
evaluations when all of the observations are predicted to be in that class.
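Following the unnest suggestion above, a sketch assuming the nested column is named Results (cvms' naming at the time of writing):

  library(cvms)
  library(tidyr)

  set.seed(1)
  bsl <- baseline(
    test_data = iris,
    dependent_col = "Species",
    family = "multinomial",
    n = 10
  )

  # One row per class and descriptor after unnesting
  bsl$summarized_class_level_results %>%
    unnest(Results)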
....................................................................
The Random Evaluations tibble contains:
The repetition results with the same metrics as the Summarized Results tibble.
How: The one-vs-all evaluations are aggregated by repetition.
If a metric contains one or more NAs in the one-vs-all evaluations, it
will lead to an NA result for that repetition.
Also includes:
A nested tibble with the one-vs-all binomial evaluations (Class Level Results),
including nested Confusion Matrices and the
Support column, which is a count of how many observations from the
class are in the test set.
A nested tibble with the predictions and targets.
A list of ROC curve objects.
A nested tibble with the multiclass confusion matrix.
A nested Process information object with information
about the evaluation.
Name of dependent variable.
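Finally, a sketch of inspecting a single repetition's class level results; the `Class Level Results` column name follows cvms' naming at the time of writing:

  library(cvms)

  set.seed(1)
  bsl <- baseline(
    test_data = iris,
    dependent_col = "Species",
    family = "multinomial",
    n = 10
  )

  # One-vs-all evaluations for the first repetition, including
  # nested confusion matrices and the Support counts
  bsl$random_evaluations$`Class Level Results`[[1]]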