Compute relative scores between different models by making pairwise
comparisons. Pairwise comparisons are a form of pairwise tournament in which
all possible combinations of two models are compared against each other based
on the overlapping set of forecasts available to both models.
Internally, for every pair, the ratio of the mean scores of the two models is
computed. The relative score of a model is then the geometric mean of all mean
score ratios that involve that model. When a baseline is provided, the
comparisons against the baseline are excluded when computing the relative
scores of the individual models (which therefore differ slightly from the
relative scores computed without a baseline), and all relative scores are
scaled by (i.e. divided by) the relative score of the baseline model.
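
To make the computation concrete, the following is a minimal sketch in base R
of how relative scores can be derived from mean scores. The object and model
names (mean_scores, "A", "B", "C") are purely illustrative and not part of the
package API.

# Illustrative mean scores of three hypothetical models on the shared
# (overlapping) set of forecasts; the values are made up.
mean_scores <- c(A = 0.25, B = 0.40, C = 0.30)
models <- names(mean_scores)

# Pairwise mean score ratios: entry [i, j] is the mean score of model i
# divided by the mean score of model j.
ratios <- outer(mean_scores, mean_scores, "/")

# Relative score of each model: geometric mean of all ratios involving
# that model (excluding the trivial comparison with itself).
relative_score <- sapply(models, function(m) {
  others <- setdiff(models, m)
  exp(mean(log(ratios[m, others])))
})

# With a baseline (say model "C"), comparisons against the baseline are
# excluded for the individual models, and all relative scores are then
# divided by the baseline's relative score, so the baseline scores 1.
baseline <- "C"
rel_to_baseline <- sapply(models, function(m) {
  others <- setdiff(models, unique(c(m, baseline)))
  exp(mean(log(ratios[m, others])))
})
scaled_score <- rel_to_baseline / rel_to_baseline[baseline]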
Usually, the function input should be unsummarised scores as produced by
score().
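
A typical call might look like the sketch below. It assumes the scoringutils
example data; note that the name of the pairwise comparison function depends
on the package version (pairwise_comparison() in older versions,
get_pairwise_comparisons() in more recent ones).

library(scoringutils)

# Unsummarised scores, one row per individual forecast
scores <- score(example_quantile)

# Pairwise comparisons, optionally scaled against a baseline model;
# the function name may differ across package versions (see above).
pairwise <- pairwise_comparison(
  scores,
  baseline = "EuroCOVIDhub-baseline"
)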
Note that the function internally infers the unit of a single forecast by
determining all columns in the input that do not correspond to metrics
computed by score(). Adding unrelated columns will change results in an
unpredictable way.
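
It can therefore be safer to drop unrelated columns before scoring. Below is a
minimal sketch, assuming the example_quantile data and treating location_name
as an unrelated metadata column; recent versions of scoringutils also provide
set_forecast_unit() for this purpose.

library(scoringutils)

# Keep only the columns that define the unit of a single forecast plus
# the forecast and observation columns themselves; the selection below
# matches the example data and is illustrative only.
keep <- c(
  "location", "target_end_date", "target_type", "forecast_date",
  "horizon", "model", "quantile", "prediction", "true_value"
)
forecast_data <- as.data.frame(example_quantile)[, keep]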
The code for the pairwise comparisons is inspired by an implementation by
Johannes Bracher.
The implementation of the permutation test follows the function
permutationTest from the surveillance package by Michael Höhle, Andrea Riebler
and Michaela Paul.
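
For intuition, a permutation test in this setting can be sketched as below.
This only illustrates the general idea of randomly swapping the paired scores
of two models; it is not the exact implementation used in
surveillance::permutationTest, and all names are illustrative.

# Permutation test for the difference in mean scores of two models,
# evaluated on the same set of forecasts (paired scores).
permutation_test <- function(scores_a, scores_b, n_permutations = 999) {
  stopifnot(length(scores_a) == length(scores_b))
  observed <- abs(mean(scores_a) - mean(scores_b))
  n <- length(scores_a)
  permuted <- replicate(n_permutations, {
    # Under the null hypothesis the model labels are exchangeable, so
    # swap the paired scores between the two models at random.
    swap <- runif(n) < 0.5
    a <- ifelse(swap, scores_b, scores_a)
    b <- ifelse(swap, scores_a, scores_b)
    abs(mean(a) - mean(b))
  })
  # Proportion of permuted statistics at least as extreme as observed.
  (sum(permuted >= observed) + 1) / (n_permutations + 1)
}

set.seed(1)
p_value <- permutation_test(rexp(50, rate = 4), rexp(50, rate = 3))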