pairwise_comparison: Do Pairwise Comparisons of Scores

Description

Make pairwise comparisons between models. The code for the pairwise comparisons is inspired by an implementation by Johannes Bracher.

The implementation of the permutation test follows the function permutationTest from the `surveillance` package by Michael Höhle, Andrea Riebler and Michaela Paul.
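To make the idea of the permutation test concrete, here is a minimal base R sketch of a two-sided paired permutation test on the difference in mean scores. It is an illustration only and not the code used by this package or by `surveillance`; the function name `paired_permutation_pvalue` and its arguments are hypothetical. In each permutation the two models' scores are swapped at random for each observation, and the p-value is the share of permuted differences at least as extreme as the observed one.

```r
# Hypothetical helper, not part of scoringutils or surveillance.
paired_permutation_pvalue <- function(scores_a, scores_b, n_permutations = 999) {
  stopifnot(length(scores_a) == length(scores_b))
  n <- length(scores_a)

  # Observed difference in mean scores between the two models.
  observed <- mean(scores_a) - mean(scores_b)

  # For each permutation, swap the two scores of an observation with
  # probability 0.5 and recompute the difference in means.
  permuted <- replicate(n_permutations, {
    swap <- rbinom(n, size = 1, prob = 0.5) == 1
    a <- ifelse(swap, scores_b, scores_a)
    b <- ifelse(swap, scores_a, scores_b)
    mean(a) - mean(b)
  })

  # Two-sided p-value; the observed difference counts as one permutation.
  (1 + sum(abs(permuted) >= abs(observed))) / (n_permutations + 1)
}

set.seed(1)
p_same <- paired_permutation_pvalue(1:20, 1:20)
p_diff <- paired_permutation_pvalue(1:20 + 100, 1:20)
```

With identical scores no permutation can be less extreme than the observed difference of zero, so the p-value is 1. With clearly separated scores the p-value is close to its minimum of 1 / (n_permutations + 1).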

Usage

pairwise_comparison(
  scores,
  metric = "interval_score",
  test_options = list(oneSided = FALSE, test_type = c("non_parametric", "permuation"),
    n_permutations = 999),
  baseline = NULL,
  by = NULL,
  summarise_by = c("model")
)

Arguments

scores

A data.frame of unsummarised scores as produced by eval_forecasts

metric

A character vector of length one with the metric to do the comparison on.

test_options

list with options to pass down to compare_two_models. To change only one of the default options, just pass a list as input with the name of the argument you want to change. All elements not included in the list will be set to the default (so passing an empty list would result in the default options).
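The "only override what you name" behaviour described above can be sketched with `utils::modifyList`. Whether the package merges options this way internally is an assumption; the sketch only illustrates the semantics. The default values are copied verbatim from the Usage section, including the spelling of the `test_type` values.

```r
# Defaults as listed in the Usage section (copied verbatim).
default_options <- list(
  oneSided = FALSE,
  test_type = c("non_parametric", "permuation"),
  n_permutations = 999
)

# The user changes a single option ...
user_options <- list(n_permutations = 99)

# ... and every option not named keeps its default value.
merged <- modifyList(default_options, user_options)

# An empty list leaves the defaults unchanged.
unchanged <- modifyList(default_options, list())
```

Here `merged$n_permutations` is 99, while `merged$oneSided` and `merged$test_type` keep their defaults.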

baseline

character vector of length one that denotes the baseline model against which to compare other models.

by

character vector of columns to group scoring by. This should be the lowest level of grouping possible, i.e. the unit of the individual observation. This is important as many functions work on individual observations. If you want a different level of aggregation, use summarise_by to aggregate the individual scores. Also note that the PIT will be computed using summarise_by instead of by.

summarise_by

character vector of columns to group the summary by. By default, this is equal to `by` and no summary takes place. But sometimes you may want to summarise over categories different from the scoring. summarise_by is also the grouping level used to compute (and possibly plot) the probability integral transform (PIT).
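The difference between scoring at the level of individual observations and summarising at a coarser level can be illustrated in base R. This is a sketch with made-up numbers and does not call the package; the column names mirror the example data further down, and using the mean as the summary function is an assumption.

```r
# Made-up unsummarised scores: one row per model, location and forecast.
scores <- data.frame(
  model = rep(c("model1", "model2"), each = 4),
  location = rep(c(1, 2), times = 4),
  interval_score = c(1, 2, 3, 4, 2, 4, 6, 8)
)

# Summarising by model only: one mean score per model.
by_model <- aggregate(interval_score ~ model, data = scores, FUN = mean)

# Summarising by model and location: one mean score per combination.
by_model_location <- aggregate(interval_score ~ model + location,
                               data = scores, FUN = mean)
```

`by_model` has one row per model, while `by_model_location` keeps a separate row for each model and location.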

Value

A data.frame with the results of the pairwise comparisons, which can be passed to plot_pairwise_comparison.

Examples

# NOT RUN {
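# Toy data: one score per model, date and location. The scores are random,
# so results differ between runs unless a seed is set.
df <- data.frame(model = rep(c("model1", "model2", "model3"), each = 10),
                 date = as.Date("2020-01-01") + rep(1:5, each = 2),
                 location = c(1, 2),
                 interval_score = abs(rnorm(30)),
                 aem = abs(rnorm(30)))

# Pairwise comparison on the interval score, with model1 as the baseline.
res <- scoringutils::pairwise_comparison(df,
                           baseline = "model1")
scoringutils::plot_pairwise_comparison(res)

# Same comparison on scores computed from the packaged example data.
eval <- scoringutils::eval_forecasts(scoringutils::range_example_data_long)
pairwise <- scoringutils::pairwise_comparison(eval, summarise_by = c("model"))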
# }
