
scoringutils: Utilities for Scoring and Assessing Predictions


The scoringutils package facilitates the process of evaluating forecasts in R, using a convenient and flexible data.table-based framework. It provides broad functionality to check the input data and diagnose issues, to visualise forecasts and missing data, to transform data before scoring, to handle missing forecasts, to aggregate scores, and to visualise the results of the evaluation. The package is easily extendable, meaning that users can supply their own scoring rules or extend existing classes to handle new types of forecasts.

The package underwent a major re-write. The most comprehensive documentation for the updated package is the revised version of our original scoringutils paper.

Other good starting points are the vignettes Details on the metrics implemented and Scoring forecasts directly.

For further details on the specific issue of transforming forecasts for scoring see:

Nikos I. Bosse, Sam Abbott, Anne Cori, Edwin van Leeuwen, Johannes Bracher* and Sebastian Funk* (*: equal contribution) (2023). Scoring epidemiological forecasts on transformed scales, PLoS Comput Biol 19(8): e1011393 https://doi.org/10.1371/journal.pcbi.1011393

Installation

Install the CRAN version of this package using

install.packages("scoringutils")

Install the unstable development version from GitHub using

remotes::install_github("epiforecasts/scoringutils", dependencies = TRUE)

Quick start

Forecast types

scoringutils currently supports scoring the following forecast types:

  • binary: a probability for a binary (yes/no) outcome variable.
  • point: a forecast for a continuous or discrete outcome variable that is represented by a single number.
  • quantile: a probabilistic forecast for a continuous or discrete outcome variable, with the forecast distribution represented by a set of predictive quantiles.
  • sample: a probabilistic forecast for a continuous or discrete outcome variable, with the forecast represented by a finite set of samples drawn from the predictive distribution.
  • nominal: a categorical forecast with unordered outcome possibilities (a generalisation of binary forecasts to multiple outcomes).
  • ordinal: a categorical forecast with ordered outcome possibilities.

Input formats and input validation

The expected input format is generally a data.frame (or similar) with required columns observed and predicted that hold the observed values and the forecasts. The exact requirements depend on the forecast type. For more information, have a look at the paper, call ?as_forecast_binary, ?as_forecast_quantile etc., or inspect the example data provided in the package (example_binary, example_point, example_quantile, example_sample_continuous, example_sample_discrete, example_nominal).
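To get a feel for these formats, you can inspect the bundled example data directly. A minimal sketch; all objects shown ship with the package, and the column remarks reflect the packaged example data:

library(scoringutils)

head(example_binary)             # observed outcomes and predicted probabilities
head(example_quantile)           # has an additional quantile_level column
head(example_sample_continuous)  # has an additional sample_id column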

Before scoring, input data needs to be validated and transformed into a forecast object using one of the as_forecast_<type>() functions.

forecast_quantile <- example_quantile |>
  as_forecast_quantile(
    forecast_unit = c(
      "location", "forecast_date", "target_end_date", "target_type", "model", "horizon"
    )
  )
#> ℹ Some rows containing NA values may be removed. This is fine if not
#>   unexpected.

print(forecast_quantile, 2)
#> Forecast type: quantile
#> Forecast unit:
#> location, forecast_date, target_end_date, target_type, model, and horizon
#> 
#> Key: <location, target_end_date, target_type>
#>        observed quantile_level predicted location forecast_date target_end_date
#>           <num>          <num>     <int>   <char>        <Date>          <Date>
#>     1:   127300             NA        NA       DE          <NA>      2021-01-02
#>     2:     4534             NA        NA       DE          <NA>      2021-01-02
#>    ---                                                                         
#> 20544:       78          0.975       611       IT    2021-07-12      2021-07-24
#> 20545:       78          0.990       719       IT    2021-07-12      2021-07-24
#>        target_type                model horizon
#>             <char>               <char>   <num>
#>     1:       Cases                 <NA>      NA
#>     2:      Deaths                 <NA>      NA
#>    ---                                         
#> 20544:      Deaths epiforecasts-EpiNow2       2
#> 20545:      Deaths epiforecasts-EpiNow2       2

The forecast unit

For quantile-based and sample-based forecasts, a single prediction is represented by a set of several quantiles (or samples) from the predictive distribution, i.e. several rows in the input data. scoringutils therefore needs to group together the rows that belong to a single forecast. To do so, it uses all other existing columns in the input data: their values should uniquely identify a single forecast. Additional columns unrelated to the forecast unit can break this grouping. The forecast_unit argument in as_forecast_<type>() ensures that only the columns relevant for defining the unit of a single forecast are retained.
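To check which columns currently make up the forecast unit, and to diagnose problems caused by superfluous columns, the package provides helpers. A minimal sketch using the forecast_quantile object from above; both functions are listed in the function reference below:

# Character vector of the columns that define a single forecast,
# here the columns passed via forecast_unit above
get_forecast_unit(forecast_quantile)

# If forecasts are no longer uniquely identified (e.g. because a
# distinguishing column is missing), the offending rows show up here
get_duplicate_forecasts(forecast_quantile)

Alternatively, set_forecast_unit() lets you change the forecast unit of existing data manually.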

Scoring forecasts

Forecasts can be scored by calling score() on a validated forecast object.

scores <- forecast_quantile |> 
  score()

score() takes an additional argument, metrics, with a list of scoring rules. Every forecast type has a default list of metrics. You can easily add your own scoring functions, as long as they conform to the format for that forecast type. See the paper for more information.
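As an illustration, here is a minimal sketch of adding a custom metric for quantile-based forecasts. median_ae is a hypothetical example function, not part of the package; its signature (a vector of observed values, a matrix of predictions with one column per quantile level, and a vector of quantile levels) mirrors the built-in quantile metrics:

# Hypothetical custom metric: absolute error of the predictive median
median_ae <- function(observed, predicted, quantile_level, ...) {
  abs(observed - predicted[, quantile_level == 0.5])
}

scores_custom <- forecast_quantile |>
  score(metrics = c(
    get_metrics(forecast_quantile),   # default metrics for this forecast type
    list(median_ae = median_ae)
  ))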

You can summarise scores using the function summarise_scores(). The by argument specifies the desired level of summary. fun lets you specify any summary function, although it is recommended to stick to the mean as the primary summary function, as other summary functions can result in improper scores.

scores |> 
  summarise_scores(by = c("model", "target_type")) |>
  summarise_scores(by = c("model", "target_type"), fun = signif, digits = 3)
#>                    model target_type     wis overprediction underprediction
#>                   <char>      <char>   <num>          <num>           <num>
#> 1: EuroCOVIDhub-ensemble       Cases 17900.0       10000.00          4240.0
#> 2: EuroCOVIDhub-baseline       Cases 28500.0       14100.00         10300.0
#> 3:  epiforecasts-EpiNow2       Cases 20800.0       11900.00          3260.0
#> 4: EuroCOVIDhub-ensemble      Deaths    41.4           7.14             4.1
#> 5: EuroCOVIDhub-baseline      Deaths   159.0          65.90             2.1
#> 6:       UMass-MechBayes      Deaths    52.7           8.98            16.8
#> 7:  epiforecasts-EpiNow2      Deaths    66.6          18.90            15.9
#>    dispersion     bias interval_coverage_50 interval_coverage_90 ae_median
#>         <num>    <num>                <num>                <num>     <num>
#> 1:     3660.0 -0.05640                0.391                0.805   24100.0
#> 2:     4100.0  0.09800                0.328                0.820   38500.0
#> 3:     5660.0 -0.07890                0.469                0.789   27900.0
#> 4:       30.2  0.07270                0.875                1.000      53.1
#> 5:       91.4  0.33900                0.664                1.000     233.0
#> 6:       26.9 -0.02230                0.461                0.875      78.5
#> 7:       31.9 -0.00513                0.420                0.908     105.0

Package workflow

The figure below depicts the suggested workflow for evaluating forecasts with scoringutils (sections refer to the paper). Please find more information in the paper, the function documentation and the vignettes.

[Figure: flowchart of the suggested scoringutils package workflow]
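One step in that workflow, scoring forecasts on a transformed scale (see the transformation paper referenced above), might look like the following sketch. It assumes the forecast_quantile object from the Quick start; transform_forecasts() appends transformed forecasts and observations and labels them in a scale column:

# Score forecasts on the natural and the log scale side by side;
# log_shift() computes log(x + offset) to handle zero values
forecast_quantile |>
  transform_forecasts(fun = log_shift, offset = 1) |>
  score() |>
  summarise_scores(by = c("model", "target_type", "scale"))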

Citation

If you are using scoringutils in your work, please consider citing it using the output of citation("scoringutils") (or print(citation("scoringutils"), bibtex = TRUE)):

#> To cite scoringutils in publications use the following. If you use the
#> CRPS, DSS, or Log Score, please also cite scoringRules.
#> 
#>   Nikos I. Bosse, Hugo Gruson, Sebastian Funk, Anne Cori, Edwin van
#>   Leeuwen, and Sam Abbott (2022). Evaluating Forecasts with
#>   scoringutils in R, arXiv. DOI: 10.48550/ARXIV.2205.07090
#> 
#> To cite scoringRules in publications use:
#> 
#>   Alexander Jordan, Fabian Krueger, Sebastian Lerch (2019). Evaluating
#>   Probabilistic Forecasts with scoringRules. Journal of Statistical
#>   Software, 90(12), 1-37. DOI 10.18637/jss.v090.i12
#> 
#> To see these entries in BibTeX format, use 'print(<citation>,
#> bibtex=TRUE)', 'toBibtex(.)', or set
#> 'options(citation.bibtex.max=999)'.

How to make a bug report or feature request

Please briefly describe your problem and what output you expect in an issue. If you have a question, please don’t open an issue. Instead, ask on our Q and A page.

Contributing

We welcome contributions and new contributors! We particularly appreciate help on priority problems in the issues. Please check and add to the issues, and/or add a pull request.

Code of Conduct

Please note that the scoringutils project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Funding

The development of scoringutils was funded via the Health Protection Research Unit (grant code NIHR200908) and the Wellcome Trust (grant: 210758/Z/18/Z). This work has also been supported by the US National Institute of General Medical Sciences (R35GM119582). The content is solely the responsibility of the authors and does not necessarily represent the official views of NIGMS or the National Institutes of Health.

Contributors

All contributions to this project are gratefully acknowledged using the allcontributors package following the all-contributors specification. Contributions of any kind are welcome!

Code

nikosbosse, seabbs, sbfnk, jamesmbaazam, Bisaloo, actions-user, toshiakiasakura, MichaelChirico, nickreich, jhellewell14, damonbayer

Issue Authors

DavideMagno, mbojan, dshemetov, elray1, jonathonmellor, jcken95

Issue Contributors

jbracher, dylanhmorris, kathsherratt

Version

2.1.0

License

MIT + file LICENSE

Maintainer

Nikos Bosse

Last Published

March 3rd, 2025

Functions in scoringutils (2.1.0)

  • as_scores: Create an object of class scores from data
  • assert_input_interval: Assert that inputs are correct for interval-based forecast
  • assert_forecast_type: Assert that forecast type is as expected
  • assert_input_categorical: Assert that inputs are correct for categorical forecasts
  • as_forecast_quantile: Create a forecast object for quantile-based forecasts
  • as_forecast_sample: Create a forecast object for sample-based forecasts
  • assert_forecast_generic: Validation common to all forecast types
  • assert_input_binary: Assert that inputs are correct for binary forecast
  • assert_forecast.forecast_binary: Assert that input is a forecast object and passes validations
  • assert_dims_ok_point: Assert Inputs Have Matching Dimensions
  • bias_quantile: Determines bias of quantile forecasts
  • bias_quantile_single_vector: Compute bias for a single vector of quantile predictions
  • bias_sample: Determine bias of forecasts
  • check_columns_present: Check column names are present in a data.frame
  • assert_input_sample: Assert that inputs are correct for sample-based forecast
  • assert_scores: Validate an object of class scores
  • check_input_binary: Check that inputs are correct for binary forecast
  • check_input_point: Check that inputs are correct for point forecast
  • check_input_quantile: Check that inputs are correct for quantile-based forecast
  • check_input_sample: Check that inputs are correct for sample-based forecast
  • example_binary: Binary forecast example data
  • example_nominal: Nominal example data
  • check_input_interval: Check that inputs are correct for interval-based forecast
  • check_numeric_vector: Check whether an input is an atomic vector of mode 'numeric'
  • assert_input_ordinal: Assert that inputs are correct for ordinal forecasts
  • dss_sample: Dawid-Sebastiani score
  • assert_input_nominal: Assert that inputs are correct for nominal forecasts
  • assert_input_point: Assert that inputs are correct for point forecast
  • assert_input_quantile: Assert that inputs are correct for quantile-based forecast
  • check_try: Helper function to convert assert statements into checks
  • clean_forecast: Clean forecast object
  • crps_sample: (Continuous) ranked probability score
  • document_assert_functions: Documentation template for assert functions
  • check_dims_ok_point: Check Inputs Have Matching Dimensions
  • example_ordinal: Ordinal example data
  • example_quantile: Quantile example data
  • ensure_data.table: Ensure that an object is a data.table
  • get_duplicate_forecasts: Find duplicate forecasts
  • check_number_per_forecast: Check that all forecasts have the same number of rows
  • get_coverage: Get quantile and interval coverage values for quantile-based forecasts
  • check_duplicates: Check that there are no duplicate forecasts
  • example_point: Point forecast example data
  • get_metrics: Get metrics
  • compare_forecasts: Compare a subset of common forecasts
  • get_forecast_unit: Get unit of a single forecast
  • get_metrics.forecast_binary: Get default metrics for binary forecasts
  • get_pit_histogram.forecast_quantile: Probability integral transformation histogram
  • get_metrics.forecast_nominal: Get default metrics for nominal forecasts
  • get_forecast_counts: Count number of available forecasts
  • example_sample_continuous: Continuous forecast example data
  • document_check_functions: Documentation template for check functions
  • get_forecast_type: Get forecast type from forecast object
  • document_test_functions: Documentation template for test functions
  • illustration-input-metric-binary-point: Illustration of required inputs for binary and point forecasts
  • illustration-input-metric-nominal: Illustration of required inputs for nominal forecasts
  • get_metrics.scores: Get names of the metrics that were used for scoring
  • example_sample_discrete: Discrete forecast example data
  • illustration-input-metric-sample: Illustration of required inputs for sample-based forecasts
  • geometric_mean: Calculate geometric mean
  • forecast_types: Documentation template for forecast types
  • interpolate_median: Helper function to interpolate the median prediction if it is not available
  • get_correlations: Calculate correlation between metrics
  • plot_heatmap: Create a heatmap of a scoring metric
  • plot_forecast_counts: Visualise the number of available forecasts
  • get_metrics.forecast_sample: Get default metrics for sample-based forecasts
  • log_shift: Log transformation with an additive shift
  • get_metrics.forecast_quantile: Get default metrics for quantile-based forecasts
  • is_forecast_binary: Test whether an object is a forecast object
  • get_metrics.forecast_ordinal: Get default metrics for ordinal forecasts
  • print.forecast: Print information about a forecast object
  • illustration-input-metric-ordinal: Illustration of required inputs for ordinal forecasts
  • illustration-input-metric-quantile: Illustration of required inputs for quantile-based forecasts
  • get_pairwise_comparisons: Obtain pairwise comparisons between models
  • logs_sample: Logarithmic score (sample-based version)
  • score.forecast_binary: Evaluate forecasts
  • quantile_score: Quantile score
  • mad_sample: Determine dispersion of a probabilistic forecast
  • theme_scoringutils: Scoringutils ggplot2 theme
  • new_forecast: Class constructor for forecast objects
  • scoring-functions-binary: Metrics for binary outcomes
  • transform_forecasts: Transform forecasts and observed values
  • get_metrics.forecast_point: Get default metrics for point forecasts
  • se_mean_sample: Squared error of the mean (sample-based version)
  • plot_wis: Plot contributions to the weighted interval score
  • plot_quantile_coverage: Plot quantile coverage
  • get_range_from_quantile: Get interval range belonging to a quantile
  • summarise_scores: Summarise scores as produced by score()
  • sample_to_interval_long: Change data from a sample-based format to a long interval range format
  • run_safely: Run a function safely
  • set_forecast_unit: Set unit of a single forecast manually
  • interval_score: Interval score
  • select_metrics: Select metrics from a list of functions
  • interval_coverage: Interval coverage (for quantile-based forecasts)
  • get_type: Get type of a vector or matrix of observed values or predictions
  • plot_interval_coverage: Plot interval coverage
  • plot_pairwise_comparisons: Plot heatmap of pairwise comparisons
  • new_scores: Construct an object of class scores
  • quantile_to_interval: Transform from a quantile format to an interval format
  • rps_ordinal: Ranked Probability Score for ordinal outcomes
  • test_columns_not_present: Test whether column names are NOT present in a data.frame
  • get_protected_columns: Get protected columns from data
  • test_columns_present: Test whether all column names are present in a data.frame
  • pit_histogram_sample: Probability integral transformation for counts
  • plot_correlations: Plot correlation between metrics
  • validate_metrics: Validate metrics
  • logs_categorical: Log score for categorical outcomes
  • scoringutils-package: scoringutils: Utilities for Scoring and Assessing Predictions
  • permutation_test: Simple permutation test
  • pairwise_comparison_one_group: Do pairwise comparison for one set of forecasts
  • wis: Weighted interval score (WIS)
  • as_forecast_ordinal: Create a forecast object for ordinal forecasts
  • as_forecast_generic: Common functionality for as_forecast_<type> functions
  • as_forecast_doc_template: General information on creating a forecast object
  • as_forecast_binary: Create a forecast object for binary forecasts
  • as_forecast_point: Create a forecast object for point forecasts
  • as_forecast_nominal: Create a forecast object for nominal forecasts
  • ae_median_sample: Absolute error of the median (sample-based version)
  • add_relative_skill: Add relative skill scores based on pairwise comparisons
  • apply_metrics: Apply a list of functions to a data table of forecasts
  • ae_median_quantile: Absolute error of the median (quantile-based version)