Learn R Programming

surveillance (version 1.23.1)

hhh4_simulate_scores: Proper Scoring Rules for Simulations from hhh4 Models

Description

Calculate proper scoring rules based on simulated predictive distributions.

Usage

# S3 method for hhh4sims
scores(x, which = "rps", units = NULL, ..., drop = TRUE)
# S3 method for hhh4simslist
scores(x, ...)

Arguments

x

an object of class "hhh4sims" (as resulting from the simulate-method for "hhh4" models if simplify = TRUE was set), or an "hhh4simslist", i.e., a list of such simulations potentially obtained from different model fits (using the same simulation period).

which

a character vector indicating which proper scoring rules to compute. By default, only the ranked probability score ("rps") is calculated. Other options include "logs" and "dss".

units

if non-NULL, an integer or character vector indexing the columns of x for which to compute the scores.

drop

a logical indicating if univariate dimensions should be dropped (the default).

...

unused (argument of the generic).

Author

Sebastian Meyer

Details

This implementation can only compute univariate scores, i.e., independently for each time point.

The logarithmic score is badly estimated if the domain is large and there are not enough samples to cover the underlying distribution in enough detail (the score becomes infinite when an observed value does not occur in the samples). An alternative is to use kernel density estimation as implemented in the R package scoringRules.

Examples

Run this code
data("salmAllOnset")

## fit a hhh4 model to the first 13 years
salmModel <- list(end = list(f = addSeason2formula(~1 + t)),
                  ar = list(f = ~1), family = "NegBin1", subset = 2:678)
salmFit <- hhh4(salmAllOnset, salmModel)

## simulate the next 20 weeks ahead (with very small 'nsim' for speed)
salmSims <- simulate(salmFit, nsim = 500, seed = 3, subset = 678 + seq_len(20),
                     y.start = observed(salmAllOnset)[678,])
if (requireNamespace("fanplot"))
    plot(salmSims, "fan")


### calculate scores at each time point

## using empirical distribution of simulated counts as forecast distribution
scores(salmSims, which = c("rps", "logs", "dss"))
## observed count sometimes not covered by simulations -> infinite log-score
## => for a more detailed forecast, either considerably increase 'nsim', or:

## 1. use continuous density() of simulated counts as forecast distribution
fi <- apply(salmSims, 1, function (x) approxfun(density(x)))
logs_kde <- mapply(function (f, y) -log(f(y)),
                   f = fi, y = observed(attr(salmSims,"stsObserved")))
cbind("empirical" = scores(salmSims, "logs"), "density" = logs_kde)
## a similar KDE approach is implemented in scoringRules::logs_sample()

## 2. average conditional predictive NegBin's of simulated trajectories,
##    currently only implemented in HIDDA.forecasting::dhhh4sims()


### produce a PIT histogram

## using empirical distribution of simulated counts as forecast distribition
pit(x = observed(attr(salmSims, "stsObserved")),
    pdistr = apply(salmSims, 1:2, ecdf))
## long-term forecast is badly calibrated (lower tail is unused, see fan above)
## we also get a warning for the same reason as infinite log-scores

Run the code above in your browser using DataLab