RPSS: Compute the Ranked Probability Skill Score

Description

The Ranked Probability Skill Score (RPSS; Wilks, 2011) is the skill score based on the Ranked Probability Score (RPS; Wilks, 2011). It can be used to assess whether a forecast presents an improvement or worsening with respect to a reference forecast. The RPSS ranges between minus infinite and 1. If the RPSS is positive, it indicates that the forecast has higher skill than the reference forecast, while a negative value means that it has a lower skill. Examples of reference forecasts are the climatological forecast (same probabilities for all categories for all time steps), persistence, a previous model version, and another model. It is computed as RPSS = 1 - RPS_exp / RPS_ref. The statistical significance is obtained based on a Random Walk test at the 95 2016). If there is more than one dataset, RPS will be computed for each pair of exp and obs data.

Usage

RPSS(
  exp,
  obs,
  ref = NULL,
  time_dim = "sdate",
  memb_dim = "member",
  dat_dim = NULL,
  prob_thresholds = c(1/3, 2/3),
  indices_for_clim = NULL,
  Fair = FALSE,
  weights = NULL,
  weights_exp = NULL,
  weights_ref = NULL,
  cross.val = FALSE,
  ncores = NULL
)

Value

$rpss: A numerical array of RPSS with dimensions c(nexp, nobs, the rest dimensions of 'exp' except 'time_dim' and 'memb_dim' dimensions). nexp is the number of experiment (i.e., dat_dim in exp), and nobs is the number of observation i.e., dat_dim in obs). If dat_dim is NULL, nexp and nobs are omitted.
$sign: A logical array of the statistical significance of the RPSS with the same dimensions as $rpss.

Arguments

exp: A named numerical array of the forecast with at least time and member dimension.
obs: A named numerical array of the observation with at least time dimension. The dimensions must be the same as 'exp' except 'memb_dim' and 'dat_dim'.
ref: A named numerical array of the reference forecast data with at least time and member dimension. The dimensions must be the same as 'exp' except 'memb_dim' and 'dat_dim'. If there is only one reference dataset, it should not have dataset dimension. If there is corresponding reference for each experiement, the dataset dimension must have the same length as in 'exp'. If 'ref' is NULL, the climatological forecast is used as reference forecast. The default value is NULL.
time_dim: A character string indicating the name of the time dimension. The default value is 'sdate'.
memb_dim: A character string indicating the name of the member dimension to compute the probabilities of the forecast and the reference forecast. The default value is 'member'.
dat_dim: A character string indicating the name of dataset dimension. The length of this dimension can be different between 'exp' and 'obs'. The default value is NULL.
prob_thresholds: A numeric vector of the relative thresholds (from 0 to 1) between the categories. The default value is c(1/3, 2/3), which corresponds to tercile equiprobable categories.
indices_for_clim: A vector of the indices to be taken along 'time_dim' for computing the thresholds between the probabilistic categories. If NULL, the whole period is used. The default value is NULL.
Fair: A logical indicating whether to compute the FairRPSS (the potential RPSS that the forecast would have with an infinite ensemble size). The default value is FALSE.
weights: Deprecated and will be removed in the next release. Please use 'weights_exp' and 'weights_ref' instead.
weights_exp: A named numerical array of the forecast ensemble weights. The dimension should include 'memb_dim', 'time_dim' and 'dat_dim' if there are multiple datasets. All dimension lengths must be equal to 'exp' dimension lengths. The default value is NULL, which means no weighting is applied. The ensemble should have at least 70 members or span at least 10 time steps and have more than 45 members if consistency between the weighted and unweighted methodologies is desired.
weights_ref: Same as 'weights_exp' but for the reference forecast.
cross.val: A logical indicating whether to compute the thresholds between probabilistics categories in cross-validation. The default value is FALSE.
ncores: An integer indicating the number of cores to use for parallel computation. The default value is NULL.

References

Wilks, 2011; https://doi.org/10.1016/B978-0-12-385022-5.00008-7 DelSole and Tippett, 2016; https://doi.org/10.1175/MWR-D-15-0218.1

Examples

Run this code

set.seed(1)
exp <- array(rnorm(3000), dim = c(lat = 3, lon = 2, member = 10, sdate = 50))
set.seed(2)
obs <- array(rnorm(300), dim = c(lat = 3, lon = 2, sdate = 50))
set.seed(3)
ref <- array(rnorm(3000), dim = c(lat = 3, lon = 2, member = 10, sdate = 50))
weights <- sapply(1:dim(exp)['sdate'], function(i) {
            n <- abs(rnorm(10))
            n/sum(n)
          })
dim(weights) <- c(member = 10, sdate = 50)
res <- RPSS(exp = exp, obs = obs) ## climatology as reference forecast
res <- RPSS(exp = exp, obs = obs, ref = ref) ## ref as reference forecast
res <- RPSS(exp = exp, obs = obs, ref = ref, weights_exp = weights, weights_ref = weights)

Run the code above in your browser using DataLab