cv.evmOpt: Cross-validation for the shape parameter in an extreme values model

Description

Cross-validation for the shape parameter in an extreme values model

Usage

# S3 method for evmOpt
cv(
  object,
  folds = 10,
  ...,
  penalty = "gaussian",
  range = seq(1, 25, length.out = 25),
  shape = NULL
)

Arguments

object: An object of class 'evmOpt' as returned by evm.
folds: Integer giving the number of cross-validation folds to use. Defaults to folds = 10.
...: Not used.
penalty: String specifying the type of penalty to use. Defaults to penalty = "gaussian", which is equivalent to using a quadratic penalty. The other allowed value is penalty = "lasso", in which case an L1 penalty is used.
range: A sequence of values for the penalty parameter. Defaults to range = seq(1, 25, length.out = 25). The values are taken to be the reciprocals of the prior variance, so they must be strictly positive.
shape: String giving the name of the shape parameter. Defaults to shape = NULL, in which case the function tries to guess.
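
A call using these arguments might look like the sketch below. It assumes the texmex package is installed and borrows its rain dataset with a threshold of 30 purely for illustration; the seed and the object names mod and res are also assumptions, not part of this page. The call is guarded so the snippet still runs when the package is absent.

```r
# Sketch only: the dataset, threshold and seed are illustrative assumptions.
penalties <- seq(1, 25, length.out = 25)   # the documented default grid

if (requireNamespace("texmex", quietly = TRUE)) {
  library(texmex)
  set.seed(1)                  # fold assignment is random, so fix a seed
  mod <- evm(rain, th = 30)    # evm returns an object of class 'evmOpt'
  res <- cv(mod, folds = 10, penalty = "gaussian", range = penalties)
}
```

Every value in penalties is strictly positive, as the range argument requires.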

Details

Only the shape parameter is assumed to be penalized. The penalty can be thought of in terms of the variance of a Gaussian prior distribution on the shape, which is equivalent to a quadratic penalty. Because the shape parameter will usually lie between -1/2 and 1/2, a N(0, 1/16) prior is likely to be a good starting point. Since the penalty values are the reciprocals of the prior variance, a range that spans 16 will usually be appropriate.
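
The link between the prior variance and the quadratic penalty can be checked numerically in base R. The sketch below assumes the penalty takes the usual form of a Gaussian log-density, -(lambda / 2) * xi^2 plus a constant; the exact scaling used internally is not stated here, and the names lambda, xi and grid_sd are illustrative.

```r
lambda   <- 16                   # penalty value = 1 / prior variance
prior_sd <- sqrt(1 / lambda)     # standard deviation of the N(0, 1/16) prior

xi <- seq(-0.5, 0.5, by = 0.1)   # shape values in the usual range

# Log-density of the Gaussian prior and the matching quadratic penalty
log_prior <- dnorm(xi, mean = 0, sd = prior_sd, log = TRUE)
quad_pen  <- -0.5 * lambda * xi^2

# The two differ only by a constant that does not depend on xi
const <- log_prior - quad_pen
stopifnot(all(abs(const - const[1]) < 1e-12))

# The default grid of penalties expressed as prior standard deviations
grid_sd <- 1 / sqrt(seq(1, 25, length.out = 25))
```

Under this reading, the default grid covers prior standard deviations from 1 (penalty 1) down to 0.2 (penalty 25), with the N(0, 1/16) starting point inside it.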

Note that the procedure appears to frequently prefer larger penalties over smaller ones, effectively driving the shape parameter to zero. However, if you have chosen to fit a distribution that can model long tails, you probably have a good reason for doing so, and you should use your prior knowledge to determine whether non-zero values of the shape are plausible rather than rely solely on an automated procedure.

Also note that small numbers of observations can have a big impact on parameter estimates. Because cross-validation involves randomly assigning values to folds, the results are generally different from one run to the next. These two features combined can produce quite big differences between cross-validation runs, so it is advisable either to use leave-one-out cross-validation (by setting folds to be the same as the length of the data) or to run the procedure several times and average over the results.
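
The run-to-run variation comes from the random fold assignment, which the base R sketch below illustrates. It is not the package's internal code: assign_folds is a hypothetical helper, and n and folds are made-up values.

```r
n     <- 20   # hypothetical number of observations
folds <- 5

# Hypothetical helper: shuffle balanced fold labels across n observations
assign_folds <- function(n, folds) {
  sample(rep(seq_len(folds), length.out = n))
}

run1 <- assign_folds(n, folds)
run2 <- assign_folds(n, folds)   # generally a different assignment from run1
table(run1)                      # each fold holds n / folds observations

# Leave-one-out: with folds equal to n, every observation is its own fold,
# so the set of held-out groups no longer depends on the shuffle
loo <- assign_folds(n, folds = n)
```

With leave-one-out the shuffle only relabels the folds, which is why that setting removes the variation between runs.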

Note

At present, only models without covariates are implemented.