gpd: Generalized Pareto Distribution Regression Family Function

Description

Maximum likelihood estimation of the 2-parameter generalized Pareto distribution (GPD).

Usage

gpd(threshold = 0, lscale = "loglink", lshape = logofflink(offset = 0.5),
    percentiles = c(90, 95), iscale = NULL, ishape = NULL,
    tolshape0 = 0.001, type.fitted = c("percentiles", "mean"),
    imethod = 1, zero = "shape")

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm

and vgam. However, for this VGAM family function, vglm

is probably preferred over vgam when there is smoothing.

Arguments

threshold: Numeric, values are recycled if necessary. The threshold value(s), called $\mu$ below.
lscale: Parameter link function for the scale parameter $\sigma$. See Links for more choices.
lshape: Parameter link function for the shape parameter $\xi$. See Links for more choices. The default constrains the parameter to be greater than $-0.5$ because if $\xi \leq -0.5$ then Fisher scoring does not work. See the Details section below for more information.

percentiles: Numeric vector of percentiles used for the fitted values. Values should be between 0 and 100. See the example below for illustration. This argument is ignored if type.fitted = "mean".
type.fitted: See CommonVGAMffArguments for information. The default is to use the percentiles argument. If "mean" is chosen, then the mean $\mu + \sigma / (1-\xi)$ is returned as the fitted values, and these are only defined for $\xi<1$.
iscale, ishape: Numeric. Optional initial values for $\sigma$ and $\xi$. The default is to use imethod and compute a value internally for each parameter. Values of ishape should be between $-0.5$ and $1$. Values of iscale should be positive.

tolshape0: Passed into dgpd when computing the log-likelihood.

imethod: Method of initialization, either 1 or 2. The first is the method of moments, and the second is a variant of this. If neither work, try assigning values to arguments ishape and/or iscale.
zero: Can be an integer-valued vector specifying which linear/additive predictors are modelled as intercepts only. For one response, the value should be from the set {1,2} corresponding respectively to $\sigma$ and $\xi$. It is often a good idea for the $\sigma$ parameter only to be modelled through a linear combination of the explanatory variables because the shape parameter is probably best left as an intercept only: zero = 2. Setting zero = NULL means both parameters are modelled with explanatory variables. See CommonVGAMffArguments for more details.

Warning

Fitting the GPD by maximum likelihood estimation can be numerically fraught. If $1 + \xi (y-\mu)/ \sigma \leq 0$ then some crude evasive action is taken but the estimation process can still fail. This is particularly the case if vgam with s is used. Then smoothing is best done with vglm with regression splines (bs or ns) because vglm implements half-stepsizing whereas vgam doesn't. Half-stepsizing helps handle the problem of straying outside the parameter space.

Author

T. W. Yee

Details

The distribution function of the GPD can be written $$G(y) = 1 - [1 + \xi (y-\mu) / \sigma ]_{+}^{- 1/ \xi} $$ where $\mu$ is the location parameter (known, with value threshold), $\sigma > 0$ is the scale parameter, $\xi$ is the shape parameter, and $h_+ = \max(h,0)$. The function $1-G$ is known as the survivor function. The limit $\xi \rightarrow 0$ gives the shifted exponential as a special case: $$G(y) = 1 - \exp[-(y-\mu)/ \sigma]. $$ The support is $y>\mu$ for $\xi>0$, and $\mu < y <\mu-\sigma / \xi$ for $\xi<0$.

Smith (1985) showed that if $\xi <= -0.5$ then this is known as the nonregular case and problems/difficulties can arise both theoretically and numerically. For the (regular) case $\xi > -0.5$ the classical asymptotic theory of maximum likelihood estimators is applicable; this is the default.

Although for $\xi < -0.5$ the usual asymptotic properties do not apply, the maximum likelihood estimator generally exists and is superefficient for $-1 < \xi < -0.5$, so it is ``better'' than normal. When $\xi < -1$ the maximum likelihood estimator generally does not exist as it effectively becomes a two parameter problem.

The mean of $Y$ does not exist unless $\xi < 1$, and the variance does not exist unless $\xi < 0.5$. So if you want to fit a model with finite variance use lshape = "extlogitlink".

References

Yee, T. W. and Stephenson, A. G. (2007). Vector generalized linear and additive extreme value models. Extremes, 10, 1--19.

Coles, S. (2001). An Introduction to Statistical Modeling of Extreme Values. London: Springer-Verlag.

Smith, R. L. (1985). Maximum likelihood estimation in a class of nonregular cases. Biometrika, 72, 67--90.

Examples

Run this code

# Simulated data from an exponential distribution (xi = 0)
Threshold <- 0.5
gdata <- data.frame(y1 = Threshold + rexp(n = 3000, rate = 2))
fit <- vglm(y1 ~ 1, gpd(threshold = Threshold), data = gdata, trace = TRUE)
head(fitted(fit))
summary(depvar(fit))  # The original uncentred data
coef(fit, matrix = TRUE)  # xi should be close to 0
Coef(fit)
summary(fit)

head(fit@extra$threshold)  # Note the threshold is stored here

# Check the 90 percentile
ii <- depvar(fit) < fitted(fit)[1, "90%"]
100 * table(ii) / sum(table(ii))  # Should be 90%

# Check the 95 percentile
ii <- depvar(fit) < fitted(fit)[1, "95%"]
100 * table(ii) / sum(table(ii))  # Should be 95%

if (FALSE)  plot(depvar(fit), col = "blue", las = 1,
               main = "Fitted 90% and 95% quantiles")
matlines(1:length(depvar(fit)), fitted(fit), lty = 2:3, lwd = 2) 


# Another example
gdata <- data.frame(x2 = runif(nn <- 2000))
Threshold <- 0; xi <- exp(-0.8) - 0.5
gdata <- transform(gdata, y2 = rgpd(nn, scale = exp(1 + 0.1*x2), shape = xi))
fit <- vglm(y2 ~ x2, gpd(Threshold), data = gdata, trace = TRUE)
coef(fit, matrix = TRUE)


if (FALSE)  # Nonparametric fits
# Not so recommended:
fit1 <- vgam(y2 ~ s(x2), gpd(Threshold), data = gdata, trace = TRUE)
par(mfrow = c(2, 1))
plot(fit1, se = TRUE, scol = "blue")
# More recommended:
fit2 <- vglm(y2 ~ sm.bs(x2), gpd(Threshold), data = gdata, trace = TRUE)
plot(as(fit2, "vgam"), se = TRUE, scol = "blue")

Run the code above in your browser using DataLab