gpd: Generalized Pareto distribution modelling

Description

Likelihood based modelling and inference for the generalized Pareto distribution, possibly with explanatory variables.

Usage

gpd(y, data, th, qu, phi = ~1, xi = ~1, penalty = "gaussian",
    prior = "gaussian", method = "optimize", start = NULL, priorParameters = NULL,
    maxit = 10000, trace = NULL,
    iter = 10500, burn = 500, thin = 1, jump.const, verbose = TRUE)
## S3 method for class 'gpd':
print(x, digits=max(3, getOption("digits") - 3), ...)
## S3 method for class 'gpd':
show(x, digits=max(3, getOption("digits") - 3), ...)
## S3 method for class 'gpd':
summary(object, nsim=1000, alpha=0.05, ...)
## S3 method for class 'gpd':
coef(object, ...)
## S3 method for class 'gpd':
plot(x, main=rep(NULL, 4), xlab=rep(NULL, 4), nsim=1000, alpha=0.05, ...)
## S3 method for class 'gpd':
AIC(object, ..., k=2)
## S3 method for class 'gpd':
coefficients(object, ...)
## S3 method for class 'bgpd':
print(x, print.seed=FALSE, ...)
## S3 method for class 'bgpd':
summary(object, ...)
## S3 method for class 'bgpd':
plot(x, which.plots=1:3, density.adjust=2, print.seed=FALSE, ...)
## S3 method for class 'bgpd':
coef(object, ...)
## S3 method for class 'bgpd':
coefficients(object, ...)
## S3 method for class 'summary.gpd':
print(x, dig=3, ...)
## S3 method for class 'summary.bgpd':
print(x, ...)
## S3 method for class 'summary.gpd':
show(x, dig=3, ...)

Arguments

Either a numeric vector or the name of a variable in data.

data

A data object containing y and any covariates.

The threshold for y

An alternative to th, specifying the quantile of the y

phi

Formula for the log of the scale parameter. Defaults to phi = ~ 1 - i.e. no covariates.

Formula for the shape parameter. Defaults to phi = ~ 1 - i.e. no covariates.

penalty

How to penalize the likelhood. Currently, either ``none'', ``gaussian'' or ``lasso'' are the only allowed arguments. See the information on the prior argument, below.

prior

If method = "optimize", just an alternative way of specifying the pentalty, and only one or neither of penalty and prior should be given. If method = "simulate", prior must be ``gaussian''

method

Should be either ``optimize'' (the default) or ``simulate''. The first letter or various abbreviations will do. If ``optimize'' is used, the (penalized) likelihood is directly optimized using optim and point estimates (either M

start

Starting values for the parameters, to be passed to optim. If not provided, an exponential distribution is assumed as the starting point.

priorParameters

A list with two components. The first should be a vector of means, the second should be a covariance matrix if the penalty/prior is "gaussian" and a precision matrix is the penalty is "lasso". These repres

maxit

The number of iterations allowed in optim

trace

Whether or not to print progress to screen. If method = "optimize", the argument is passed into optim -- see the help for that function. If method = "simulate", the argument determines at how man

iter

Number of simulations to generate under method = "simulate".

burn

The number of initial steps to be discarded.

thin

The degree of thinning of the resulting Markov chains. Defaults to 1 (no thinning). A value of 0.5 (for example) would result in every other step being discarded.

jump.const

Control parameter for the Metropolis algorithm.

verbose

Whether or not to print progress to screen. Defaults to verbose=TRUE.

x, object

Object of class gpd, bgpd, summary.gpd or summary.bgpd returned by gpd or summary.gpd.

digits

Number of digits for printing.

main

Titles for plots. Should be a vector of length 4.

xlab

In plot.gpd, x-axis labels for plots. Should be a vector of length 4.

nsim

The number of simulations to use to produce the simulation envelope. Defaults to nsim = 1000

alpha

A (1 - alpha)% simulation envelope is produced. Defaults to alpha = 0.05

Constant used in calculation of AIC=-2*loglik + k*p, defaults to k=2.

print.seed

Whether or not to print the seed used in the simulations, or to annotate the plots with it. Defaults to print.seed=FALSE.

which.plots

Which plots to produce. Option 1 gives kernel density estimates, 2 gives traces of the Markov chains with superimposed cumulative means, 3 gives autocorrelation functions. Defaults to wh

density.adjust

Passed into density. Controls the amount of smoothing of the kernel density estimate. Defaults to density.adjust=2.

dig

Number of digits to print.

...

Further arguments to be passed to methods.

Value

If method = "optimize", an object of class gpd.
convergenceOutput from optim relating to whether or not the optimizer converged.
messageA message telling the user whether or not convergence was achieved.
covThe estimated covariance of the parameters in the model.
seThe estimated standard errors of the parameters in the model.
thresholdThe threshold of the data above which the GPD model was fit.
penaltyThe type of penalty function used, if any.
coefficientsThe parameter estimates as computed under maximum likelihood or maximum penalized likelihood.
rateThe proportion of observations above the threshold.
callThe call to gpd that produced the object.
yThe response data.
X.phiThe design matrix for the log of the scale parameter.
X.xiThe design matrix for the shape parameter.
loglikThe value of the optimized log-likelihood.
If method = "simulate", an object of class bgpd.
callThe call to gpd that produced the object.
thresholdThe threshold above which the model was fit.
mapThe point estimates found by maximum penalized likelihood and which were used as the starting point for the Markov chain.
burnThe number of steps of the Markov chain that are to be treated as the burn-in and not used in inferences.
thinThe degree of thinning used.
chainsThe entire Markov chain generated by the Metropolis algorithm.
paramThe remainder of the chain after deleting the burn-in and applying any thinning.
X.phiThe design matrix for the log of the scale parameter.
X.xiThe design matrix for the log of the scale parameter.
acceptanceThe proportion of proposals that were accepted by the Metropolis algorithm.
seedThe seed used by the random number generator.

Details

Working with the log of the scale parameter was found to improve the stability of computations, and it makes a quadratic penalty more appropriate. A quadratic penalty can be thought of as a Gaussian prior distribution, whence the terminology of the function. Some of the internal code is based on the gpd.fit function in the ismev package and is due to Stuart Coles.

When a summary or plot is performed, a simulation envelope is produced the data, based on quantiles of the fitted model. This represents a pointwise (1 - alpha)% simulated confidence interval. Since the ordered observations will be correlated, if any observation is outside the envelope, it is likely that a chain of observations will be outside the envelope. Therefore, if the number outside the envelope is a little more than alpha%, that does not immediately imply a serious shortcoming of the fitted model.

When method = "optimize", the plot function produces diagnostic plots for the fitted generalized Pareto model. A PP-plot, QQ-plot, histogram with superimposed generalized Pareto density estimate, and a return level plot with confidence interval are produced. The PP-plot and QQ-plot contain simulated pointwise confidence regions. The region is a (1 - alpha)% region based on nsim simulated samples.

If start is not provided, the maximum penalized likelhood point estimates are computed and used.

If method = "simulate", the simulation is done by a Metropolis algorithm.

When plotting the object, if the chains have converged on the posterior distributions, the trace plots should look like `fat hairy caterpillars' and their cumulative means should converge rapidly. Moreover, the autocorrelation functions should converge quickly to zero.

When printing or summarizing the object, posterior means and standard deviations are computed. Posterior means are also returned by the coef method. Depending on what you want to do and what the posterior distributions look like (try using plot.bgpd) you might want to work with quantiles of the poseterior distributions instead.

Examples

Run this code

x <- rnorm(1000)
  mod <- gpd(x, qu = 0.7)
  mod
  par(mfrow=c(2, 2))
  plot(mod)
# Following lines commented out to keep CRAN robots happy
#  mod <- gpd(x, qu=.7, method="sim")
#  mod
#  par(mfrow=c(3, 2))
#  plot(mod)

Run the code above in your browser using DataLab