kdengpd: Kernel Density Estimate and GPD Tail Extreme Value Mixture Model

Description

Density, cumulative distribution function, quantile function and random number generation for the extreme value mixture model with a kernel density estimate for the bulk distribution up to the threshold and a conditional GPD above the threshold. The parameters are the bandwidth lambda, threshold u, GPD scale sigmau and shape xi, and tail fraction phiu.

Usage

dkdengpd(x, kerncentres, lambda = NULL,
  u = as.vector(quantile(kerncentres, 0.9)), sigmau = sqrt(6 *
  var(kerncentres))/pi, xi = 0, phiu = TRUE, bw = NULL,
  kernel = "gaussian", log = FALSE)
pkdengpd(q, kerncentres, lambda = NULL,
  u = as.vector(quantile(kerncentres, 0.9)), sigmau = sqrt(6 *
  var(kerncentres))/pi, xi = 0, phiu = TRUE, bw = NULL,
  kernel = "gaussian", lower.tail = TRUE)
qkdengpd(p, kerncentres, lambda = NULL,
  u = as.vector(quantile(kerncentres, 0.9)), sigmau = sqrt(6 *
  var(kerncentres))/pi, xi = 0, phiu = TRUE, bw = NULL,
  kernel = "gaussian", lower.tail = TRUE)
rkdengpd(n = 1, kerncentres, lambda = NULL,
  u = as.vector(quantile(kerncentres, 0.9)), sigmau = sqrt(6 *
  var(kerncentres))/pi, xi = 0, phiu = TRUE, bw = NULL,
  kernel = "gaussian")

Arguments

x

quantiles

kerncentres

kernel centres (typically sample data vector or scalar)

lambda

bandwidth for kernel (as half-width of kernel) or NULL

u

threshold

sigmau

scale parameter (positive)

xi

shape parameter

phiu

probability of being above the threshold, in $[0, 1]$, or TRUE

bw

bandwidth for kernel (as standard deviations of kernel) or NULL

kernel

kernel name (default = "gaussian")

log

logical, if TRUE then log density

q

quantiles

lower.tail

logical, if FALSE then upper tail probabilities

p

cumulative probabilities

n

sample size (positive integer)

Value

dkdengpd gives the density, pkdengpd gives the cumulative distribution function, qkdengpd gives the quantile function and rkdengpd gives a random sample.

Acknowledgments

Based on code by Anna MacDonald produced for MATLAB.

Details

Extreme value mixture model combining a kernel density estimate (KDE) for the bulk below the threshold and a GPD for the upper tail.

The user can pre-specify phiu, which treats the tail fraction $\phi_u$ as a parameter. Alternatively, when phiu=TRUE, the tail fraction is estimated as the upper tail fraction of the KDE bulk model.

The alternative bandwidth definitions are discussed in kernels, with lambda as the default. The bw specification is the same as that used in the density function.

The possible kernels are also defined in kernels, with "gaussian" as the default choice.
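As a rough illustration of the phiu=TRUE case, the sketch below computes the tail fraction implied by a Gaussian KDE and compares it with the empirical proportion of kernel centres above the threshold. It uses base R only and is not the package's implementation: the helper `kde_cdf` is hypothetical, and the bandwidth is assumed to act as the standard deviation of each Gaussian kernel.

```r
# Sketch only: a hand-rolled Gaussian KDE CDF, not the package's own code.
set.seed(1)
kerncentres <- rnorm(500)                     # illustrative sample data
lambda <- bw.nrd0(kerncentres)                # normal reference rule bandwidth
u <- as.vector(quantile(kerncentres, 0.9))    # default threshold from Usage

# Hypothetical helper: KDE CDF as the average of Gaussian CDFs,
# one centred on each kernel centre (assumes lambda is the kernel sd)
kde_cdf <- function(x) {
  sapply(x, function(v) mean(pnorm((v - kerncentres) / lambda)))
}

phiu_bulk <- 1 - kde_cdf(u)           # tail fraction implied by the KDE
phiu_emp  <- mean(kerncentres > u)    # empirical proportion above u

print(c(bulk = phiu_bulk, empirical = phiu_emp))
```

The two values are generally close but not identical, because the KDE smooths probability mass across the threshold.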

The cumulative distribution function with tail fraction $\phi_u$ defined by the upper tail fraction of the kernel density estimate (phiu=TRUE) is, up to the threshold $x \le u$, given by: $$F(x) = H(x)$$ and above the threshold $x > u$: $$F(x) = H(u) + [1 - H(u)] G(x)$$ where $H(x)$ and $G(x)$ are the KDE and conditional GPD cumulative distribution functions respectively.

The cumulative distribution function for pre-specified $\phi_u$, up to the threshold $x \le u$, is given by: $$F(x) = (1 - \phi_u) H(x)/H(u)$$ and above the threshold $x > u$: $$F(x) = (1 - \phi_u) + \phi_u G(x)$$ Notice that these definitions are equivalent when $\phi_u = 1 - H(u)$.
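The two parameterisations can be checked numerically with a minimal base R sketch. This is an illustration under stated assumptions, not the package's code: `H`, `G`, `mix_cdf` and `mix_cdf_bulk` are hypothetical helpers, the kernel is Gaussian with the bandwidth taken as its standard deviation, and only the xi != 0 form of the GPD is handled. The upper branch is written as (1 - phiu) + phiu * G(x), the form for which F(u) = 1 - phiu when phiu is the probability of exceeding the threshold.

```r
# Sketch only: base R versions of the two CDF definitions.
set.seed(1)
kerncentres <- rnorm(500)
lambda <- bw.nrd0(kerncentres)
u <- as.vector(quantile(kerncentres, 0.9))
sigmau <- 0.56    # illustrative GPD scale
xi <- 0.1         # illustrative GPD shape (must be non-zero in this sketch)

# KDE CDF H(x) with Gaussian kernels (assumes lambda is the kernel sd)
H <- function(x) sapply(x, function(v) mean(pnorm((v - kerncentres) / lambda)))

# Conditional GPD CDF G(x), meaningful for x > u
G <- function(x) 1 - pmax(1 + xi * (x - u) / sigmau, 0)^(-1 / xi)

# Mixture CDF with a pre-specified tail fraction phiu
mix_cdf <- function(x, phiu) {
  ifelse(x <= u, (1 - phiu) * H(x) / H(u), (1 - phiu) + phiu * G(x))
}

# Mixture CDF with the tail fraction taken from the KDE bulk model
mix_cdf_bulk <- function(x) {
  ifelse(x <= u, H(x), H(u) + (1 - H(u)) * G(x))
}

xx <- seq(-4, 6, by = 0.5)
# The two definitions should agree when phiu = 1 - H(u)
print(all.equal(mix_cdf(xx, phiu = 1 - H(u)), mix_cdf_bulk(xx)))
```

With any other value of phiu the bulk is rescaled by (1 - phiu)/H(u), so the curve stays continuous at u but no longer matches the raw KDE below the threshold.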

If no bandwidth is provided (lambda=NULL and bw=NULL), then the normal reference rule is applied using the bw.nrd0 function, which is consistent with the density function. At least two kernel centres must be provided, as the variance needs to be estimated.

See gpd for details of the GPD upper tail component and dkden for details of the KDE bulk component.
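The default bandwidth behaviour can be illustrated with base R alone. The sketch below only shows what bw.nrd0 and density return for a sample; it does not call the mixture model functions, and the variable names are illustrative.

```r
# Sketch only: the normal reference rule as computed by base R.
set.seed(1)
kerncentres <- rnorm(500)

bw_default <- bw.nrd0(kerncentres)       # 0.9 * min(sd, IQR / 1.34) * n^(-1/5)
bw_density <- density(kerncentres)$bw    # density() uses bw = "nrd0" by default

print(all.equal(bw_default, bw_density))

# A single kernel centre gives no estimate of spread, so bw.nrd0 stops
single <- tryCatch(bw.nrd0(1.5), error = function(e) conditionMessage(e))
print(single)
```

When only one kernel centre is available, a bandwidth has to be supplied explicitly through lambda or bw.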

References

http://en.wikipedia.org/wiki/Kernel_density_estimation

http://en.wikipedia.org/wiki/Generalized_Pareto_distribution

Scarrott, C.J. and MacDonald, A. (2012). A review of extreme value threshold estimation and uncertainty quantification. REVSTAT - Statistical Journal 10(1), 33-59. Available from http://www.ine.pt/revstat/pdf/rs120102.pdf

Bowman, A.W. (1984). An alternative method of cross-validation for the smoothing of density estimates. Biometrika 71(2), 353-360.

Duin, R.P.W. (1976). On the choice of smoothing parameters for Parzen estimators of probability density functions. IEEE Transactions on Computers C25(11), 1175-1179.

MacDonald, A., Scarrott, C.J., Lee, D., Darlow, B., Reale, M. and Russell, G. (2011). A flexible extreme value mixture model. Computational Statistics and Data Analysis 55(6), 2137-2157.

Wand, M. and Jones, M.C. (1995). Kernel Smoothing. Chapman & Hall.

Examples

# Assumes the evmix package, which provides these functions, is installed
library(evmix)

set.seed(1)
par(mfrow = c(2, 2))

# Panel 1: histogram of the kernel centres with the fitted mixture density
kerncentres=rnorm(500, 0, 1)
xx = seq(-4, 4, 0.01)
hist(kerncentres, breaks = 100, freq = FALSE)
lines(xx, dkdengpd(xx, kerncentres, u = 1.2, sigmau = 0.56, xi = 0.1))

# Panel 2: CDF for three shape parameters (default u, sigmau and phiu)
plot(xx, pkdengpd(xx, kerncentres), type = "l")
lines(xx, pkdengpd(xx, kerncentres, xi = 0.3), col = "red")
lines(xx, pkdengpd(xx, kerncentres, xi = -0.3), col = "blue")
legend("topleft", paste("xi =",c(0, 0.3, -0.3)),
      col=c("black", "red", "blue"), lty = 1, cex = 0.5)

# Panel 3: simulated sample with pre-specified phiu, overlaid with its density
x = rkdengpd(1000, kerncentres, phiu = 0.1, u = 1.2, sigmau = 0.56, xi = 0.1)
xx = seq(-4, 6, 0.01)
hist(x, breaks = 100, freq = FALSE, xlim = c(-4, 6))
lines(xx, dkdengpd(xx, kerncentres, phiu = 0.1, u = 1.2, sigmau = 0.56, xi = 0.1))

# Panel 4: density for three shape parameters with pre-specified phiu
plot(xx, dkdengpd(xx, kerncentres, xi=0, phiu = 0.1), type = "l")
lines(xx, dkdengpd(xx, kerncentres, xi=0.2, phiu = 0.1), col = "red")
lines(xx, dkdengpd(xx, kerncentres, xi=-0.2, phiu = 0.1), col = "blue")
legend("topleft", c("xi = 0", "xi = 0.2", "xi = -0.2"),
      col=c("black", "red", "blue"), lty = 1)