gkg: Kernel Density Estimate and GPD Both Upper and Lower Tails Extreme Value Mixture Model

Description

Density, cumulative distribution function, quantile function and random number generation for the extreme value mixture model with kernel density estimate for bulk distribution between thresholds and conditional GPD beyond thresholds. The parameters are the kernel bandwidth lambda, lower tail (threshold ul, GPD scale sigmaul and shape xil and tail fraction phiul) and upper tail (threshold ur, GPD scale sigmaur and shape xiR and tail fraction phiur).

Usage

dgkg(x, kerncentres, lambda = NULL,
  ul = as.vector(quantile(kerncentres, 0.1)), sigmaul = sqrt(6 *
  var(kerncentres))/pi, xil = 0, phiul = TRUE,
  ur = as.vector(quantile(kerncentres, 0.9)), sigmaur = sqrt(6 *
  var(kerncentres))/pi, xir = 0, phiur = TRUE, bw = NULL,
  kernel = "gaussian", log = FALSE)
pgkg(q, kerncentres, lambda = NULL,
  ul = as.vector(quantile(kerncentres, 0.1)), sigmaul = sqrt(6 *
  var(kerncentres))/pi, xil = 0, phiul = TRUE,
  ur = as.vector(quantile(kerncentres, 0.9)), sigmaur = sqrt(6 *
  var(kerncentres))/pi, xir = 0, phiur = TRUE, bw = NULL,
  kernel = "gaussian", lower.tail = TRUE)
qgkg(p, kerncentres, lambda = NULL,
  ul = as.vector(quantile(kerncentres, 0.1)), sigmaul = sqrt(6 *
  var(kerncentres))/pi, xil = 0, phiul = TRUE,
  ur = as.vector(quantile(kerncentres, 0.9)), sigmaur = sqrt(6 *
  var(kerncentres))/pi, xir = 0, phiur = TRUE, bw = NULL,
  kernel = "gaussian", lower.tail = TRUE)
rgkg(n = 1, kerncentres, lambda = NULL,
  ul = as.vector(quantile(kerncentres, 0.1)), sigmaul = sqrt(6 *
  var(kerncentres))/pi, xil = 0, phiul = TRUE,
  ur = as.vector(quantile(kerncentres, 0.9)), sigmaur = sqrt(6 *
  var(kerncentres))/pi, xir = 0, phiur = TRUE, bw = NULL,
  kernel = "gaussian")

Arguments

quantiles

kerncentres

kernel centres (typically sample data vector or scalar)

lambda

bandwidth for kernel (as half-width of kernel) or NULL

lower tail threshold

sigmaul

lower tail GPD scale parameter (positive)

xil

lower tail GPD shape parameter

phiul

probability of being below lower threshold $[0, 1]$ or TRUE

upper tail threshold

sigmaur

upper tail GPD scale parameter (positive)

xir

upper tail GPD shape parameter

phiur

probability of being above upper threshold $[0, 1]$ or TRUE

bandwidth for kernel (as standard deviations of kernel) or NULL

kernel

kernel name (default = "gaussian")

log

logical, if TRUE then log density

quantiles

lower.tail

logical, if FALSE then upper tail probabilities

cumulative probabilities

sample size (positive integer)

Value

dgkg gives the density, pgkg gives the cumulative distribution function, qgkg gives the quantile function and rgkg gives a random sample.

Acknowledgments

Based on code by Anna MacDonald produced for MATLAB.

Details

Extreme value mixture model combining kernel density estimate (KDE) for the bulk between thresholds and GPD beyond thresholds.

The user can pre-specify phiul and phiur permitting a parameterised value for the tail fractions $\phi_ul$ and $\phi_ur$. Alternatively, when phiul=TRUE and phiur=TRUE the tail fractions are estimated as the tail fractions from the KDE bulk model.

The alternate bandwidth definitions are discussed in the kernels, with the lambda as the default. The bw specification is the same as used in the density function.

The possible kernels are also defined in kernels with the "gaussian" as the default choice.

Notice that the tail fraction cannot be 0 or 1, and the sum of upper and lower tail fractions phiul + phiur < 1, so the lower threshold must be less than the upper, ul < ur.

The cumulative distribution function has three components. The lower tail with tail fraction $\phi_{ul}$ defined by the KDE bulk model (phiul=TRUE) upto the lower threshold $x < u_l$: $$F(x) = H(u_l) [1 - G_l(x)].$$ where $H(x)$ is the kernel density estimator cumulative distribution function (i.e. mean(pnorm(x, kerncentres, bw)) and $G_l(X)$ is the conditional GPD cumulative distribution function with negated $x$ value and threshold, i.e. pgpd(-x, -ul, sigmaul, xil, phiul). The KDE bulk model between the thresholds $u_l \le x \le u_r$ given by: $$F(x) = H(x).$$ Above the threshold $x > u_r$ the usual conditional GPD: $$F(x) = H(u_r) + [1 - H(u_r)] G_r(x)$$ where $G_r(X)$ is the GPD cumulative distribution function, i.e. pgpd(x, ur, sigmaur, xir, phiur).

The cumulative distribution function for the pre-specified tail fractions $\phi_{ul}$ and $\phi_{ur}$ is more complicated. The unconditional GPD is used for the lower tail $x < u_l$: $$F(x) = \phi_{ul} [1 - G_l(x)].$$ The KDE bulk model between the thresholds $u_l \le x \le u_r$ given by: $$F(x) = \phi_{ul}+ (1-\phi_{ul}-\phi_{ur}) (H(x) - H(u_l)) / (H(u_r) - H(u_l)).$$ Above the threshold $x > u_r$ the usual conditional GPD: $$F(x) = (1-\phi_{ur}) + \phi_{ur} G(x)$$ Notice that these definitions are equivalent when $\phi_{ul} = H(u_l)$ and $\phi_{ur} = 1 - H(u_r)$.

If no bandwidth is provided lambda=NULL and bw=NULL then the normal reference rule is used, using the bw.nrd0 function, which is consistent with the density function. At least two kernel centres must be provided as the variance needs to be estimated.

See gpd for details of GPD upper tail component and dkden for details of KDE bulk component.

References

http://en.wikipedia.org/wiki/Kernel_density_estimation

http://en.wikipedia.org/wiki/Generalized_Pareto_distribution

Scarrott, C.J. and MacDonald, A. (2012). A review of extreme value threshold estimation and uncertainty quantification. REVSTAT - Statistical Journal 10(1), 33-59. Available from http://www.ine.pt/revstat/pdf/rs120102.pdf

Bowman, A.W. (1984). An alternative method of cross-validation for the smoothing of density estimates. Biometrika 71(2), 353-360.

Duin, R.P.W. (1976). On the choice of smoothing parameters for Parzen estimators of probability density functions. IEEE Transactions on Computers C25(11), 1175-1179.

MacDonald, A., Scarrott, C.J., Lee, D., Darlow, B., Reale, M. and Russell, G. (2011). A flexible extreme value mixture model. Computational Statistics and Data Analysis 55(6), 2137-2157.

Wand, M. and Jones, M.C. (1995). Kernel Smoothing. Chapman && Hall.

Examples

Run this code

# NOT RUN {
set.seed(1)
par(mfrow = c(2, 2))

kerncentres=rnorm(1000,0,1)
x = rgkg(1000, kerncentres, phiul = 0.15, phiur = 0.15)
xx = seq(-6, 6, 0.01)
hist(x, breaks = 100, freq = FALSE, xlim = c(-6, 6))
lines(xx, dgkg(xx, kerncentres, phiul = 0.15, phiur = 0.15))

# three tail behaviours
plot(xx, pgkg(xx, kerncentres), type = "l")
lines(xx, pgkg(xx, kerncentres,xil = 0.3, xir = 0.3), col = "red")
lines(xx, pgkg(xx, kerncentres,xil = -0.3, xir = -0.3), col = "blue")
legend("topleft", paste("Symmetric xil=xir=",c(0, 0.3, -0.3)),
  col=c("black", "red", "blue"), lty = 1)

# asymmetric tail behaviours
x = rgkg(1000, kerncentres, xil = -0.3, phiul = 0.1, xir = 0.3, phiur = 0.1)
xx = seq(-6, 6, 0.01)
hist(x, breaks = 100, freq = FALSE, xlim = c(-6, 6))
lines(xx, dgkg(xx, kerncentres, xil = -0.3, phiul = 0.1, xir = 0.3, phiur = 0.1))

plot(xx, dgkg(xx, kerncentres, xil = -0.3, phiul = 0.2, xir = 0.3, phiur = 0.2),
  type = "l", ylim = c(0, 0.4))
lines(xx, dgkg(xx, kerncentres, xil = -0.3, phiul = 0.3, xir = 0.3, phiur = 0.3),
  col = "red")
lines(xx, dgkg(xx, kerncentres, xil = -0.3, phiul = TRUE, xir = 0.3, phiur = TRUE),
  col = "blue")
legend("topleft", c("phiul = phiur = 0.2", "phiul = phiur = 0.3", "Bulk Tail Fraction"),
  col=c("black", "red", "blue"), lty = 1)
# }
# NOT RUN {
# }

Run the code above in your browser using DataLab