itmgng: Normal Bulk with GPD Upper and Lower Tails Interval Transition Mixture Model

Description

Density, cumulative distribution function, quantile function and random number generation for the extreme value mixture model with normal for bulk distribution between the upper and lower thresholds with conditional GPD's for the two tails and interval transition. The parameters are the normal mean nmean and standard deviation nsd, interval half-width espilon, lower tail (threshold ul, GPD scale sigmaul and shape xil and tail fraction phiul) and upper tail (threshold ur, GPD scale sigmaur and shape xiR and tail fraction phiuR).

Usage

ditmgng(x, nmean = 0, nsd = 1, epsilon = nsd, ul = qnorm(0.1,
  nmean, nsd), sigmaul = nsd, xil = 0, ur = qnorm(0.9, nmean, nsd),
  sigmaur = nsd, xir = 0, log = FALSE)
pitmgng(q, nmean = 0, nsd = 1, epsilon = nsd, ul = qnorm(0.1,
  nmean, nsd), sigmaul = nsd, xil = 0, ur = qnorm(0.9, nmean, nsd),
  sigmaur = nsd, xir = 0, lower.tail = TRUE)
qitmgng(p, nmean = 0, nsd = 1, epsilon, ul = qnorm(0.1, nmean, nsd),
  sigmaul = nsd, xil = 0, ur = qnorm(0.9, nmean, nsd),
  sigmaur = nsd, xir = 0, lower.tail = TRUE)
ritmgng(n = 1, nmean = 0, nsd = 1, epsilon = sd, ul = qnorm(0.1,
  nmean, nsd), sigmaul = nsd, xil = 0, ur = qnorm(0.9, nmean, nsd),
  sigmaur = nsd, xir = 0)

Arguments

quantiles

nmean

normal mean

nsd

normal standard deviation (positive)

epsilon

interval half-width

lower tail threshold

sigmaul

lower tail GPD scale parameter (positive)

xil

lower tail GPD shape parameter

upper tail threshold

sigmaur

upper tail GPD scale parameter (positive)

xir

upper tail GPD shape parameter

log

logical, if TRUE then log density

quantiles

lower.tail

logical, if FALSE then upper tail probabilities

cumulative probabilities

sample size (positive integer)

Value

ditmgng gives the density, pitmgng gives the cumulative distribution function, qitmgng gives the quantile function and ritmgng gives a random sample.

Details

The interval transition extreme value mixture model combines a normal distribution for the bulk between the lower and upper thresholds and GPD for upper and lower tails, with a smooth transition over the interval $(u-epsilon, u+epsilon)$ (where $u$ can be exchanged for the lower and upper thresholds). The mixing function warps the normal to map from $(u-epsilon, u)$ to $(u-epsilon, u+epsilon)$ and warps the GPD from $(u, u+epsilon)$ to $(u-epsilon, u+epsilon)$.

The cumulative distribution function is defined by $$F(x)=\kappa(G_l(q(x)) + H_t(r(x)) + G_u(p(x)))$$ where $H_t(x)$ is the truncated normal cdf, i.e. pnorm(x, nmean, nsd). The conditional GPD for the upper tail has cdf $G_u(x)$, i.e. pgpd(x, ur, sigmaur, xir) and lower tail cdf $G_l(x)$ is for the negated support, i.e. 1 - pgpd(-x, -ul, sigmaul, xil). The truncated normal is not renormalised to be proper, so $H_t(x)$ contributes pnorm(ur, nmean, nsd) - pnorm(ul, nmean, nsd) to the cdf for all $x\geq (u_r + \epsilon)$ and zero below $x\leq (u_l - \epsilon)$. The normalisation constant $\kappa$ ensures a proper density, given by 1/(2 + pnorm(ur, nmean, nsd) - pnorm(ul, nmean, nsd) where the 2 is from two GPD components and latter is contribution from normal component.

The mixing functions $q(x)$, $r(x)$ and $p(x)$ are reformulated from the $q_i(x)$ suggested by Holden and Haug (2013). These are symmetric about each threshold, which for convenience will be referred to a simply $u$. So for computational convenience only a single $q(x;u)$ has been implemented for the lower and upper GPD components called qmix for a given $u$, with the complementary mixing function then defined as $p(x;u)=-q(-x;-u)$. The bulk model mixing function $r(x)$ utilises the equivalent of the $q(x)$ for the lower threshold and $p(x)$ for the upper threshold, so these are reused in the bulk mixing function qgbgmix.

A minor adaptation of the mixing function has been applied following a similar approach to that explained in ditmnormgpd. For the bulk model mixing function $r(x)$, we need $r(x)<=ul$ for all $x\le ul - epsilon$ and $r(x)>=ur$ for all $x\ge ur+epsilon$, as then the bulk model will contribute zero below the lower interval and the constant $H_t(ur)=H(ur)-H(ul)$ for all $x$ above the upper interval. Holden and Haug (2013) define $r(x)=x-\epsilon$ for all $x\ge ur$ and $r(x)=x+\epsilon$ for all $x\le ul$. For more straightforward and interpretable computational implementation the mixing function has been set to the lower threshold $r(x)=u_l$ for all $x\le u_l-\epsilon$ and to the upper threshold $r(x)=u_r$ for all $x\le u_r+\epsilon$, so the cdf/pdf of the normal model can be used directly. We do not have to define cdf/pdf for the non-proper truncated normal seperately. As such $r'(x)=0$ for all $x\le u_l-\epsilon$ and $x\ge u_r+\epsilon$ in qmixxprime, which also makes it clearer that normal does not contribute to either tails beyond the intervals and vice-versa.

The quantile function within the transition interval is not available in closed form, so has to be solved numerically. Outside of the interval, the quantile are obtained from the normal and GPD components directly.

References

http://en.wikipedia.org/wiki/Normal_distribution

http://en.wikipedia.org/wiki/Generalized_Pareto_distribution

Scarrott, C.J. and MacDonald, A. (2012). A review of extreme value threshold estimation and uncertainty quantification. REVSTAT - Statistical Journal 10(1), 33-59. Available from http://www.ine.pt/revstat/pdf/rs120102.pdf

Holden, L. and Haug, O. (2013). A mixture model for unsupervised tail estimation. arxiv:0902.4137

Examples

Run this code

# NOT RUN {
set.seed(1)
par(mfrow = c(2, 2))

xx = seq(-5, 5, 0.01)
ul = -1.5;ur = 2
epsilon = 0.8
kappa = 1/(2 + pnorm(ur, 0, 1) - pnorm(ul, 0, 1))

f = ditmgng(xx, nmean = 0, nsd = 1, epsilon, ul, sigmaul = 1, xil = 0.5, ur, sigmaur = 1, xir = 0.5)
plot(xx, f, ylim = c(0, 0.5), xlim = c(-5, 5), type = 'l', lwd = 2, xlab = "x", ylab = "density")
lines(xx, kappa * dgpd(-xx, -ul, sigmau = 1, xi = 0.5), col = "blue", lty = 2, lwd = 2)
lines(xx, kappa * dnorm(xx, 0, 1), col = "red", lty = 2, lwd = 2)
lines(xx, kappa * dgpd(xx, ur, sigmau = 1, xi = 0.5), col = "green", lty = 2, lwd = 2)
abline(v = ul + epsilon * seq(-1, 1), lty = c(2, 1, 2), col = "blue")
abline(v = ur + epsilon * seq(-1, 1), lty = c(2, 1, 2), col = "green")
legend('topright', c('Normal-GPD ITM', 'kappa*GPD Lower', 'kappa*Normal', 'kappa*GPD Upper'),
      col = c("black", "blue", "red", "green"), lty = c(1, 2, 2, 2), lwd = 2)

# cdf contributions
F = pitmgng(xx, nmean = 0, nsd = 1, epsilon, ul, sigmaul = 1, xil = 0.5, ur, sigmaur = 1, xir = 0.5)
plot(xx, F, ylim = c(0, 1), xlim = c(-5, 5), type = 'l', lwd = 2, xlab = "x", ylab = "cdf")
lines(xx[xx < ul], kappa * (1 - pgpd(-xx[xx < ul], -ul, 1, 0.5)), col = "blue", lty = 2, lwd = 2)
lines(xx[(xx >= ul) & (xx <= ur)], kappa * (1 + pnorm(xx[(xx >= ul) & (xx <= ur)], 0, 1) -
      pnorm(ul, 0, 1)), col = "red", lty = 2, lwd = 2)
lines(xx[xx > ur], kappa * (1 + (pnorm(ur, 0, 1) - pnorm(ul, 0, 1)) +
      pgpd(xx[xx > ur], ur, sigmau = 1, xi = 0.5)), col = "green", lty = 2, lwd = 2)
abline(v = ul + epsilon * seq(-1, 1), lty = c(2, 1, 2), col = "blue")
abline(v = ur + epsilon * seq(-1, 1), lty = c(2, 1, 2), col = "green")
legend('topleft', c('Normal-GPD ITM', 'kappa*GPD Lower', 'kappa*Normal', 'kappa*GPD Upper'),
      col = c("black", "blue", "red", "green"), lty = c(1, 2, 2, 2), lwd = 2)

# simulated data density histogram and overlay true density 
x = ritmgng(10000, nmean = 0, nsd = 1, epsilon, ul, sigmaul = 1, xil = 0.5,
                                                ur, sigmaur = 1, xir = 0.5)
hist(x, freq = FALSE, breaks = seq(-1000, 1000, 0.1), xlim = c(-5, 5))
lines(xx, ditmgng(xx, nmean = 0, nsd = 1, epsilon, ul, sigmaul = 1, xil = 0.5,
  ur, sigmaur = 1, xir = 0.5), lwd = 2, col = 'black')
# }
# NOT RUN {
# }

Run the code above in your browser using DataLab