itmnormgpd: Normal Bulk and GPD Tail Interval Transition Mixture Model

Description

Density, cumulative distribution function, quantile function and random number generation for the normal bulk and GPD tail interval transition mixture model. The parameters are the normal mean nmean and standard deviation nsd, threshold u, interval half-width epsilon, GPD scale sigmau and shape xi.

Usage

ditmnormgpd(x, nmean = 0, nsd = 1, epsilon = nsd, u = qnorm(0.9,
  nmean, nsd), sigmau = nsd, xi = 0, log = FALSE)
pitmnormgpd(q, nmean = 0, nsd = 1, epsilon = nsd, u = qnorm(0.9,
  nmean, nsd), sigmau = nsd, xi = 0, lower.tail = TRUE)
qitmnormgpd(p, nmean = 0, nsd = 1, epsilon = nsd, u = qnorm(0.9,
  nmean, nsd), sigmau = nsd, xi = 0, lower.tail = TRUE)
ritmnormgpd(n = 1, nmean = 0, nsd = 1, epsilon = nsd,
  u = qnorm(0.9, nmean, nsd), sigmau = nsd, xi = 0)

Arguments

quantiles

nmean

normal mean

nsd

normal standard deviation (positive)

epsilon

interval half-width

threshold

sigmau

scale parameter (positive)

shape parameter

log

logical, if TRUE then log density

quantiles

lower.tail

logical, if FALSE then upper tail probabilities

cumulative probabilities

sample size (positive integer)

Value

ditmnormgpd gives the density, pitmnormgpd gives the cumulative distribution function, qitmnormgpd gives the quantile function and ritmnormgpd gives a random sample.

Details

The interval transition mixture model combines a normal for the bulk model with GPD for the tail model, with a smooth transition over the interval $(u-epsilon, u+epsilon)$. The mixing function warps the normal to map from $(u-epsilon, u)$ to $(u-epsilon, u+epsilon)$ and warps the GPD from $(u, u+epsilon)$ to $(u-epsilon, u+epsilon)$.

The cumulative distribution function is defined by $$F(x)=\kappa(H_t(q(x)) + G(p(x)))$$ where $H_t(x)$ and $G(x)$ are the truncated normal and conditional GPD cumulative distribution functions (i.e. pnorm(x, nmean, nsd) and pgpd(x, u, sigmau, xi)) respectively. The truncated normal is not renormalised to be proper, so $H_t(x)$ contrubutes pnorm(u, nmean, nsd) to the cdf for all $x\geq (u + \epsilon)$. The normalisation constant $\kappa$ ensures a proper density, given by 1/(1+pnorm(u, nmean, nsd)) where 1 is from GPD component and latter is contribution from normal component.

The mixing functions $q(x)$ and $p(x)$ suggested by Holden and Haug (2013) have been implemented. These are symmetric about the threshold $u$. So for computational convenience only $q(x;u)$ has been implemented as qmix for a given $u$, with the complementary mixing function is then defined as $p(x;u)=-q(-x;-u)$.

A minor adaptation of the mixing function has been applied. For the mixture model to function correctly $q(x)>=u$ for all $x\ge u+\epsilon$, as then the bulk model will contribute the constant $H_t(u)=H(u)$ for all $x$ above the interval. Holden and Haug (2013) define $q(x)=x-\epsilon$ for all $x\ge u$. For more straightforward and interpretable computational implementation the mixing function has been set to the threshold $q(x)=u$ for all $x\ge u$, so the cdf/pdf of the normal model can be used directly. We do not have to define cdf/pdf for the non-proper truncated normal seperately. As such $q'(x)=0$ for all $x\ge u$ in qmixxprime, which also makes it clearer that normal does not contribute to the tail above the interval and vice-versa.

The quantile function within the transition interval is not available in closed form, so has to be solved numerically. Outside of the interval, the quantile are obtained from the normal and GPD components directly.

References

http://en.wikipedia.org/wiki/Normal_distribution

http://en.wikipedia.org/wiki/Generalized_Pareto_distribution

Scarrott, C.J. and MacDonald, A. (2012). A review of extreme value threshold estimation and uncertainty quantification. REVSTAT - Statistical Journal 10(1), 33-59. Available from http://www.ine.pt/revstat/pdf/rs120102.pdf

Holden, L. and Haug, O. (2013). A mixture model for unsupervised tail estimation. arxiv:0902.4137

Examples

Run this code

# NOT RUN {
set.seed(1)
par(mfrow = c(2, 2))

xx = seq(-4, 5, 0.01)
u = 1.5
epsilon = 0.4
kappa = 1/(1 + pnorm(u, 0, 1))

f = ditmnormgpd(xx, nmean = 0, nsd = 1, epsilon, u, sigmau = 1, xi = 0.5)
plot(xx, f, ylim = c(0, 1), xlim = c(-4, 5), type = 'l', lwd = 2, xlab = "x", ylab = "density")
lines(xx, kappa * dgpd(xx, u, sigmau = 1, xi = 0.5), col = "red", lty = 2, lwd = 2)
lines(xx, kappa * dnorm(xx, 0, 1), col = "blue", lty = 2, lwd = 2)
abline(v = u + epsilon * seq(-1, 1), lty = c(2, 1, 2))
legend('topright', c('Normal-GPD ITM', 'kappa*Normal', 'kappa*GPD'),
      col = c("black", "blue", "red"), lty = c(1, 2, 2), lwd = 2)

# cdf contributions
F = pitmnormgpd(xx, nmean = 0, nsd = 1, epsilon, u, sigmau = 1, xi = 0.5)
plot(xx, F, ylim = c(0, 1), xlim = c(-4, 5), type = 'l', lwd = 2, xlab = "x", ylab = "cdf")
lines(xx[xx > u], kappa * (pnorm(u, 0, 1) + pgpd(xx[xx > u], u, sigmau = 1, xi = 0.5)),
     col = "red", lty = 2, lwd = 2)
lines(xx[xx <= u], kappa * pnorm(xx[xx <= u], 0, 1), col = "blue", lty = 2, lwd = 2)
abline(v = u + epsilon * seq(-1, 1), lty = c(2, 1, 2))
legend('topleft', c('Normal-GPD ITM', 'kappa*Normal', 'kappa*GPD'),
      col = c("black", "blue", "red"), lty = c(1, 2, 2), lwd = 2)

# simulated data density histogram and overlay true density 
x = ritmnormgpd(10000, nmean = 0, nsd = 1, epsilon, u, sigmau = 1, xi = 0.5)
hist(x, freq = FALSE, breaks = seq(-4, 1000, 0.1), xlim = c(-4, 5))
lines(xx, ditmnormgpd(xx, nmean = 0, nsd = 1, epsilon, u, sigmau = 1, xi = 0.5),
  lwd = 2, col = 'black')  
# }
# NOT RUN {
# }

Run the code above in your browser using DataLab