A prediction interval for some population is an interval on the real line constructed so
that it will contain \(k\) future observations or averages from that population with
some specified probability \((1-\alpha)100\%\), where \(0 < \alpha < 1\) and \(k\)
is some pre-specified positive integer. The quantity \((1-\alpha)100\%\) is called
the confidence coefficient or confidence level associated with the prediction interval.
In the case of a Poisson distribution, we have modified the
usual meaning of a prediction interval and instead construct an interval that will
contain \(k\) future observations or \(k\) future sums with a certain
confidence level.
A prediction interval is a random interval; that is, the lower and/or
upper bounds are random variables computed based on sample statistics in the
baseline sample. Prior to taking one specific baseline sample, the probability
that the prediction interval will contain the next \(k\) averages is
\((1-\alpha)100\%\). Once a specific baseline sample is taken and the
prediction interval based on that sample is computed, the probability that this
prediction interval will contain the next \(k\) averages is not necessarily
\((1-\alpha)100\%\), but it should be close.
If an experiment is repeated \(N\) times, and for each experiment:

1. A sample is taken and a \((1-\alpha)100\%\) prediction interval for \(k=1\) future observation is computed, and

2. One future observation is generated and compared to the prediction interval,

then the number of prediction intervals that actually contain the future observation
generated in step 2 above is a binomial random variable with parameters
size=\(N\) and prob=\((1-\alpha)\) (see Binomial).
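This behavior can be checked with a small simulation sketch. The code below uses predIntPois itself with assumed values of \(N\), \(\lambda\), and the baseline sample size; extracting the limits via the $interval$limits component follows the usual convention for EnvStats interval objects, so treat that detail as an assumption rather than documented behavior.

# Minimal simulation sketch: the count of intervals that cover their future
# observation should behave like a Binomial(N, 0.95) random variable.
# Values of N, lambda, and n are illustrative assumptions.
library(EnvStats)
set.seed(47)
N <- 1000; lambda <- 5; n <- 20
covered <- replicate(N, {
  x <- rpois(n, lambda)                               # baseline sample
  pi.obj <- predIntPois(x, k = 1, conf.level = 0.95)  # 95% PI for one future obs
  lims <- pi.obj$interval$limits                      # assumed component name
  y <- rpois(1, lambda)                               # one future observation
  y >= lims["LPL"] && y <= lims["UPL"]
})
sum(covered)   # roughly Binomial(size = N, prob = 0.95)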
If, on the other hand, only one baseline sample is taken and only one prediction
interval for \(k=1\) future observation is computed, then the number of
future observations out of a total of \(N\) future observations that will be
contained in that one prediction interval is a binomial random variable with
parameters size=\(N\) and prob=\((1-\alpha^*)\), where
\(\alpha^*\) depends on the true population parameters and the computed
bounds of the prediction interval.
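As a concrete illustration, the realized coverage \(1-\alpha^*\) of one fixed interval can be computed directly from the Poisson distribution function; the numeric values below are illustrative assumptions only.

# Realized coverage (1 - alpha*) of one fixed prediction interval [LPL, UPL]
# for a single future sum of m observations, when the true mean is known.
# The values of lambda, m, LPL, and UPL are illustrative assumptions.
lambda <- 5; m <- 2
LPL <- 3; UPL <- 17                    # bounds computed from one baseline sample
coverage <- ppois(UPL, m * lambda) - ppois(LPL - 1, m * lambda)
coverage                               # this is 1 - alpha* for this interval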
Because of the discrete nature of the Poisson distribution,
even if the true mean of the distribution \(\lambda\) were known exactly, the
actual confidence level associated with a prediction limit will usually not be exactly equal to
\((1-\alpha)100\%\). For example, for the Poisson distribution with parameter
lambda=2, the interval [0, 4] contains 94.7% of this distribution and
the interval [0, 5] contains 98.3% of this distribution. Thus, no interval can
contain exactly 95% of this distribution, so it is impossible to construct an
exact 95% prediction interval for the next \(k=1\) observation for a
Poisson distribution with parameter lambda=2.
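These probabilities are simply values of the Poisson distribution function and can be verified directly:

# Check of the discreteness example above for a Poisson distribution with mean 2
ppois(4, lambda = 2)   # P(Y <= 4) = 0.947...
ppois(5, lambda = 2)   # P(Y <= 5) = 0.983...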
The Form of a Poisson Prediction Interval
Let \(\underline{x} = x_1, x_2, \ldots, x_n\) denote a vector of \(n\)
observations from a Poisson distribution with parameter lambda=\(\lambda\).
Also, let \(X\) denote the sum of these \(n\) random variables, i.e.,
$$X = \sum_{i=1}^n x_i \;\;\;\;\;\; (1)$$
Finally, let \(m\) denote the sample size associated with the \(k\) future
sums (i.e., n.sum=\(m\)). When \(m=1\), each sum is really just a
single observation, so in the rest of this help file the term “sums”
replaces the phrase “observations or sums”.
Let \(\underline{y} = y_1, y_2, \ldots, y_m\) denote a vector of \(m\)
future observations from a Poisson distribution with parameter lambda=\(\lambda^{*}\),
and set \(Y\) equal to the sum of these \(m\) random variables, i.e.,
$$Y = \sum_{i=1}^m y_i \;\;\;\;\;\; (2)$$
Then \(Y\) has a Poisson distribution with parameter lambda=\(m\lambda^{*}\)
(Johnson et al., 1992, p.160). We are interested
in constructing a prediction limit for the next value of \(Y\), or else the next
\(k\) sums of \(m\) Poisson random variables, based on the observed value of
\(X\) and assuming \(\lambda^{*} = \lambda\).
For a Poisson distribution, the form of a two-sided prediction interval is:
$$[m\bar{x} - K, m\bar{x} + K] = [cX - K, cX + K] \;\;\;\;\;\; (3)$$
where
$$\bar{x} = \frac{X}{n} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\;\;\; (4)$$
$$c = \frac{m}{n} \;\;\;\;\;\; (5)$$
and \(K\) is a constant that depends on the sample size \(n\), the
confidence level \((1-\alpha)100\%\), the number of future sums \(k\),
and the sample size associated with the future sums \(m\). Do not confuse
the constant \(K\) (uppercase \(K\)) with the number of future sums
\(k\) (lowercase \(k\)). The symbol \(K\) is used here to be consistent
with the notation used for prediction intervals for the normal distribution
(see predIntNorm).
Similarly, the form of a one-sided lower prediction interval is:
$$[m\bar{x} - K, \infty] = [cX - K, \infty] \;\;\;\;\;\; (6)$$
and the form of a one-sided upper prediction interval is:
$$[0, m\bar{x} + K] = [0, cX + K] \;\;\;\;\;\; (7)$$
The derivation of the constant \(K\) is explained below.
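The sketch below (a hypothetical helper, not part of EnvStats) simply assembles the interval from Equations (3), (6), and (7) once \(X\), \(n\), \(m\), and a value of \(K\) are available:

# Hypothetical helper: assemble a Poisson prediction interval from Equations
# (3), (6), and (7), given the baseline sum X, sample sizes n and m, and a
# value of K computed by one of the methods described below.
pred.int.form <- function(X, n, m, K, pi.type = "two-sided") {
  c.val <- m / n                                                   # Equation (5)
  switch(pi.type,
    "two-sided" = c(LPL = c.val * X - K, UPL = c.val * X + K),     # Equation (3)
    "lower"     = c(LPL = c.val * X - K, UPL = Inf),               # Equation (6)
    "upper"     = c(LPL = 0,             UPL = c.val * X + K))     # Equation (7)
}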
Conditional Distribution (method="conditional")
Nelson (1970) derives a prediction interval for the case \(k=1\) based on the
conditional distribution of \(Y\) given \(X+Y\). He notes that the conditional
distribution of \(Y\) given the quantity \(X+Y=w\) is binomial with parameters
size=\(w\) and prob=\([m\lambda^{*} / (m\lambda^{*} + n\lambda)]\)
(Johnson et al., 1992, p.161). When \(k=1\), the prediction limits are computed
as those most extreme values of \(Y\) that still yield a non-significant test of
the hypothesis \(H_0: \lambda^{*} = \lambda\), which for the conditional
distribution of \(Y\) is equivalent to the hypothesis
\(H_0\): prob=\([m/(m+n)]\).
Using the relationship between the binomial and
F-distribution (see the explanation of exact confidence
intervals in the help file for ebinom), Nelson (1982, p. 203) states
that exact two-sided \((1-\alpha)100\%\) prediction limits [LPL, UPL] are the
closest integer solutions to the following equations:
$$\frac{m}{LPL + 1} = \frac{n}{X} F(2 LPL + 2, 2X, 1 - \alpha/2) \;\;\;\;\;\; (8)$$
$$\frac{UPL}{m} = \frac{X+1}{n} F(2X + 2, 2 UPL, 1 - \alpha/2) \;\;\;\;\;\; (9)$$
where \(F(\nu_1, \nu_2, p)\) denotes the \(p\)'th quantile of the
F-distribution with \(\nu_1\) and \(\nu_2\) degrees of
freedom.
If pi.type="lower", \(\alpha/2\) is replaced with \(\alpha\) in
Equation (8) above for \(LPL\), and \(UPL\) is set to \(\infty\).
If pi.type="upper", \(\alpha/2\) is replaced with \(\alpha\) in
Equation (9) above for \(UPL\), and \(LPL\) is set to 0.
NOTE: This method is not extended to the case \(k > 1\).
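A minimal sketch of these calculations for \(k=1\) is given below. The grid search over candidate integer values of \(Y\) is an assumption about how the “closest integer solutions” are located; it is not the EnvStats implementation.

# Sketch of the "conditional" method for k = 1, following Equations (8) and (9).
# The grid search for the closest integer solutions is an assumption, not the
# EnvStats implementation.
cond.pred.limits <- function(X, n, m = 1, conf.level = 0.95) {
  alpha <- 1 - conf.level
  cand <- 0:ceiling(5 * (m / n) * X + 50)        # candidate integer values of Y
  # Lower limit: m / (LPL + 1) = (n / X) * F(2*LPL + 2, 2*X, 1 - alpha/2)
  lhs.l <- m / (cand + 1)
  rhs.l <- (n / X) * qf(1 - alpha / 2, 2 * cand + 2, 2 * X)
  LPL <- cand[which.min(abs(lhs.l - rhs.l))]
  # Upper limit: UPL / m = ((X + 1) / n) * F(2*X + 2, 2*UPL, 1 - alpha/2)
  cand.u <- cand[cand >= 1]                      # need positive df2 = 2*UPL
  lhs.u <- cand.u / m
  rhs.u <- ((X + 1) / n) * qf(1 - alpha / 2, 2 * X + 2, 2 * cand.u)
  UPL <- cand.u[which.min(abs(lhs.u - rhs.u))]
  c(LPL = LPL, UPL = UPL)
}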
Conditional Distribution Approximation Based on Normal Distribution
(method="conditional.approx.normal")
Cox and Hinkley (1974, p.245) derive an approximate prediction interval for the case
\(k=1\). Like Nelson (1970), they note that the conditional distribution of
\(Y\) given the quantity \(X+Y=w\) is binomial with parameters
size=\(w\) and prob=\([m\lambda^{*} / (m\lambda^{*} + n\lambda)]\), and that the
hypothesis \(H_0: \lambda^{*} = \lambda\) is equivalent to the hypothesis
\(H_0\): prob=\([m/(m+n)]\).
Cox and Hinkley (1974, p.245) suggest using the normal approximation to the
binomial distribution (in this case, without the continuity correction;
see Zar, 2010, pp.534-536 for information on the continuity correction associated
with the normal approximation to the binomial distribution). Under the null
hypothesis \(H_0: \lambda^{*} = \lambda\), the quantity
$$z = [Y - \frac{c(X+Y)}{1+c}] / \{ [\frac{c(X+Y)}{(1+c)^2}]^{1/2} \} \;\;\;\;\;\; (10)$$
is approximately distributed as a standard normal random variable.
The Case When k = 1
When \(k = 1\) and pi.type="two-sided", the prediction limits are computed
by solving the equation
$$z^2 \le z_{1 - \alpha/2}^2 \;\;\;\;\;\; (11)$$
where \(z_p\) denotes the \(p\)'th quantile of the standard normal distribution.
In this case, Gibbons (1987b) notes that the quantity \(K\) in Equation (3) above
is given by:
$$K = \frac{t^2c}{2} + tc[X (1 + \frac{1}{c}) + \frac{t^2}{4}]^{1/2} \;\;\;\;\;\; (12)$$
where \(t = z_{1-\alpha/2}\).
When pi.type="lower" or pi.type="upper", \(K\) is computed exactly
as above, except \(t\) is set to \(t = z_{1-\alpha}\).
The Case When k > 1
When \(k > 1\), Gibbons (1987b) suggests using the Bonferroni inequality.
That is, the value of \(K\) is computed exactly as for the case \(k=1\)
described above, except that the Bonferroni value of \(t\) is used in place of the
usual value of \(t\):
When pi.type="two-sided", \(t = z_{1 - (\alpha/k)/2}\).
When pi.type="lower" or pi.type="upper", \(t = z_{1 - \alpha/k}\).
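A minimal sketch of the resulting value of \(K\) (a hypothetical helper covering both \(k=1\) and the Bonferroni case \(k>1\)) is:

# Sketch of K from Equation (12) for method="conditional.approx.normal",
# using the Bonferroni value of t when k > 1 (hypothetical helper).
K.cond.approx.normal <- function(X, n, m = 1, k = 1, conf.level = 0.95,
                                 pi.type = "two-sided") {
  alpha <- 1 - conf.level
  c.val <- m / n
  p <- if (pi.type == "two-sided") 1 - (alpha / k) / 2 else 1 - alpha / k
  t <- qnorm(p)
  (t^2 * c.val) / 2 + t * c.val * sqrt(X * (1 + 1 / c.val) + t^2 / 4)
}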
Conditional Distribution Approximation Based on Student's t-Distribution
(method="conditional.approx.t")
When method="conditional.approx.t", the exact same procedure is used as when
method="conditional.approx.normal", except that the quantity in
Equation (10) is assumed to follow a Student's t-distribution with \(n-1\)
degrees of freedom. Thus, all occurrences of \(z_p\) are replaced with
\(t_{n-1, p}\), where \(t_{\nu, p}\) denotes the \(p\)'th quantile of
Student's t-distribution with \(\nu\) degrees of freedom.
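In other words, only the quantile changes; for example, for a two-sided interval with \(k=1\) and an assumed baseline sample size of \(n=12\):

# The only change from "conditional.approx.normal" is the quantile used for t
# (n = 12 is an illustrative assumption).
n <- 12; alpha <- 0.05
qnorm(1 - alpha / 2)            # t used by method="conditional.approx.normal"
qt(1 - alpha / 2, df = n - 1)   # t used by method="conditional.approx.t"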
Normal Approximation (method="normal.approx")
The normal approximation for Poisson prediction limits was given by
Nelson (1970; 1982, p.203) and is based on the fact that the mean and variance of a
Poisson distribution are the same (Johnson et al., 1992, p.157), and that for
“large” values of \(n\) and \(m\), both \(X\) and \(Y\) are
approximately normally distributed.
The Case When k = 1
The quantity \(Y - cX\) is approximately normally distributed with expectation and
variance given by:
$$E(Y - cX) = E(Y) - cE(X) = m\lambda - cn\lambda = 0 \;\;\;\;\;\; (13)$$
$$Var(Y - cX) = Var(Y) + c^2 Var(X) = m\lambda + c^2 n\lambda = m\lambda (1 + \frac{m}{n}) \;\;\;\;\;\; (14)$$
so the quantity
$$z = \frac{Y-cX}{\sqrt{m\hat{\lambda}(1 + \frac{m}{n})}} = \frac{Y-cX}{\sqrt{m\bar{x}(1 + \frac{m}{n})}} \;\;\;\;\;\; (15)$$
is approximately distributed as a standard normal random variable. The function
predIntPois, however, assumes this quantity approximately follows a
Student's t-distribution with \(n-1\) degrees of freedom.
Thus, following the idea of prediction intervals for a normal distribution
(see predIntNorm), when pi.type="two-sided", the constant
\(K\) for a \((1-\alpha)100\%\) prediction interval for the next \(k=1\) sum
of \(m\) observations is computed as:
$$K = t_{n-1, 1-\alpha/2} \sqrt{m\bar{x} (1 + \frac{m}{n})} \;\;\;\;\;\; (16)$$
where \(t_{\nu, p}\) denotes the \(p\)'th quantile of a
Student's t-distribution with \(\nu\) degrees of freedom.
Similarly, when pi.type="lower" or pi.type="upper", the constant
\(K\) is computed as:
$$K = t_{n-1, 1-\alpha} \sqrt{m\bar{x} (1 + \frac{m}{n})} \;\;\;\;\;\; (17)$$
The Case When k > 1
When \(k > 1\), the value of \(K\) is computed exactly as for the case
\(k = 1\) described above, except that the Bonferroni value of \(t\) is used
in place of the usual value of \(t\):
When pi.type="two-sided",
$$K = t_{n-1, 1-(\alpha/k)/2} \sqrt{m\bar{x} (1 + \frac{m}{n})} \;\;\;\;\;\; (18)$$
When pi.type="lower" or pi.type="upper",
$$K = t_{n-1, 1-(\alpha/k)} \sqrt{m\bar{x} (1 + \frac{m}{n})} \;\;\;\;\;\; (19)$$
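A compact sketch of \(K\) for this method (a hypothetical helper covering Equations (16) through (19)) is:

# Sketch of K for method="normal.approx", Equations (16)-(19); when k > 1 the
# Bonferroni value of t is used (hypothetical helper).
K.normal.approx <- function(x.bar, n, m = 1, k = 1, conf.level = 0.95,
                            pi.type = "two-sided") {
  alpha <- 1 - conf.level
  p <- if (pi.type == "two-sided") 1 - (alpha / k) / 2 else 1 - alpha / k
  qt(p, df = n - 1) * sqrt(m * x.bar * (1 + m / n))
}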
Hahn and Nelson (1973, p.182) discuss another method of computing \(K\) when
\(k > 1\), but this method is not implemented here.