predIntNormK

Compute the value of $K$ (the multiplier of the estimated standard deviation) used to construct a prediction interval for a normal distribution. The function predIntNormK is called by predIntNorm.

Usage

predIntNormK(n, df = n - 1, n.mean = 1, k = 1,
    method = "Bonferroni", pi.type = "two-sided",
    conf.level = 0.95)

Arguments

n: positive integer indicating the sample size upon which the prediction interval is based.

df: the degrees of freedom associated with the prediction interval. The default value is df=n-1.

n.mean: positive integer specifying the sample size associated with the future averages. The default value is n.mean=1 (i.e., individual observations). Note that all future averages must be based on the same sample size.

k: positive integer specifying the number of future observations or averages the prediction interval should contain with confidence level conf.level. The default value is k=1.

method: character string specifying the method to use when the number of future observations or averages (k) is greater than 1. The possible values are method="Bonferroni" (approximate method based on the Bonferroni inequality; the default) and method="exact" (exact method due to Dunnett, 1955). This argument is ignored when k=1.

pi.type: character string indicating what kind of prediction interval to compute. The possible values are pi.type="two-sided" (the default), pi.type="lower", and pi.type="upper".

conf.level: a scalar between 0 and 1 indicating the confidence level of the prediction interval. The default value is conf.level=0.95.
Details

Let $x_1, x_2, \ldots, x_n$ denote a vector of $n$ observations from a normal
distribution with parameters mean=$\mu$ and sd=$\sigma$. Also, let $m$ denote
the sample size associated with the $k$ future averages (i.e., n.mean=$m$).
When $m=1$, each average is really just a single observation, so in the rest
of this help file the term "averages" will sometimes replace the phrase
"observations or averages".
For a normal distribution, the form of a two-sided $(1-\alpha)100%$ prediction
interval is:
$$[\bar{x} - Ks, \bar{x} + Ks] \;\;\;\;\;\; (3)$$
where $\bar{x}$ denotes the sample mean, $s$ denotes the sample standard
deviation, and $K$ denotes a constant that depends on the sample size $n$, the
confidence level, the number of future averages $k$, and the sample size
associated with the future averages $m$ (the symbol $K$ is used here to be
consistent with the notation used for tolerance intervals; see tolIntNorm).
Similarly, the form of a one-sided lower prediction interval is:
$$[\bar{x} - Ks, \infty] \;\;\;\;\;\; (4)$$
and the form of a one-sided upper prediction interval is:
$$[-\infty, \bar{x} + Ks] \;\;\;\;\;\; (5)$$
but $K$ differs for one-sided versus two-sided prediction intervals.
The derivation of the constant $K$ is explained below. The function
predIntNormK computes the value of $K$ and is called by predIntNorm.
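As a quick illustration of Equation (3), the sketch below builds a two-sided
95% prediction interval for the next observation from simulated data; the data
and seed are arbitrary, and in practice the function predIntNorm performs this
construction directly.

library(EnvStats)

# Hypothetical data: 20 observations from a normal distribution
set.seed(42)
x <- rnorm(20, mean = 10, sd = 2)

# K for a two-sided 95% prediction interval for the next observation
K <- predIntNormK(n = length(x))

# Equation (3): [x-bar - K*s, x-bar + K*s]
c(mean(x) - K * sd(x), mean(x) + K * sd(x))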
The Derivation of K for One Future Observation or Average (k = 1)
Let $X$ denote a random variable from a normal distribution
with parameters mean=
$\mu$ and sd=
$\sigma$, and let
$x_p$ denote the $p$'th quantile of $X$.
A true two-sided $(1-\alpha)100%$ prediction interval for the next
$k=1$ observation of $X$ is given by:
$$[x_{\alpha/2}, x_{1-\alpha/2}] = [\mu - z_{1-\alpha/2}\sigma, \mu + z_{1-\alpha/2}\sigma] \;\;\;\;\;\; (6)$$
where $z_p$ denotes the $p$'th quantile of a standard normal distribution.
More generally, a true two-sided $(1-\alpha)100%$ prediction interval for the
next $k=1$ average based on a sample of size $m$ is given by:
$$[\mu - z_{1-\alpha/2}\frac{\sigma}{\sqrt{m}}, \mu + z_{1-\alpha/2}\frac{\sigma}{\sqrt{m}}] \;\;\;\;\;\; (7)$$
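For instance, Equation (7) can be evaluated directly when the parameters are
treated as known; the values mu = 10, sigma = 2, and m = 4 below are purely
hypothetical.

# True two-sided 95% prediction interval for the next average of
# m = 4 observations when mu = 10 and sigma = 2 are known
mu <- 10; sigma <- 2; m <- 4
c(mu - qnorm(0.975) * sigma / sqrt(m), mu + qnorm(0.975) * sigma / sqrt(m))
#[1]  8.040036 11.959964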
Because the values of $\mu$ and $\sigma$ are unknown, they must be
estimated, and a prediction interval then constructed based on the estimated
values of $\mu$ and $\sigma$.
For a two-sided prediction interval (pi.type="two-sided"), the constant $K$
for a $(1-\alpha)100%$ prediction interval for the next $k=1$ average based on
a sample of size $m$ is computed as:
$$K = t_{n-1, 1-\alpha/2} \sqrt{\frac{1}{m} + \frac{1}{n}} \;\;\;\;\;\; (8)$$
where $t_{\nu, p}$ denotes the $p$'th quantile of the
Student's t-distribution with $\nu$
degrees of freedom. For a one-sided prediction interval (pi.type="lower" or
pi.type="upper"), the constant $K$ is computed as:
$$K = t_{n-1, 1-\alpha} \sqrt{\frac{1}{m} + \frac{1}{n}} \;\;\;\;\;\; (9)$$
The formulas for these prediction intervals are derived as follows. Let
$\bar{y}$ denote the future average based on $m$ observations. Then
the quantity $\bar{y} - \bar{x}$ has a normal distribution with expectation
and variance given by:
$$E(\bar{y} - \bar{x}) = 0 \;\;\;\;\;\; (10)$$
$$Var(\bar{y} - \bar{x}) = Var(\bar{y}) + Var(\bar{x}) = \frac{\sigma^2}{m} + \frac{\sigma^2}{n} = \sigma^2(\frac{1}{m} + \frac{1}{n}) \;\;\;\;\;\; (11)$$
so the quantity
$$t = \frac{\bar{y} - \bar{x}}{s\sqrt{\frac{1}{m} + \frac{1}{n}}} \;\;\;\;\;\; (12)$$
has a Student's t-distribution with $n-1$ degrees of freedom.
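Equation (8) is simple enough to check directly; the sketch below reproduces
the two-sided value of $K$ for $n=20$, $m=1$ shown in the Examples section.

library(EnvStats)

# Equation (8) with n = 20, m = 1, and a 95% confidence level
n <- 20; m <- 1; alpha <- 0.05
qt(1 - alpha/2, df = n - 1) * sqrt(1/m + 1/n)
#[1] 2.144711

# Same value from predIntNormK
predIntNormK(n = 20)
#[1] 2.144711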
The Derivation of K for More than One Future Observation or Average (k > 1)

When $k > 1$, the function predIntNormK allows for two ways to compute $K$:
an exact method due to Dunnett (1955) (method="exact"), and an approximate
(conservative) method based on the Bonferroni inequality (method="Bonferroni";
see Miller, 1981a, pp. 8, 67-70; Gibbons et al., 2009, p. 4). Each of these
methods is explained below.

Exact Method Due to Dunnett (1955) (method="exact")
Dunnett (1955) derived the value of $K$ in the context of the multiple
comparisons problem of comparing several treatment means to one control mean.
The value of $K$ is computed as:
$$K = c \sqrt{\frac{1}{m} + \frac{1}{n}} \;\;\;\;\;\; (13)$$
where $c$ is a constant that depends on the sample size $n$, the number of
future observations (averages) $k$, the sample size associated with the
$k$ future averages $m$, and the confidence level $(1-\alpha)100%$.
When pi.type="lower" or pi.type="upper", the value of $c$ is the number that
satisfies the following equation (Gupta and Sobel, 1957; Hahn, 1970a):
$$1 - \alpha = \int_{0}^{\infty} F_1(cs, k, \rho) h(s\sqrt{n-1}, n-1) \sqrt{n-1} ds \;\;\;\;\;\; (14)$$
where
$$F_1(x, k, \rho) = \int_{-\infty}^{\infty} [\Phi(\frac{x + \rho^{1/2}y}{\sqrt{1 - \rho}})]^k \phi(y) dy \;\;\;\;\;\; (15)$$
$$\rho = 1 / (\frac{n}{m} + 1) \;\;\;\;\;\; (16)$$
$$h(x, \nu) = \frac{x^{\nu-1}e^{-x^2/2}}{2^{(\nu/2) - 1} \Gamma(\frac{\nu}{2})} \;\;\;\;\;\; (17)$$
and $\Phi()$ and $\phi()$ denote the cumulative distribution function and
probability density function, respectively, of the standard normal distribution.
Note that the function $h(x, \nu)$ is the probability density function of a
chi random variable with $\nu$ degrees of freedom.
When pi.type="two-sided", the value of $c$ is the number that satisfies the
following equation:
$$1 - \alpha = \int_{0}^{\infty} F_2(cs, k, \rho) h(s\sqrt{n-1}, n-1) \sqrt{n-1} ds \;\;\;\;\;\; (18)$$
where
$$F_2(x, k, \rho) = \int_{-\infty}^{\infty} [\Phi(\frac{x + \rho^{1/2}y}{\sqrt{1 - \rho}}) - \Phi(\frac{-x + \rho^{1/2}y}{\sqrt{1 - \rho}})]^k \phi(y) dy \;\;\;\;\;\; (19)$$
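For concreteness, the one-sided case of Equations (13)-(17) can be evaluated
numerically with base R quadrature. The sketch below is an illustration only
(predIntNormK has its own implementation) and should approximately reproduce
the exact-method value for n=20, n.mean=2, k=3 shown in the Examples section;
the names rho, h, F1, and coverage simply mirror the symbols above.

# Numerical sketch of Equations (13)-(17) for pi.type="upper"
n <- 20; m <- 2; k <- 3; conf.level <- 0.99
rho <- 1 / (n/m + 1)                                   # Equation (16)
nu <- n - 1

# Equation (17): density of a chi random variable with nu df
h <- function(x, nu) x^(nu - 1) * exp(-x^2/2) / (2^(nu/2 - 1) * gamma(nu/2))

# Equation (15): F1(x, k, rho) by one-dimensional quadrature
F1 <- function(x, k, rho) {
  integrate(function(y) pnorm((x + sqrt(rho) * y) / sqrt(1 - rho))^k * dnorm(y),
    lower = -Inf, upper = Inf)$value
}

# Equation (14): coverage achieved by a given constant c
coverage <- function(c.val) {
  integrate(function(s) sapply(s, function(si) F1(c.val * si, k, rho)) *
      h(s * sqrt(nu), nu) * sqrt(nu),
    lower = 0, upper = Inf)$value
}

# Solve for c, then convert to K via Equation (13)
c.val <- uniroot(function(z) coverage(z) - conf.level, interval = c(1, 10))$value
c.val * sqrt(1/m + 1/n)
# approximately 2.251084 (compare the exact-method example below)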
Approximate Method Based on the Bonferroni Inequality (method="Bonferroni")
As shown above, when $k=1$, the value of $K$ is given by Equation (8) or
Equation (9) for two-sided or one-sided prediction intervals, respectively. When
$k > 1$, a conservative way to construct a $(1-\alpha^*)100%$ prediction
interval for the next $k$ observations or averages is to use a Bonferroni
correction (Miller, 1981a, p.8) and set $\alpha = \alpha^*/k$ in Equation (8)
or (9) (Chew, 1968). This value of $K$ will be conservative in that the computed
prediction intervals will be wider than the exact prediction intervals.
Hahn (1969, 1970a) compared the exact values of $K$ with those based on the
Bonferroni inequality for the case of $m=1$ and found the approximation to be
quite satisfactory except when $n$ is small, $k$ is large, and $\alpha$
is large. For example, Gibbons (1987a) notes that for a 99% prediction interval
(i.e., $\alpha = 0.01$) for the next $k$ observations, if $n > 4$,
the bias of $K$ is never greater than 1% no matter what the value of $k$.predIntNorm
See Also

predIntNorm, predIntNormSimultaneous, predIntLnorm, tolIntNorm, Normal,
estimate.object, enorm, eqnorm.

Examples

# Compute the value of K for a two-sided 95% prediction interval
# for the next observation given a sample size of n=20.
predIntNormK(n = 20)
#[1] 2.144711
#--------------------------------------------------------------------
# Compute the value of K for a one-sided upper 99% prediction limit
# for the next 3 averages of order 2 (i.e., each of the 3 future
# averages is based on a sample size of 2 future observations) given a
# sample size of n=20.
predIntNormK(n = 20, n.mean = 2, k = 3, pi.type = "upper",
conf.level = 0.99)
#[1] 2.258026
#----------
# Compare the result above that is based on the Bonferroni method
# with the exact method.
predIntNormK(n = 20, n.mean = 2, k = 3, method = "exact",
pi.type = "upper", conf.level = 0.99)
#[1] 2.251084
#--------------------------------------------------------------------
# Example 18-1 of USEPA (2009, p.18-9) shows how to construct a 95%
# prediction interval for 4 future observations assuming a
# normal distribution based on arsenic concentrations (ppb) in
# groundwater at a solid waste landfill. There were 4 years of
# quarterly monitoring, and years 1-3 are considered background,
# so the sample size for the prediction limit is n = 12,
# and the number of future samples is k = 4.
predIntNormK(n = 12, k = 4, pi.type = "upper")
#[1] 2.698976