Last chance! 50% off unlimited learning
Sale ends in
Estimate the mean and standard deviation of a
normal distribution, and construct a simultaneous prediction
interval for the next
predIntNormSimultaneous(x, n.mean = 1, k = 1, m = 2, r = 1, rule = "k.of.m",
delta.over.sigma = 0, pi.type = "upper", conf.level = 0.95,
K.tol = .Machine$double.eps^0.5)
a numeric vector of observations, or an object resulting from a call to an estimating
function that assumes a normal (Gaussian) distribution (e.g., enorm
,
eqnorm
, enormCensored
, etc.). If x
is a
numeric vector, missing (NA
), undefined (NaN
), and
infinite (Inf
, -Inf
) values are allowed but will be removed.
positive integer specifying the sample size associated with the future averages.
The default value is n.mean=1
(i.e., individual observations). Note that all
future averages must be based on the same sample size.
for the rule="k.of.m"
), a positive integer
specifying the minimum number of observations (or averages) out of conf.level
.
The default value is k=1
. This argument is ignored when the argument
rule
is not equal to "k.of.m"
.
positive integer specifying the maximum number of future observations (or
averages) on one future sampling “occasion”.
The default value is m=2
, except when rule="Modified.CA"
, in which
case this argument is ignored and m
is automatically set equal to 4
.
positive integer specifying the number of future sampling “occasions”.
The default value is r=1
.
character string specifying which rule to use. The possible values are
"k.of.m"
("CA"
(California rule),
and "Modified.CA"
(modified California rule).
See the DETAILS section below for more information.
numeric scalar indicating the ratio delta.over.sigma=0
.
character string indicating what kind of prediction interval to compute.
The possible values are pi.type="upper"
(the default), and
pi.type="lower"
.
a scalar between 0 and 1 indicating the confidence level of the prediction interval.
The default value is conf.level=0.95
.
numeric scalar indicating the tolerance to use in the nonlinear search algorithm to
compute K.tol=.Machine$double.eps^(1/2)
.
For many applications, the value of K.tol=1e-4
will speed up computation a
bit.
If x
is a numeric vector, predIntNormSimultaneous
returns a list of
class "estimate"
containing the estimated parameters, the prediction interval,
and other information. See the help file for
estimate.object
for details.
If x
is the result of calling an estimation function,
predIntNormSimultaneous
returns a list whose class is the same as x
.
The list contains the same components as x
, as well as a component called
interval
containing the prediction interval information.
If x
already has a component called interval
, this component is
replaced with the prediction interval information.
What is a Simultaneous Prediction Interval?
A prediction interval for some population is an interval on the real line constructed
so that it will contain predIntNorm
computes a standard prediction
interval based on a sample from a normal distribution.
The function predIntNormSimultaneous
computes a simultaneous prediction
interval that will contain a certain number of future observations with probability
The function predIntNormSimultaneous
computes a simultaneous prediction
interval based on one of three possible rules:
For the rule="k.of.m"
), at least predIntNormSimultaneous
are equivalent to the results of predIntNorm
.
For the California rule (rule="CA"
), with probability
For the Modified California rule (rule="Modified.CA"
), with probability
Simultaneous prediction intervals can be extended to using averages (means) in place
of single observations (USEPA, 2009, Chapter 19). That is, you can create a
simultaneous prediction interval
that will contain a specified number of averages (based on which rule you choose) on
each of predIntNormSimultaneous
,
the argument n.mean
corresponds to
The Form of a Prediction Interval
Let mean=
sd=
n.mean=
For a normal distribution, the form of a two-sided tolIntNorm
).
Similarly, the form of a one-sided lower prediction interval is:
Note: For simultaneous prediction intervals, only lower
(pi.type="lower"
) and upper
(pi.type="upper"
) prediction
intervals are available.
The derivation of the constant predIntNormSimultaneousK
.
Prediction Intervals are Random Intervals
A prediction interval is a random interval; that is, the lower and/or
upper bounds are random variables computed based on sample statistics in the
baseline sample. Prior to taking one specific baseline sample, the probability
that the prediction interval will perform according to the rule chosen is
predIntNorm
for more information.
Barclay's California Code of Regulations. (1991). Title 22, Section 66264.97 [concerning hazardous waste facilities] and Title 23, Section 2550.7(e)(8) [concerning solid waste facilities]. Barclay's Law Publishers, San Francisco, CA.
Davis, C.B. (1998a). Ground-Water Statistics \& Regulations: Principles, Progress and Problems. Second Edition. Environmetrics \& Statistics Limited, Henderson, NV.
Davis, C.B. (1998b). Personal Communication, September 3, 1998.
Davis, C.B., and R.J. McNichols. (1987). One-sided Intervals for at Least
Fertig, K.W., and N.R. Mann. (1977). One-Sided Prediction Intervals for at Least
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Hahn, G.J. (1969). Factors for Calculating Two-Sided Prediction Intervals for Samples from a Normal Distribution. Journal of the American Statistical Association 64(327), 878-898.
Hahn, G.J. (1970a). Additional Factors for Calculating Prediction Intervals for Samples from a Normal Distribution. Journal of the American Statistical Association 65(332), 1668-1676.
Hahn, G.J. (1970b). Statistical Intervals for a Normal Population, Part I: Tables, Examples and Applications. Journal of Quality Technology 2(3), 115-125.
Hahn, G.J. (1970c). Statistical Intervals for a Normal Population, Part II: Formulas, Assumptions, Some Derivations. Journal of Quality Technology 2(4), 195-206.
Hahn, G.J., and W.Q. Meeker. (1991). Statistical Intervals: A Guide for Practitioners. John Wiley and Sons, New York.
Hahn, G., and W. Nelson. (1973). A Survey of Prediction Intervals and Their Applications. Journal of Quality Technology 5, 178-188.
Hall, I.J., and R.R. Prairie. (1973). One-Sided Prediction Intervals to Contain at
Least
Millard, S.P. (1987). Environmental Monitoring, Statistics, and the Law: Room for Improvement (with Comment). The American Statistician 41(4), 249--259.
Millard, S.P., and Neerchal, N.K. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, Florida.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
predIntNormSimultaneousK
,
predIntNormSimultaneousTestPower
,
predIntNorm
,
predIntLnormSimultaneous
, tolIntNorm
,
Normal, estimate.object
, enorm
# NOT RUN {
# Generate 8 observations from a normal distribution with parameters
# mean=10 and sd=2, then use predIntNormSimultaneous to estimate the
# mean and standard deviation of the true distribution and construct an
# upper 95% prediction interval to contain at least 1 out of the next
# 3 observations.
# (Note: the call to set.seed simply allows you to reproduce this example.)
set.seed(479)
dat <- rnorm(8, mean = 10, sd = 2)
predIntNormSimultaneous(dat, k = 1, m = 3)
#Results of Distribution Parameter Estimation
#--------------------------------------------
#
#Assumed Distribution: Normal
#
#Estimated Parameter(s): mean = 10.269773
# sd = 2.210246
#
#Estimation Method: mvue
#
#Data: dat
#
#Sample Size: 8
#
#Prediction Interval Method: exact
#
#Prediction Interval Type: upper
#
#Confidence Level: 95%
#
#Minimum Number of
#Future Observations
#Interval Should Contain: 1
#
#Total Number of
#Future Observations: 3
#
#Prediction Interval: LPL = -Inf
# UPL = 11.4021
#----------
# Repeat the above example, but do it in two steps. First create a list called
# est.list containing information about the estimated parameters, then create the
# prediction interval.
est.list <- enorm(dat)
est.list
#Results of Distribution Parameter Estimation
#--------------------------------------------
#
#Assumed Distribution: Normal
#
#Estimated Parameter(s): mean = 10.269773
# sd = 2.210246
#
#Estimation Method: mvue
#
#Data: dat
#
#Sample Size: 8
predIntNormSimultaneous(est.list, k = 1, m = 3)
#Results of Distribution Parameter Estimation
#--------------------------------------------
#
#Assumed Distribution: Normal
#
#Estimated Parameter(s): mean = 10.269773
# sd = 2.210246
#
#Estimation Method: mvue
#
#Data: dat
#
#Sample Size: 8
#
#Prediction Interval Method: exact
#
#Prediction Interval Type: upper
#
#Confidence Level: 95%
#
#Minimum Number of
#Future Observations
#Interval Should Contain: 1
#
#Total Number of
#Future Observations: 3
#
#Prediction Interval: LPL = -Inf
# UPL = 11.4021
#----------
# Compare the 95% 1-of-3 upper prediction interval to the California and
# Modified California prediction intervals. Note that the upper prediction
# bound for the Modified California rule is between the bound for the
# 1-of-3 rule bound and the bound for the California rule.
predIntNormSimultaneous(dat, k = 1, m = 3)$interval$limits["UPL"]
# UPL
#11.4021
predIntNormSimultaneous(dat, m = 3, rule = "CA")$interval$limits["UPL"]
# UPL
#13.03717
predIntNormSimultaneous(dat, rule = "Modified.CA")$interval$limits["UPL"]
# UPL
#12.12201
#----------
# Show how the upper bound on an upper 95% simultaneous prediction limit increases
# as the number of future sampling occasions r increases. Here, we'll use the
# 1-of-3 rule.
predIntNormSimultaneous(dat, k = 1, m = 3)$interval$limits["UPL"]
# UPL
#11.4021
predIntNormSimultaneous(dat, k = 1, m = 3, r = 10)$interval$limits["UPL"]
# UPL
#13.28234
#----------
# Compare the upper simultaneous prediction limit for the 1-of-3 rule
# based on individual observations versus based on means of order 4.
predIntNormSimultaneous(dat, k = 1, m = 3)$interval$limits["UPL"]
# UPL
#11.4021
predIntNormSimultaneous(dat, n.mean = 4, k = 1,
m = 3)$interval$limits["UPL"]
# UPL
#11.26157
#==========
# Example 19-1 of USEPA (2009, p. 19-17) shows how to compute an
# upper simultaneous prediction limit for the 1-of-3 rule for
# r = 2 future sampling occasions. The data for this example are
# stored in EPA.09.Ex.19.1.sulfate.df.
# We will pool data from 4 background wells that were sampled on
# a number of different occasions, giving us a sample size of
# n = 25 to use to construct the prediction limit.
# There are 50 compliance wells and we will monitor 10 different
# constituents at each well at each of the r=2 future sampling
# occasions. To determine the confidence level we require for
# the simultaneous prediction interval, USEPA (2009) recommends
# setting the individual Type I Error level at each well to
# 1 - (1 - SWFPR)^(1 / (Number of Constituents * Number of Wells))
# which translates to setting the confidence limit to
# (1 - SWFPR)^(1 / (Number of Constituents * Number of Wells))
# where SWFPR = site-wide false positive rate. For this example, we
# will set SWFPR = 0.1. Thus, the confidence level is given by:
nc <- 10
nw <- 50
SWFPR <- 0.1
conf.level <- (1 - SWFPR)^(1 / (nc * nw))
conf.level
#[1] 0.9997893
#----------
# Look at the data:
names(EPA.09.Ex.19.1.sulfate.df)
#[1] "Well" "Month" "Day"
#[4] "Year" "Date" "Sulfate.mg.per.l"
#[7] "log.Sulfate.mg.per.l"
EPA.09.Ex.19.1.sulfate.df[,
c("Well", "Date", "Sulfate.mg.per.l", "log.Sulfate.mg.per.l")]
# Well Date Sulfate.mg.per.l log.Sulfate.mg.per.l
#1 GW-01 1999-07-08 63.0 4.143135
#2 GW-01 1999-09-12 51.0 3.931826
#3 GW-01 1999-10-16 60.0 4.094345
#4 GW-01 1999-11-02 86.0 4.454347
#5 GW-04 1999-07-09 104.0 4.644391
#6 GW-04 1999-09-14 102.0 4.624973
#7 GW-04 1999-10-12 84.0 4.430817
#8 GW-04 1999-11-15 72.0 4.276666
#9 GW-08 1997-10-12 31.0 3.433987
#10 GW-08 1997-11-16 84.0 4.430817
#11 GW-08 1998-01-28 65.0 4.174387
#12 GW-08 1999-04-20 41.0 3.713572
#13 GW-08 2002-06-04 51.8 3.947390
#14 GW-08 2002-09-16 57.5 4.051785
#15 GW-08 2002-12-02 66.8 4.201703
#16 GW-08 2003-03-24 87.1 4.467057
#17 GW-09 1997-10-16 59.0 4.077537
#18 GW-09 1998-01-28 85.0 4.442651
#19 GW-09 1998-04-12 75.0 4.317488
#20 GW-09 1998-07-12 99.0 4.595120
#21 GW-09 2000-01-30 75.8 4.328098
#22 GW-09 2000-04-24 82.5 4.412798
#23 GW-09 2000-10-24 85.5 4.448516
#24 GW-09 2002-12-01 188.0 5.236442
#25 GW-09 2003-03-24 150.0 5.010635
# Construct the upper simultaneous prediction limit for the
# 1-of-3 plan based on the log-transformed sulfate data
log.Sulfate <- EPA.09.Ex.19.1.sulfate.df$log.Sulfate.mg.per.l
pred.int.list.log <-
predIntNormSimultaneous(x = log.Sulfate, k = 1, m = 3, r = 2,
rule = "k.of.m", pi.type = "upper", conf.level = conf.level)
pred.int.list.log
#Results of Distribution Parameter Estimation
#--------------------------------------------
#
#Assumed Distribution: Normal
#
#Estimated Parameter(s): mean = 4.3156194
# sd = 0.3756697
#
#Estimation Method: mvue
#
#Data: log.Sulfate
#
#Sample Size: 25
#
#Prediction Interval Method: exact
#
#Prediction Interval Type: upper
#
#Confidence Level: 99.97893%
#
#Minimum Number of
#Future Observations
#Interval Should Contain
#(per Sampling Occasion): 1
#
#Total Number of
#Future Observations
#(per Sampling Occasion): 3
#
#Number of Future
#Sampling Occasions: 2
#
#Prediction Interval: LPL = -Inf
# UPL = 5.072355
# Now exponentiate the prediction interval to get the limit on
# the original scale
exp(pred.int.list.log$interval$limits["UPL"])
# UPL
#159.5497
#==========
# Cleanup
#--------
rm(dat, est.list, nc, nw, SWFPR, conf.level, log.Sulfate,
pred.int.list.log)
# }
Run the code above in your browser using DataLab