eqnormCensored: Estimate Quantiles of a Normal Distribution Based on Type I Censored Data

Description

Estimate quantiles of a normal distribution given a sample of data that has been subjected to Type I censoring, and optionally construct a confidence interval for a quantile.

Usage

eqnormCensored(x, censored, censoring.side = "left", p = 0.5, method = "mle", 
    ci = FALSE, ci.method = "exact.for.complete", ci.type = "two-sided", 
    conf.level = 0.95, digits = 0, nmc = 1000, seed = NULL)

Arguments

a numeric vector of observations. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.

censored

numeric or logical vector indicating which values of x are censored. This must be the same length as x. If the mode of censored is "logical", TRUE values correspond to elements of

censoring.side

character string indicating on which side the censoring occurs. The possible values are "left" (the default) and "right".

numeric vector of probabilities for which quantiles will be estimated. All values of p must be between 0 and 1. When ci=TRUE, p must be a scalar. The default value is p=0.5.

method

character string specifying the method of estimating the mean and standard deviation. For singly censored data, the possible values are: "mle" (maximum likelihood; the default), "bcmle" (bias-corrected maximum lik

logical scalar indicating whether to compute a confidence interval for the quantile. The default value is ci=FALSE.

ci.method

character string indicating what method to use to construct the confidence interval for the quantile. The possible values are "exact.for.complete" (exact method for complete (uncensored) data; the default), "gpq" (me

ci.type

character string indicating what kind of confidence interval for the quantile to compute. The possible values are "two-sided" (the default), "lower", and "upper". This argument is ignored if ci=FALSE<

conf.level

a scalar between 0 and 1 indicating the confidence level of the confidence interval. The default value is conf.level=0.95. This argument is ignored if ci=FALSE.

digits

an integer indicating the number of decimal places to round to when printing out the value of 100*p. The default value is digits=0.

nmc

numeric scalar indicating the number of Monte Carlo simulations to run when ci.method="gpq". The default is nmc=1000. This argument is ignored if ci=FALSE.

seed

integer supplied to the function set.seed and used when ci.method="gpq". The default value is seed=NULL, in which case the current value of .Random.seed

Value

eqnormCensored returns a list of class "estimateCensored" containing the estimated quantile(s) and other information. See estimateCensored.object for details.

Details

Estimating Quantiles Quantiles are estimated by:

estimating the mean and standard deviation parameters by callingenormCensored, and then
calling the functionqnormand using the estimated values for the mean and standard deviation.

The estimated quantile thus depends on the method of estimating the mean and standard deviation. Confidence Intervals for Quantiles Exact Method When Data are Complete (ci.method="exact.for.complete") When ci.method="exact.for.complete", the function eqnormCensored calls the function eqnorm, supplying it with the estimated mean and standard deviation, and setting the argument ci.method="exact". Thus, this is the exact method for computing a confidence interval for a quantile had the data been complete. Because the data have been subjected to Type I censoring, this method of constructing a confidence interval for the quantile is an approximation. Normal Approximation (ci.method="normal.approx") When ci.method="normal.approx", the function eqnormCensored calls the function eqnorm, supplying it with the estimated mean and standard deviation, and setting the argument ci.method="normal.approx". Thus, this is the normal approximation method for computing a confidence interval for a quantile had the data been complete. Because the data have been subjected to Type I censoring, this method of constructing a confidence interval for the quantile is an approximation both because of the normal approximation and because the estimates of the mean and standard devation are based on censored, instead of complete, data. Generalized Pivotal Quantity (ci.method="gpq") When ci.method="gpq", the function eqnormCensored uses the relationship between confidence intervals for quantiles and tolerance intervals and calls the function tolIntNormCensored with the argument ti.method="gpq" to construct the confidence interval. Specifically, it can be shown (e.g., Conover, 1980, pp.119-121) that an upper confidence interval for the $p$'th quantile with confidence level $100(1-\alpha)%$ is equivalent to an upper $\beta$-content tolerance interval with coverage $100p%$ and confidence level $100(1-\alpha)%$. Also, a lower confidence interval for the $p$'th quantile with confidence level $100(1-\alpha)%$ is equivalent to a lower $\beta$-content tolerance interval with coverage $100(1-p)%$ and confidence level $100(1-\alpha)%$.

References

Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Lewis Publishers, Boca Raton. Caudill, S.P., L.-Y. Wong, W.E. Turner, R. Lee, A. Henderson, D. G. Patterson Jr. (2007). Percentile Estimation Using Variable Censored Data. Chemosphere 68, 169--180. Conover, W.J. (1980). Practical Nonparametric Statistics. Second Edition. John Wiley and Sons, New York. Draper, N., and H. Smith. (1998). Applied Regression Analysis. Third Edition. John Wiley and Sons, New York. Ellison, B.E. (1964). On Two-Sided Tolerance Intervals for a Normal Distribution. Annals of Mathematical Statistics 35, 762-772. Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken. Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York, NY, pp.132-136. Guttman, I. (1970). Statistical Tolerance Regions: Classical and Bayesian. Hafner Publishing Co., Darien, CT. Hahn, G.J. (1970b). Statistical Intervals for a Normal Population, Part I: Tables, Examples and Applications. Journal of Quality Technology 2(3), 115-125. Hahn, G.J. (1970c). Statistical Intervals for a Normal Population, Part II: Formulas, Assumptions, Some Derivations. Journal of Quality Technology 2(4), 195-206. Hahn, G.J., and W.Q. Meeker. (1991). Statistical Intervals: A Guide for Practitioners. John Wiley and Sons, New York. Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY, pp.88-90. Hewett, P., and G.H. Ganser. (2007). A Comparison of Several Methods for Analyzing Censored Data. Annals of Occupational Hygiene 51(7), 611--632. Johnson, N.L., and B.L. Welch. (1940). Applications of the Non-Central t-Distribution. Biometrika 31, 362-389. Krishnamoorthy K., and T. Mathew. (2009). Statistical Tolerance Regions: Theory, Applications, and Computation. John Wiley and Sons, Hoboken. Kroll, C.N., and J.R. Stedinger. (1996). Estimation of Moments and Quantiles Using Censored Data. Water Resources Research 32(4), 1005--1012. Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton. Odeh, R.E., and D.B. Owen. (1980). Tables for Normal Tolerance Limits, Sampling Plans, and Screening. Marcel Dekker, New York. Owen, D.B. (1962). Handbook of Statistical Tables. Addison-Wesley, Reading, MA. Serasinghe, S.K. (2010). A Simulation Comparison of Parametric and Nonparametric Estimators of Quantiles from Right Censored Data. A Report submitted in partial fulfillment of the requirements for the degree Master of Science, Department of Statistics, College of Arts and Sciences, Kansas State University, Manhattan, Kansas. Singh, A., R. Maichle, and S. Lee. (2006). On the Computation of a 95% Upper Confidence Limit of the Unknown Population Mean Based Upon Data Sets with Below Detection Limit Observations. EPA/600/R-06/022, March 2006. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C. Singh, A., R. Maichle, and N. Armbya. (2010a). ProUCL Version 4.1.00 User Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C. Singh, A., N. Armbya, and A. Singh. (2010b). ProUCL Version 4.1.00 Technical Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C. Stedinger, J. (1983). Confidence Intervals for Design Events. Journal of Hydraulic Engineering 109(1), 13-27. Stedinger, J.R., R.M. Vogel, and E. Foufoula-Georgiou. (1993). Frequency Analysis of Extreme Events. In: Maidment, D.R., ed. Handbook of Hydrology. McGraw-Hill, New York, Chapter 18, pp.29-30. USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C. Wald, A., and J. Wolfowitz. (1946). Tolerance Limits for a Normal Distribution. Annals of Mathematical Statistics 17, 208-215.

Examples

Run this code

# Generate 15 observations from a normal distribution with 
  # parameters mean=10 and sd=2, and censor observations less than 8. 
  # Then generate 15 more observations from this distribution and  censor 
  # observations less than 7.
  # Then estimate the 90th percentile and create a one-sided upper 95% 
  # confidence interval for that percentile. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)

  set.seed(47) 

  x.1 <- rnorm(15, mean = 10, sd = 2) 
  sort(x.1)
  # [1]  6.343542  7.068499  7.828525  8.029036  8.155088  9.436470
  # [7]  9.495908 10.030262 10.079205 10.182946 10.217551 10.370811
  #[13] 10.987640 11.422285 13.989393
  censored.1 <- x.1 < 8
  x.1[censored.1] <- 8

  x.2 <- rnorm(15, mean = 10, sd = 2) 
  sort(x.2)
  # [1]  5.355255  6.065562  6.783680  6.867676  8.219412  8.593224
  # [7]  9.319168  9.347066  9.837844  9.918844 10.055054 10.498296
  #[13] 10.834382 11.341558 12.528482
  censored.2 <- x.2 < 7
  x.2[censored.2] <- 7

  x <- c(x.1, x.2)
  censored <- c(censored.1, censored.2)

  eqnormCensored(x, censored, p = 0.9, ci = TRUE, ci.type = "upper")

  #Results of Distribution Parameter Estimation
  #Based on Type I Censored Data
  #--------------------------------------------
  #
  #Assumed Distribution:            Normal
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):              7 8 
  #
  #Estimated Parameter(s):          mean = 9.390624
  #                                 sd   = 1.827156
  #
  #Estimation Method:               MLE
  #
  #Estimated Quantile(s):           90'th %ile = 11.73222
  #
  #Quantile Estimation Method:      Quantile(s) Based on
  #                                 MLE Estimators
  #
  #Data:                            x
  #
  #Censoring Variable:              censored
  #  
  #Sample Size:                     30
  #
  #Percent Censored:                16.66667%
  #
  #Confidence Interval for:         90'th %ile
  #
  #Assumed Sample Size:             30
  #
  #Confidence Interval Method:      Exact for
  #                                 Complete Data
  #
  #Confidence Interval Type:        upper
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL =     -Inf
  #                                 UCL = 12.63808

  #----------

  # Compare these results with the true 90'th percentile:

  qnorm(p = 0.9, mean = 10, sd = 2)
  #[1] 12.56310

  #----------

  # Clean up
  rm(x.1, censored.1, x.2, censored.2, x, censored)
  
  #==========

  # Chapter 15 of USEPA (2009) gives several examples of estimating the mean
  # and standard deviation of a lognormal distribution on the log-scale using 
  # manganese concentrations (ppb) in groundwater at five background wells. 
  # In EnvStats these data are stored in the data frame 
  # EPA.09.Ex.15.1.manganese.df.

  # Here we will estimate the mean and standard deviation using the MLE, 
  # and then construct an upper 95% confidence limit for the 90th percentile. 

  # We will log-transform the original observations and then call 
  # eqnormCensored.  Alternatively, we could have more simply called 
  # eqlnormCensored.

  # First look at the data:
  #-----------------------

  EPA.09.Ex.15.1.manganese.df

  #   Sample   Well Manganese.Orig.ppb Manganese.ppb Censored
  #1       1 Well.1                 <5           5.0     TRUE
  #2       2 Well.1               12.1          12.1    FALSE
  #3       3 Well.1               16.9          16.9    FALSE
  #...
  #23      3 Well.5                3.3           3.3    FALSE
  #24      4 Well.5                8.4           8.4    FALSE
  #25      5 Well.5                 <2           2.0     TRUE

  longToWide(EPA.09.Ex.15.1.manganese.df, 
    "Manganese.Orig.ppb", "Sample", "Well",
    paste.row.name = TRUE)  

  #         Well.1 Well.2 Well.3 Well.4 Well.5
  #Sample.1     <5     <5     <5    6.3   17.9
  #Sample.2   12.1    7.7    5.3   11.9   22.7
  #Sample.3   16.9   53.6   12.6     10    3.3
  #Sample.4   21.6    9.5  106.3     <2    8.4
  #Sample.5     <2   45.9   34.5   77.2     <2


  # Now estimate the mean, standard deviation, and 90th percentile 
  # on the log-scale using the MLE, and construct an upper 95% 
  # confidence limit for the 90th percentile on the log-scale:
  #---------------------------------------------------------------

  est.list <- with(EPA.09.Ex.15.1.manganese.df,
    eqnormCensored(log(Manganese.ppb), Censored, 
      p = 0.9, ci = TRUE, ci.type = "upper"))

  est.list

  #Results of Distribution Parameter Estimation
  #Based on Type I Censored Data
  #--------------------------------------------
  #
  #Assumed Distribution:            Normal
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):              0.6931472 1.6094379 
  #
  #Estimated Parameter(s):          mean = 2.215905
  #                                 sd   = 1.356291
  #
  #Estimation Method:               MLE
  #
  #Estimated Quantile(s):           90'th %ile = 3.954062
  #
  #Quantile Estimation Method:      Quantile(s) Based on
  #                                 MLE Estimators
  #
  #Data:                            log(Manganese.ppb)
  #
  #Censoring Variable:              censored
  #
  #Sample Size:                     25
  #
  #Percent Censored:                24%
  #
  #Confidence Interval for:         90'th %ile
  #
  #Assumed Sample Size:             25
  #
  #Confidence Interval Method:      Exact for
  #                                 Complete Data
  #
  #Confidence Interval Type:        upper
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL =     -Inf
  #                                 UCL = 4.708904

  # To estimate the 90th percentile on the original scale, 
  # we need to exponentiate the results
  #-------------------------------------------------------
  exp(est.list$quantiles)
  #90'th %ile 
  #  52.14674 

  exp(est.list$interval$limits)
  #     LCL      UCL 
  #  0.0000 110.9305

  #----------

  # Clean up
  #---------
  rm(est.list)

Run the code above in your browser using DataLab