epoisCensored(x, censored, method = "mle", censoring.side = "left",
ci = FALSE, ci.method = "profile.likelihood", ci.type = "two-sided",
conf.level = 0.95, n.bootstraps = 1000, pivot.statistic = "z",
ci.sample.size = sum(!censored))
NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.

censored: numeric or logical vector indicating which values of x are censored. This must be the same length as x. If the mode of censored is "logical", TRUE values correspond to elements of x that are censored, and FALSE values correspond to elements of x that are not censored.

method: character string specifying the method of estimation. The possible values are "mle" (maximum likelihood; the default), and "half.cen.level" (moment estimation based on setting the censored observations to half the censoring level). See the DETAILS section.

censoring.side: character string indicating on which side the censoring occurs. The possible values are "left" (the default) and "right".

ci: logical scalar indicating whether to compute a confidence interval for the mean. The default value is ci=FALSE.

ci.method: character string indicating what method to use to construct the confidence interval for the mean. The possible values are "profile.likelihood" (profile likelihood; the default), "normal.approx" (normal approximation), and "bootstrap" (based on bootstrapping). This argument is ignored if ci=FALSE.

ci.type: character string indicating what kind of confidence interval to compute. The possible values are "two-sided" (the default), "lower", and "upper". This argument is ignored if ci=FALSE.

conf.level: scalar between 0 and 1 indicating the confidence level of the confidence interval. The default value is conf.level=0.95. This argument is ignored if ci=FALSE.

n.bootstraps: numeric scalar indicating how many bootstraps to use to construct the confidence interval for the mean when ci.method="bootstrap". This argument is ignored if ci=FALSE and/or ci.method does not equal "bootstrap".

pivot.statistic: character string indicating which pivot statistic to use in the construction of the confidence interval for the mean when ci.method="normal.approx" (see the DETAILS section). The possible values are pivot.statistic="z" (the default) and pivot.statistic="t".

ci.sample.size: numeric scalar indicating what sample size to assume when constructing the confidence interval if pivot.statistic="t" and ci.method="normal.approx". The default value is the number of uncensored observations.

Value

A list of class "estimateCensored" containing the estimated parameters and other information. See estimateCensored.object for details.

Details

If x or censored contain any missing (NA), undefined (NaN) or infinite (Inf, -Inf) values, they will be removed prior to performing the estimation.
Let $\underline{x}$ denote a vector of $N$ observations from a
Poisson distribution with mean
$\lambda$.
Assume $n$ ($0 < n < N$) of these
observations are known and $c$ ($c=N-n$) of these observations are
all censored below (left-censored) or all censored above (right-censored) at
$k$ fixed censoring levels
$$T_1, T_2, \ldots, T_k; \; k \ge 1 \;\;\;\;\;\; (1)$$
For the case when $k \ge 2$, the data are said to be Type I
multiply censored. For the case when $k=1$,
set $T = T_1$. If the data are left-censored
and all $n$ known observations are greater
than or equal to $T$, or if the data are right-censored and all $n$
known observations are less than or equal to $T$, then the data are
said to be Type I singly censored (Nelson, 1982, p.7), otherwise
they are considered to be Type I multiply censored.
Let $c_j$ denote the number of observations censored below or above censoring
level $T_j$ for $j = 1, 2, \ldots, k$, so that
$$\sum_{j=1}^k c_j = c \;\;\;\;\;\; (2)$$
Let $x_{(1)}, x_{(2)}, \ldots, x_{(N)}$ denote the "ordered" observations, where for a censored observation the "observation" is taken to be the censoring level, and let $\Omega$ denote the set of $n$ subscripts in the ordered sample that correspond to uncensored observations.

Maximum Likelihood Estimation (method="mle")
For Type I left censored data, the likelihood function is given by:
$$L(\lambda | \underline{x}) = {N \choose c_1 c_2 \ldots c_k n} \prod_{j=1}^k [F(T_j)]^{c_j} \prod_{i \in \Omega} f[x_{(i)}] \;\;\;\;\;\; (3)$$
where $f$ and $F$ denote the probability density function (pdf) and
cumulative distribution function (cdf) of the population
(Cohen, 1963; Cohen, 1991, pp.6, 50). That is,
$$f(t) = \frac{e^{-\lambda} \lambda^t}{t!}, \; t = 0, 1, 2, \ldots \;\;\;\;\;\; (4)$$
$$F(t) = \sum_{i=0}^t f(i) = \sum_{i=0}^t \frac{e^{-\lambda} \lambda^i}{i!} \;\;\;\;\;\; (5)$$
(Johnson et al., 1992, p.151). For left singly censored data, equation (3)
simplifies to:
$$L(\lambda | \underline{x}) = {N \choose c} [F(T)]^{c} \prod_{i = c+1}^N f[x_{(i)}] \;\;\;\;\;\; (6)$$
Similarly, for Type I right censored data, the likelihood function is given by:
$$L(\lambda | \underline{x}) = {N \choose c_1 c_2 \ldots c_k n} \prod_{j=1}^k [1 - F(T_j)]^{c_j} \prod_{i \in \Omega} f[x_{(i)}] \;\;\;\;\;\; (7)$$
and for right singly censored data this simplifies to:
$$L(\lambda | \underline{x}) = {N \choose c} [1 - F(T)]^{c} \prod_{i = 1}^n f[x_{(i)}] \;\;\;\;\;\; (8)$$
The maximum likelihood estimators are computed by maximizing the likelihood function.
For right-censored data, taking the derivative of the log-likelihood function
with respect to $\lambda$ and setting this to 0 produces the following equation:
$$\bar{x} = \lambda \{1 - \sum_{j=1}^k \frac{c_j}{n} [\frac{f(T_j)}{1-F(T_j)}]\} \;\;\;\;\;\; (9)$$
where
$$\bar{x} = \frac{1}{n} \sum_{i \in \Omega} x_i \;\;\;\;\;\; (10)$$
Note that the quantity defined in equation (10) is simply the mean of the uncensored observations.
For left-censored data, taking the derivative of the log-likelihood function with
respect to $\lambda$ and setting this to 0 produces the following equation:
$$\bar{x} = \lambda \{1 + \sum_{j=1}^k \frac{c_j}{n} [\frac{f(T_j - 1)}{F(T_j - 1)}]\} \;\;\;\;\;\; (11)$$
The function epoisCensored
computes the maximum likelihood estimator
of $\lambda$ by solving Equation (9) (right-censored data) or
Equation (11) (left-censored data); it uses the sample mean of
the uncensored observations as the initial value.
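As a concrete illustration, Equation (11) can be solved numerically in a few lines of base R. This is only a sketch of the approach, not the EnvStats internals: the name pois.mle.left is made up, and the code assumes (as in the EXAMPLES below) that censored elements of x hold their censoring level.

```r
# Sketch of the MLE for left-censored data: solve Equation (11) for
# lambda with uniroot. 'pois.mle.left' is an illustrative name, not
# EnvStats code; censored elements of x are assumed to hold T_j.
pois.mle.left <- function(x, censored) {
  x.unc <- x[!censored]
  n <- length(x.unc)
  xbar <- mean(x.unc)                  # Equation (10)
  tab <- table(x[censored])            # c_j observations at each level T_j
  T.j <- as.numeric(names(tab))
  c.j <- as.numeric(tab)
  # Difference between the two sides of Equation (11) as a function of lambda
  g <- function(lambda) {
    adj <- sum((c.j / n) * dpois(T.j - 1, lambda) / ppois(T.j - 1, lambda))
    xbar - lambda * (1 + adj)
  }
  # The root lies below xbar because the adjustment term is positive
  uniroot(g, interval = c(1e-8, xbar), tol = 1e-8)$root
}
```

For right-censored data the same idea applies to Equation (9), with the adjustment term subtracted and built from $f(T_j)/[1 - F(T_j)]$.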
Setting Censored Observations to Half the Censoring Level (method="half.cen.level")
This method is applicable only to left censored data.
This method involves simply replacing all the censored observations with half their censoring level, and then computing the mean with the usual formula (see epois).
This method is included only to allow comparison with other methods; setting left-censored observations to half the censoring level is not
recommended.
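For comparison purposes, the computation is a one-liner. The name half.cen.level below is illustrative, and censored elements of x are again assumed to hold their censoring level.

```r
# Sketch of the half-censoring-level estimator for left-censored data
# (method = "half.cen.level"): replace each censored value with half
# its censoring level and take the ordinary mean.
half.cen.level <- function(x, censored) {
  x[censored] <- x[censored] / 2  # censored elements of x hold T_j
  mean(x)
}
```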
Confidence Intervals
This section explains how confidence intervals for the mean $\lambda$ are
computed.
Likelihood Profile (ci.method="profile.likelihood")
This method was proposed by Cox (1970, p.88), and Venzon and Moolgavkar (1988)
introduced an efficient method of computation. This method is also discussed by
Stryhn and Christensen (2003) and Royston (2007).
The idea behind this method is to invert the likelihood-ratio test to obtain a
confidence interval for the mean $\lambda$. Equation (3) above
shows the form of the likelihood function $L(\lambda | \underline{x})$ for
multiply left-censored data, and Equation (7) shows the function for
multiply right-censored data.
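To make the inversion concrete, here is a brute-force base-R sketch for singly left-censored data: it keeps every value $\lambda_0$ whose likelihood-ratio statistic $G^2$ (the drop in $2 log(L)$ from the maximum) stays below the chi-squared cutoff. The name profile.ci is made up, the constant binomial term in the likelihood is dropped (it does not affect the interval), and the censored contribution uses $F(T-1)$, consistent with Equation (11).

```r
# Brute-force profile-likelihood CI sketch for singly left-censored
# Poisson data. 'profile.ci' is an illustrative name, not EnvStats code.
profile.ci <- function(x, censored, T.level, conf.level = 0.95) {
  # Log-likelihood up to a constant: censored obs contribute log F(T-1),
  # uncensored obs contribute log f(x)
  loglik <- function(lambda) {
    sum(censored) * ppois(T.level - 1, lambda, log.p = TRUE) +
      sum(dpois(x[!censored], lambda, log = TRUE))
  }
  grid <- seq(0.01, 2 * mean(x), by = 0.01)
  ll <- vapply(grid, loglik, numeric(1))
  G2 <- 2 * (max(ll) - ll)                 # drop in 2 log(L) from the maximum
  keep <- grid[G2 <= qchisq(conf.level, df = 1)]
  c(LCL = min(keep), UCL = max(keep))
}
```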
Following Stryhn and Christensen (2003), denote the maximum likelihood estimate
of the mean by $\lambda^*$. The likelihood
ratio test statistic ($G^2$) of the hypothesis $H_0: \lambda = \lambda_0$
(where $\lambda_0$ is a fixed value) equals the drop in $2 log(L)$ between the
"full" model and the reduced model with $\lambda$ fixed at $\lambda_0$, i.e.,
$$G^2 = 2 \{log[L(\lambda^* | \underline{x})] - log[L(\lambda_0 | \underline{x})]\} \;\;\;\;\;\; (12)$$
Under the null hypothesis, the test statistic $G^2$ follows a chi-squared
distribution with 1 degree of freedom. A two-sided $(1-\alpha)100%$ confidence
interval for $\lambda$ consists of all values of $\lambda_0$ for which the test
is not significant at level $\alpha$:
$$\lambda_0 : \; G^2 \le \chi^2_{1, 1-\alpha}$$
where $\chi^2_{\nu, p}$ denotes the $p$'th quantile of the chi-squared
distribution with $\nu$ degrees of freedom. One-sided lower and upper
confidence intervals are computed in a similar fashion.

Normal Approximation (ci.method="normal.approx")
This method constructs approximate $(1-\alpha)100%$ confidence intervals for
$\lambda$ based on the assumption that the estimator of $\lambda$ is
approximately normally distributed. That is, a two-sided $(1-\alpha)100%$
confidence interval for $\lambda$ is constructed as:
$$[\hat{\lambda} - t_{1-\alpha/2, m-1}\hat{\sigma}_{\hat{\lambda}}, \; \hat{\lambda} + t_{1-\alpha/2, m-1}\hat{\sigma}_{\hat{\lambda}}] \;\;\;\; (13)$$
where $\hat{\lambda}$ denotes the estimate of $\lambda$,
$\hat{\sigma}_{\hat{\lambda}}$ denotes the estimated asymptotic standard
deviation of the estimator of $\lambda$, $m$ denotes the assumed sample
size for the confidence interval, and $t_{p,\nu}$ denotes the $p$'th
quantile of Student's t-distribution with $\nu$
degrees of freedom. One-sided confidence intervals are computed in a
similar fashion.
The argument ci.sample.size
determines the value of $m$ and by
default is equal to the number of uncensored observations.
This is simply an ad-hoc method of constructing
confidence intervals and is not based on any published theoretical results.
When pivot.statistic="z", the $p$'th quantile from the
standard normal distribution is used in place of the
$p$'th quantile from Student's t-distribution.
When $\lambda$ is estimated with the maximum likelihood estimator
(method="mle"), the variance of $\hat{\lambda}$ is
estimated based on the inverse of the Fisher Information matrix. When
$\lambda$ is estimated using the half-censoring-level method
(method="half.cen.level"), the variance of $\hat{\lambda}$ is
estimated as:
$$\hat{\sigma}^2_{\hat{\lambda}} = \frac{\hat{\lambda}}{m} \;\;\;\;\;\; (14)$$
where $m$ denotes the assumed sample size (see above).
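A minimal sketch of Equations (13) and (14), assuming the half-censoring-level variance estimate; normal.approx.ci is an illustrative name, not an EnvStats function.

```r
# Sketch of the normal-approximation interval of Equation (13), using
# the variance estimate of Equation (14) (method = "half.cen.level").
# 'normal.approx.ci' is an illustrative name, not part of EnvStats.
normal.approx.ci <- function(lambda.hat, m, conf.level = 0.95,
                             pivot.statistic = "z") {
  se <- sqrt(lambda.hat / m)                               # Equation (14)
  p <- 1 - (1 - conf.level) / 2
  q <- if (pivot.statistic == "z") qnorm(p) else qt(p, df = m - 1)
  c(LCL = lambda.hat - q * se, UCL = lambda.hat + q * se)  # Equation (13)
}
```

Note that for the same assumed sample size $m$, the "t" pivot always yields a wider interval than the "z" pivot, since $t_{p, m-1} > z_p$ for $p > 0.5$.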
Bootstrap and Bias-Corrected Bootstrap Approximation (ci.method="bootstrap")
The bootstrap is a nonparametric method of estimating the distribution
(and associated distribution parameters and quantiles) of a sample statistic,
regardless of the distribution of the population from which the sample was drawn.
The bootstrap was introduced by Efron (1979) and a general reference is
Efron and Tibshirani (1993).
In the context of deriving an approximate $(1-\alpha)100%$ confidence interval
for the population mean $\lambda$, the bootstrap can be broken down into the
following steps:
1. Generate a bootstrap sample of the same size as the original sample by sampling with replacement from the original observations (keeping each observation paired with its censoring status).

2. Estimate $\lambda$ based on the bootstrap sample created in Step 1, using the same method that was used for the original sample.

3. Repeat Steps 1 and 2 $B$ times, where $B$ is some large number. For the function epoisCensored, the number of bootstraps $B$ is determined by the argument n.bootstraps (see the section ARGUMENTS above). The default value of n.bootstraps is 1000.

4. Use the $B$ estimated values of $\lambda$ to compute the empirical cumulative distribution function of the estimator of $\lambda$ (see ecdfPlot), and then create a confidence interval for $\lambda$ based on this estimated cdf.

The two-sided percentile interval (Efron and Tibshirani, 1993, p.170) is computed as:
$$[\hat{G}^{-1}(\frac{\alpha}{2}), \; \hat{G}^{-1}(1 - \frac{\alpha}{2})] \;\;\;\;\;\; (15)$$
where $\hat{G}(t)$ denotes the empirical cdf of the bootstrapped estimates of $\lambda$ evaluated at $t$. Similarly, a one-sided lower confidence interval is computed as:
$$[\hat{G}^{-1}(\alpha), \; \infty] \;\;\;\;\;\; (16)$$
and a one-sided upper confidence interval is computed as:
$$[0, \; \hat{G}^{-1}(1 - \alpha)] \;\;\;\;\;\; (17)$$
The function epoisCensored calls the R function quantile
to compute the empirical quantiles used in Equations (15)-(17).
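The percentile interval of Equation (15) can be sketched in a few lines of base R; boot.percentile.ci and the estimator argument are illustrative names, not EnvStats code.

```r
# Sketch of the percentile bootstrap interval of Equation (15).
# 'estimator' stands in for whichever estimation method produced the
# original estimate; 'boot.percentile.ci' is an illustrative name.
boot.percentile.ci <- function(x, censored, estimator, B = 1000,
                               conf.level = 0.95) {
  N <- length(x)
  boot.est <- replicate(B, {
    i <- sample.int(N, N, replace = TRUE)   # Step 1: resample (x, censored)
    estimator(x[i], censored[i])            # Step 2: re-estimate lambda
  })
  alpha <- 1 - conf.level
  # Equation (15): empirical quantiles of the B bootstrapped estimates
  quantile(boot.est, probs = c(alpha / 2, 1 - alpha / 2))
}
```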
The percentile method bootstrap confidence interval is only first-order
accurate (Efron and Tibshirani, 1993, pp.187-188), meaning that the probability
that the confidence interval will contain the true value of $\lambda$ can be
off by $k/\sqrt{N}$, where $k$ is some constant. Efron and Tibshirani
(1993, pp.184-188) proposed a bias-corrected and accelerated interval that is
second-order accurate, meaning that the probability that the confidence interval
will contain the true value of $\lambda$ may be off by $k/N$ instead of
$k/\sqrt{N}$. The two-sided bias-corrected and accelerated confidence interval is
computed as:
$$[\hat{G}^{-1}(\alpha_1), \; \hat{G}^{-1}(\alpha_2)] \;\;\;\;\;\; (18)$$
where
$$\alpha_1 = \Phi[\hat{z}_0 + \frac{\hat{z}_0 + z_{\alpha/2}}{1 - \hat{a}(\hat{z}_0 + z_{\alpha/2})}] \;\;\;\;\;\; (19)$$
$$\alpha_2 = \Phi[\hat{z}_0 + \frac{\hat{z}_0 + z_{1-\alpha/2}}{1 - \hat{a}(\hat{z}_0 + z_{1-\alpha/2})}] \;\;\;\;\;\; (20)$$
$$\hat{z}_0 = \Phi^{-1}[\hat{G}(\hat{\lambda})] \;\;\;\;\;\; (21)$$
$$\hat{a} = \frac{\sum_{i=1}^N (\hat{\lambda}_{(\cdot)} - \hat{\lambda}_{(i)})^3}{6[\sum_{i=1}^N (\hat{\lambda}_{(\cdot)} - \hat{\lambda}_{(i)})^2]^{3/2}} \;\;\;\;\;\; (22)$$
where the quantity $\hat{\lambda}_{(i)}$ denotes the estimate of $\lambda$ using
all the values in $\underline{x}$ except the $i$'th one, and
$$\hat{\lambda}_{(\cdot)} = \frac{1}{N} \sum_{i=1}^N \hat{\lambda}_{(i)} \;\;\;\;\;\; (23)$$
A one-sided lower confidence interval is given by:
$$[\hat{G}^{-1}(\alpha_1), \; \infty] \;\;\;\;\;\; (24)$$
and a one-sided upper confidence interval is given by:
$$[0, \; \hat{G}^{-1}(\alpha_2)] \;\;\;\;\;\; (25)$$
where $\alpha_1$ and $\alpha_2$ are computed as for a two-sided confidence
interval, except $\alpha/2$ is replaced with $\alpha$ in Equations (19) and (20).
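Given the original estimate, the bootstrapped estimates, and the jackknife (leave-one-out) estimates, Equations (19)-(22) reduce to a few lines; bca.levels is an illustrative name, and the sketch assumes $0 < \hat{G}(\hat{\lambda}) < 1$ so that Equation (21) is finite.

```r
# Sketch of the BCa adjusted quantile levels of Equations (19)-(22).
# 'boot.est' are the B bootstrapped estimates, 'lambda.hat' the estimate
# from the original sample, and 'jack.est' the N leave-one-out estimates.
# 'bca.levels' is an illustrative name, not part of EnvStats.
bca.levels <- function(boot.est, lambda.hat, jack.est, conf.level = 0.95) {
  # Equation (21): bias-correction constant
  z0 <- qnorm(mean(boot.est < lambda.hat))
  # Equation (22): acceleration constant from the jackknife estimates
  d <- mean(jack.est) - jack.est
  a <- sum(d^3) / (6 * sum(d^2)^(3 / 2))
  alpha <- 1 - conf.level
  z <- qnorm(c(alpha / 2, 1 - alpha / 2))
  # Equations (19)-(20): adjusted levels alpha_1 and alpha_2
  pnorm(z0 + (z0 + z) / (1 - a * (z0 + z)))
}
```

When the bootstrap distribution is median-unbiased ($\hat{z}_0 = 0$) and the jackknife estimates are symmetric ($\hat{a} = 0$), these levels reduce to the unadjusted $\alpha/2$ and $1 - \alpha/2$ of the percentile method.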
The constant $\hat{z}_0$ incorporates the bias correction, and the constant
$\hat{a}$ is the acceleration constant. The term "acceleration" refers to the
rate of change of the standard error of the estimate of $\lambda$ with respect
to the true value of $\lambda$ (Efron and Tibshirani, 1993, p.186).

When ci.method="bootstrap", the function epoisCensored computes both
the percentile method and bias-corrected and accelerated method bootstrap confidence
intervals.

See Also

epois, estimateCensored.object.

Examples

# Generate 20 observations from a Poisson distribution with
# parameter lambda=10, and censor the values less than 10.
# Then generate 20 more observations from the same distribution
# and censor the values less than 20. Then estimate the mean
# using the maximum likelihood method.
# (Note: the call to set.seed simply allows you to reproduce this example.)
set.seed(300)
dat.1 <- rpois(20, lambda=10)
censored.1 <- dat.1 < 10
dat.1[censored.1] <- 10
dat.2 <- rpois(20, lambda=10)
censored.2 <- dat.2 < 20
dat.2[censored.2] <- 20
dat <- c(dat.1, dat.2)
censored <- c(censored.1, censored.2)
epoisCensored(dat, censored, ci = TRUE)
#Results of Distribution Parameter Estimation
#Based on Type I Censored Data
#--------------------------------------------
#
#Assumed Distribution: Poisson
#
#Censoring Side: left
#
#Censoring Level(s): 10 20
#
#Estimated Parameter(s): lambda = 11.05402
#
#Estimation Method: MLE
#
#Data: dat
#
#Censoring Variable: censored
#
#Sample Size: 40
#
#Percent Censored: 65%
#
#Confidence Interval for: lambda
#
#Confidence Interval Method: Profile Likelihood
#
#Confidence Interval Type: two-sided
#
#Confidence Level: 95%
#
#Confidence Interval: LCL = 9.842894
# UCL = 12.846484
#----------
# Clean up
#---------
rm(dat.1, censored.1, dat.2, censored.2, dat, censored)