epoisCensored(x, censored, method = "mle", censoring.side = "left",
ci = FALSE, ci.method = "profile.likelihood", ci.type = "two-sided",
conf.level = 0.95, n.bootstraps = 1000, pivot.statistic = "z",
ci.sample.size = sum(!censored))
NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.

censored: numeric or logical vector indicating which values of x are censored. This must be the same length as x. If the mode of censored is "logical", TRUE values correspond to elements of x that are censored, and FALSE values correspond to elements of x that are not censored.

method: character string specifying the method of estimation. The possible values are "mle" (maximum likelihood; the default), and "half.cen.level" (moment estimation based on setting the censored observations to half the censoring level). See the DETAILS section.

censoring.side: character string indicating on which side the censoring occurs. The possible values are "left" (the default) and "right".

ci: logical scalar indicating whether to compute a confidence interval for the mean. The default value is ci=FALSE.

ci.method: character string indicating what method to use to construct the confidence interval for the mean. The possible values are "profile.likelihood" (profile likelihood; the default), "normal.approx" (normal approximation), and "bootstrap" (based on bootstrapping). This argument is ignored if ci=FALSE.

ci.type: character string indicating what kind of confidence interval to compute. The possible values are "two-sided" (the default), "lower", and "upper". This argument is ignored if ci=FALSE.

conf.level: scalar between 0 and 1 indicating the confidence level of the confidence interval. The default value is conf.level=0.95. This argument is ignored if ci=FALSE.

n.bootstraps: numeric scalar indicating how many bootstraps to use to construct the confidence interval for the mean when ci.method="bootstrap". This argument is ignored if ci=FALSE and/or ci.method does not equal "bootstrap".

pivot.statistic: character string indicating which pivot statistic to use in the construction of the confidence interval for the mean when ci.method="normal.approx" (see the DETAILS section). The possible values are pivot.statistic="z" (the default) and pivot.statistic="t".

ci.sample.size: numeric scalar indicating what sample size to assume when constructing the confidence interval if pivot.statistic="t" and ci.method="normal.approx". The default value is the number of uncensored observations.

Value

A list of class "estimateCensored" containing the estimated parameters and other information. See estimateCensored.object for details.

Details

If x or censored contain any missing (NA), undefined (NaN) or infinite (Inf, -Inf) values, they will be removed prior to performing the estimation.
Let $\underline{x}$ denote a vector of $N$ observations from a
Poisson distribution with mean
$\lambda$.
Assume $n$ ($0 < n < N$) of these
observations are known and $c$ ($c=N-n$) of these observations are
all censored below (left-censored) or all censored above (right-censored) at
$k$ fixed censoring levels
$$T_1, T_2, \ldots, T_k; \; k \ge 1 \;\;\;\;\;\; (1)$$
For the case when $k \ge 2$, the data are said to be Type I
multiply censored. For the case when $k=1$,
set $T = T_1$. If the data are left-censored
and all $n$ known observations are greater
than or equal to $T$, or if the data are right-censored and all $n$
known observations are less than or equal to $T$, then the data are
said to be Type I singly censored (Nelson, 1982, p.7), otherwise
they are considered to be Type I multiply censored.
Let $c_j$ denote the number of observations censored below or above censoring
level $T_j$ for $j = 1, 2, \ldots, k$, so that
$$\sum_{j=1}^k c_j = c \;\;\;\;\;\; (2)$$
Let $x_{(1)}, x_{(2)}, \ldots, x_{(N)}$ denote the "ordered" observations, where for a censored observation the "observation" is taken to be the censoring level, and let $\Omega$ denote the set of $n$ subscripts in the ordered sample that correspond to uncensored observations.

Maximum Likelihood Estimation (method="mle")
For Type I left censored data, the likelihood function is given by:
$$L(\lambda | \underline{x}) = {N \choose c_1 c_2 \ldots c_k n} \prod_{j=1}^k [F(T_j)]^{c_j} \prod_{i \in \Omega} f[x_{(i)}] \;\;\;\;\;\; (3)$$
where $f$ and $F$ denote the probability density function (pdf) and
cumulative distribution function (cdf) of the population
(Cohen, 1963; Cohen, 1991, pp.6, 50). That is,
$$f(t) = \frac{e^{-\lambda} \lambda^t}{t!}, \; t = 0, 1, 2, \ldots \;\;\;\;\;\; (4)$$
$$F(t) = \sum_{i=0}^t f(i) = \sum_{i=0}^t \frac{e^{-\lambda} \lambda^i}{i!} \;\;\;\;\;\; (5)$$
(Johnson et al., 1992, p.151). For left singly censored data, equation (3)
simplifies to:
$$L(\lambda | \underline{x}) = {N \choose c} [F(T)]^{c} \prod_{i = c+1}^N f[x_{(i)}] \;\;\;\;\;\; (6)$$
Similarly, for Type I right censored data, the likelihood function is given by:
$$L(\lambda | \underline{x}) = {N \choose c_1 c_2 \ldots c_k n} \prod_{j=1}^k [1 - F(T_j)]^{c_j} \prod_{i \in \Omega} f[x_{(i)}] \;\;\;\;\;\; (7)$$
and for right singly censored data this simplifies to:
$$L(\lambda | \underline{x}) = {N \choose c} [1 - F(T)]^{c} \prod_{i = 1}^n f[x_{(i)}] \;\;\;\;\;\; (8)$$
The maximum likelihood estimators are computed by maximizing the likelihood function.
For right-censored data, taking the derivative of the log-likelihood function
with respect to $\lambda$ and setting this to 0 produces the following equation:
$$\bar{x} = \lambda \{1 - \sum_{j=1}^k \frac{c_j}{n} [\frac{f(T_j)}{1-F(T_j)}]\} \;\;\;\;\;\; (9)$$
where
$$\bar{x} = \frac{1}{n} \sum_{i \in \Omega} x_i \;\;\;\;\;\; (10)$$
Note that the quantity defined in equation (10) is simply the mean of the uncensored observations.
For left-censored data, taking the derivative of the log-likelihood function with
respect to $\lambda$ and setting this to 0 produces the following equation:
$$\bar{x} = \lambda \{1 + \sum_{j=1}^k \frac{c_j}{n} [\frac{f(T_j - 1)}{F(T_j - 1)}]\} \;\;\;\;\;\; (11)$$
The function epoisCensored
computes the maximum likelihood estimator
of $\lambda$ by solving Equation (9) (right-censored data) or
Equation (11) (left-censored data); it uses the sample mean of
the uncensored observations as the initial value.
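As a concrete illustration, Equation (11) can be solved numerically in a few lines of base R. This is only a sketch of the approach, not the EnvStats internals: the name pois.mle.left is made up, and the code assumes (as in the EXAMPLES below) that censored elements of x hold their censoring level.

```r
# Sketch of the MLE for left-censored data: solve Equation (11) for
# lambda with uniroot. 'pois.mle.left' is an illustrative name, not
# EnvStats code; censored elements of x are assumed to hold T_j.
pois.mle.left <- function(x, censored) {
  x.unc <- x[!censored]
  n <- length(x.unc)
  xbar <- mean(x.unc)                  # Equation (10)
  tab <- table(x[censored])            # c_j observations at each level T_j
  T.j <- as.numeric(names(tab))
  c.j <- as.numeric(tab)
  # Difference between the two sides of Equation (11) as a function of lambda
  g <- function(lambda) {
    adj <- sum((c.j / n) * dpois(T.j - 1, lambda) / ppois(T.j - 1, lambda))
    xbar - lambda * (1 + adj)
  }
  # The root lies below xbar because the adjustment term is positive
  uniroot(g, interval = c(1e-8, xbar), tol = 1e-8)$root
}
```

For right-censored data the same idea applies to Equation (9), with the adjustment term subtracted and built from $f(T_j)/[1 - F(T_j)]$.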
Setting Censored Observations to Half the Censoring Level (method="half.cen.level")
This method is applicable only to left censored data.
This method involves simply replacing all the censored observations with half their censoring level, and then computing the mean with the usual formula (see epois).
This method is included only to allow comparison with other methods; setting left-censored observations to half the censoring level is not
recommended.
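For comparison purposes, the computation is a one-liner. The name half.cen.level below is illustrative, and censored elements of x are again assumed to hold their censoring level.

```r
# Sketch of the half-censoring-level estimator for left-censored data
# (method = "half.cen.level"): replace each censored value with half
# its censoring level and take the ordinary mean.
half.cen.level <- function(x, censored) {
  x[censored] <- x[censored] / 2  # censored elements of x hold T_j
  mean(x)
}
```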
Confidence Intervals
This section explains how confidence intervals for the mean $\lambda$ are
computed.
Likelihood Profile (ci.method="profile.likelihood")
This method was proposed by Cox (1970, p.88), and Venzon and Moolgavkar (1988)
introduced an efficient method of computation. This method is also discussed by
Stryhn and Christensen (2003) and Royston (2007).
The idea behind this method is to invert the likelihood-ratio test to obtain a
confidence interval for the mean $\lambda$. Equation (3) above
shows the form of the likelihood function $L(\lambda | \underline{x})$ for
multiply left-censored data, and Equation (7) shows the function for
multiply right-censored data.
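To make the inversion concrete, here is a brute-force base-R sketch for singly left-censored data: it keeps every value $\lambda_0$ whose likelihood-ratio statistic $G^2$ (the drop in $2 log(L)$ from the maximum) stays below the chi-squared cutoff. The name profile.ci is made up, the constant binomial term in the likelihood is dropped (it does not affect the interval), and the censored contribution uses $F(T-1)$, consistent with Equation (11).

```r
# Brute-force profile-likelihood CI sketch for singly left-censored
# Poisson data. 'profile.ci' is an illustrative name, not EnvStats code.
profile.ci <- function(x, censored, T.level, conf.level = 0.95) {
  # Log-likelihood up to a constant: censored obs contribute log F(T-1),
  # uncensored obs contribute log f(x)
  loglik <- function(lambda) {
    sum(censored) * ppois(T.level - 1, lambda, log.p = TRUE) +
      sum(dpois(x[!censored], lambda, log = TRUE))
  }
  grid <- seq(0.01, 2 * mean(x), by = 0.01)
  ll <- vapply(grid, loglik, numeric(1))
  G2 <- 2 * (max(ll) - ll)                 # drop in 2 log(L) from the maximum
  keep <- grid[G2 <= qchisq(conf.level, df = 1)]
  c(LCL = min(keep), UCL = max(keep))
}
```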
Following Stryhn and Christensen (2003), denote the maximum likelihood estimate
of the mean by $\lambda^*$. The likelihood
ratio test statistic ($G^2$) of the hypothesis $H_0: \lambda = \lambda_0$
(where $\lambda_0$ is a fixed value) equals the drop in $2 log(L)$ between the
"full" model and the reduced model with $\lambda$ fixed at $\lambda_0$, i.e.,
$$G^2 = 2 \{log[L(\lambda^* | \underline{x})] - log[L(\lambda_0 | \underline{x})]\} \;\;\;\;\;\; (12)$$
Under the null hypothesis, the test statistic $G^2$ follows a chi-squared
distribution with 1 degree of freedom. A two-sided $(1-\alpha)100%$ confidence
interval for $\lambda$ consists of all values of $\lambda_0$ for which the test
is not significant at level $\alpha$:
$$\lambda_0 : \; G^2 \le \chi^2_{1, 1-\alpha}$$
where $\chi^2_{\nu, p}$ denotes the $p$'th quantile of the chi-squared
distribution with $\nu$ degrees of freedom. One-sided lower and upper
confidence intervals are computed in a similar fashion.

Normal Approximation (ci.method="normal.approx")
This method constructs approximate $(1-\alpha)100%$ confidence intervals for
$\lambda$ based on the assumption that the estimator of $\lambda$ is
approximately normally distributed. That is, a two-sided $(1-\alpha)100%$
confidence interval for $\lambda$ is constructed as:
$$[\hat{\lambda} - t_{1-\alpha/2, m-1}\hat{\sigma}_{\hat{\lambda}}, \; \hat{\lambda} + t_{1-\alpha/2, m-1}\hat{\sigma}_{\hat{\lambda}}] \;\;\;\; (13)$$
where $\hat{\lambda}$ denotes the estimate of $\lambda$,
$\hat{\sigma}_{\hat{\lambda}}$ denotes the estimated asymptotic standard
deviation of the estimator of $\lambda$, $m$ denotes the assumed sample
size for the confidence interval, and $t_{p,\nu}$ denotes the $p$'th
quantile of Student's t-distribution with $\nu$
degrees of freedom. One-sided confidence intervals are computed in a
similar fashion.
The argument ci.sample.size
determines the value of $m$ and by
default is equal to the number of uncensored observations.
This is simply an ad-hoc method of constructing
confidence intervals and is not based on any published theoretical results.
When pivot.statistic="z", the $p$'th quantile from the
standard normal distribution is used in place of the
$p$'th quantile from Student's t-distribution.
When $\lambda$ is estimated with the maximum likelihood estimator
(method="mle"), the variance of $\hat{\lambda}$ is
estimated based on the inverse of the Fisher Information matrix. When
$\lambda$ is estimated using the half-censoring-level method
(method="half.cen.level"), the variance of $\hat{\lambda}$ is
estimated as:
$$\hat{\sigma}^2_{\hat{\lambda}} = \frac{\hat{\lambda}}{m} \;\;\;\;\;\; (14)$$
where $m$ denotes the assumed sample size (see above).
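A minimal sketch of Equations (13) and (14), assuming the half-censoring-level variance estimate; normal.approx.ci is an illustrative name, not an EnvStats function.

```r
# Sketch of the normal-approximation interval of Equation (13), using
# the variance estimate of Equation (14) (method = "half.cen.level").
# 'normal.approx.ci' is an illustrative name, not part of EnvStats.
normal.approx.ci <- function(lambda.hat, m, conf.level = 0.95,
                             pivot.statistic = "z") {
  se <- sqrt(lambda.hat / m)                               # Equation (14)
  p <- 1 - (1 - conf.level) / 2
  q <- if (pivot.statistic == "z") qnorm(p) else qt(p, df = m - 1)
  c(LCL = lambda.hat - q * se, UCL = lambda.hat + q * se)  # Equation (13)
}
```

Note that for the same assumed sample size $m$, the "t" pivot always yields a wider interval than the "z" pivot, since $t_{p, m-1} > z_p$ for $p > 0.5$.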
Bootstrap and Bias-Corrected Bootstrap Approximation (ci.method="bootstrap")
The bootstrap is a nonparametric method of estimating the distribution
(and associated distribution parameters and quantiles) of a sample statistic,
regardless of the distribution of the population from which the sample was drawn.
The bootstrap was introduced by Efron (1979) and a general reference is
Efron and Tibshirani (1993).
In the context of deriving an approximate $(1-\alpha)100%$ confidence interval
for the population mean $\lambda$, the bootstrap can be broken down into the
following steps:
1. Generate a bootstrap sample of the same size as the original sample by sampling with replacement from the original observations (keeping each observation paired with its censoring status).

2. Estimate $\lambda$ based on the bootstrap sample created in Step 1, using the same method that was used for the original sample.

3. Repeat Steps 1 and 2 $B$ times, where $B$ is some large number. For the function epoisCensored, the number of bootstraps $B$ is determined by the argument n.bootstraps (see the section ARGUMENTS above). The default value of n.bootstraps is 1000.

4. Use the $B$ estimated values of $\lambda$ to compute the empirical cumulative distribution function of the estimator of $\lambda$ (see ecdfPlot), and then create a confidence interval for $\lambda$ based on this estimated cdf.

The two-sided percentile interval (Efron and Tibshirani, 1993, p.170) is computed as:
$$[\hat{G}^{-1}(\frac{\alpha}{2}), \; \hat{G}^{-1}(1 - \frac{\alpha}{2})] \;\;\;\;\;\; (15)$$
where $\hat{G}(t)$ denotes the empirical cdf of the bootstrapped estimates of $\lambda$ evaluated at $t$. Similarly, a one-sided lower confidence interval is computed as:
$$[\hat{G}^{-1}(\alpha), \; \infty] \;\;\;\;\;\; (16)$$
and a one-sided upper confidence interval is computed as:
$$[0, \; \hat{G}^{-1}(1 - \alpha)] \;\;\;\;\;\; (17)$$
The function epoisCensored calls the R function quantile
to compute the empirical quantiles used in Equations (15)-(17).
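The percentile interval of Equation (15) can be sketched in a few lines of base R; boot.percentile.ci and the estimator argument are illustrative names, not EnvStats code.

```r
# Sketch of the percentile bootstrap interval of Equation (15).
# 'estimator' stands in for whichever estimation method produced the
# original estimate; 'boot.percentile.ci' is an illustrative name.
boot.percentile.ci <- function(x, censored, estimator, B = 1000,
                               conf.level = 0.95) {
  N <- length(x)
  boot.est <- replicate(B, {
    i <- sample.int(N, N, replace = TRUE)   # Step 1: resample (x, censored)
    estimator(x[i], censored[i])            # Step 2: re-estimate lambda
  })
  alpha <- 1 - conf.level
  # Equation (15): empirical quantiles of the B bootstrapped estimates
  quantile(boot.est, probs = c(alpha / 2, 1 - alpha / 2))
}
```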
The percentile method bootstrap confidence interval is only first-order
accurate (Efron and Tibshirani, 1993, pp.187-188), meaning that the probability
that the confidence interval will contain the true value of $\lambda$ can be
off by $k/\sqrt{N}$, where $k$ is some constant. Efron and Tibshirani
(1993, pp.184-188) proposed a bias-corrected and accelerated interval that is
second-order accurate, meaning that the probability that the confidence interval
will contain the true value of $\lambda$ may be off by $k/N$ instead of
$k/\sqrt{N}$. The two-sided bias-corrected and accelerated confidence interval is
computed as:
$$[\hat{G}^{-1}(\alpha_1), \; \hat{G}^{-1}(\alpha_2)] \;\;\;\;\;\; (18)$$
where
$$\alpha_1 = \Phi[\hat{z}_0 + \frac{\hat{z}_0 + z_{\alpha/2}}{1 - \hat{a}(\hat{z}_0 + z_{\alpha/2})}] \;\;\;\;\;\; (19)$$
$$\alpha_2 = \Phi[\hat{z}_0 + \frac{\hat{z}_0 + z_{1-\alpha/2}}{1 - \hat{a}(\hat{z}_0 + z_{1-\alpha/2})}] \;\;\;\;\;\; (20)$$
$$\hat{z}_0 = \Phi^{-1}[\hat{G}(\hat{\lambda})] \;\;\;\;\;\; (21)$$
$$\hat{a} = \frac{\sum_{i=1}^N (\hat{\lambda}_{(\cdot)} - \hat{\lambda}_{(i)})^3}{6[\sum_{i=1}^N (\hat{\lambda}_{(\cdot)} - \hat{\lambda}_{(i)})^2]^{3/2}} \;\;\;\;\;\; (22)$$
where the quantity $\hat{\lambda}_{(i)}$ denotes the estimate of $\lambda$ using
all the values in $\underline{x}$ except the $i$'th one, and
$$\hat{\lambda}_{(\cdot)} = \frac{1}{N} \sum_{i=1}^N \hat{\lambda}_{(i)} \;\;\;\;\;\; (23)$$
A one-sided lower confidence interval is given by:
$$[\hat{G}^{-1}(\alpha_1), \; \infty] \;\;\;\;\;\; (24)$$
and a one-sided upper confidence interval is given by:
$$[0, \; \hat{G}^{-1}(\alpha_2)] \;\;\;\;\;\; (25)$$
where $\alpha_1$ and $\alpha_2$ are computed as for a two-sided confidence
interval, except $\alpha/2$ is replaced with $\alpha$ in Equations (19) and (20).
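Given the original estimate, the bootstrapped estimates, and the jackknife (leave-one-out) estimates, Equations (19)-(22) reduce to a few lines; bca.levels is an illustrative name, and the sketch assumes $0 < \hat{G}(\hat{\lambda}) < 1$ so that Equation (21) is finite.

```r
# Sketch of the BCa adjusted quantile levels of Equations (19)-(22).
# 'boot.est' are the B bootstrapped estimates, 'lambda.hat' the estimate
# from the original sample, and 'jack.est' the N leave-one-out estimates.
# 'bca.levels' is an illustrative name, not part of EnvStats.
bca.levels <- function(boot.est, lambda.hat, jack.est, conf.level = 0.95) {
  # Equation (21): bias-correction constant
  z0 <- qnorm(mean(boot.est < lambda.hat))
  # Equation (22): acceleration constant from the jackknife estimates
  d <- mean(jack.est) - jack.est
  a <- sum(d^3) / (6 * sum(d^2)^(3 / 2))
  alpha <- 1 - conf.level
  z <- qnorm(c(alpha / 2, 1 - alpha / 2))
  # Equations (19)-(20): adjusted levels alpha_1 and alpha_2
  pnorm(z0 + (z0 + z) / (1 - a * (z0 + z)))
}
```

When the bootstrap distribution is median-unbiased ($\hat{z}_0 = 0$) and the jackknife estimates are symmetric ($\hat{a} = 0$), these levels reduce to the unadjusted $\alpha/2$ and $1 - \alpha/2$ of the percentile method.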
The constant $\hat{z}_0$ incorporates the bias correction, and the constant
$\hat{a}$ is the acceleration constant. The term "acceleration" refers to the
rate of change of the standard error of the estimate of $\lambda$ with respect
to the true value of $\lambda$ (Efron and Tibshirani, 1993, p.186).

When ci.method="bootstrap", the function epoisCensored computes both
the percentile method and bias-corrected and accelerated method bootstrap confidence
intervals.

See Also

epois, estimateCensored.object.

Examples

# Generate 20 observations from a Poisson distribution with
# parameter lambda=10, and censor the values less than 10.
# Then generate 20 more observations from the same distribution
# and censor the values less than 20. Then estimate the mean
# using the maximum likelihood method.
# (Note: the call to set.seed simply allows you to reproduce this example.)
set.seed(300)
dat.1 <- rpois(20, lambda=10)
censored.1 <- dat.1 < 10
dat.1[censored.1] <- 10
dat.2 <- rpois(20, lambda=10)
censored.2 <- dat.2 < 20
dat.2[censored.2] <- 20
dat <- c(dat.1, dat.2)
censored <- c(censored.1, censored.2)
epoisCensored(dat, censored, ci = TRUE)
#Results of Distribution Parameter Estimation
#Based on Type I Censored Data
#--------------------------------------------
#
#Assumed Distribution: Poisson
#
#Censoring Side: left
#
#Censoring Level(s): 10 20
#
#Estimated Parameter(s): lambda = 11.05402
#
#Estimation Method: MLE
#
#Data: dat
#
#Censoring Variable: censored
#
#Sample Size: 40
#
#Percent Censored: 65%
#
#Confidence Interval for: lambda
#
#Confidence Interval Method: Profile Likelihood
#
#Confidence Interval Type: two-sided
#
#Confidence Level: 95%
#
#Confidence Interval: LCL = 9.842894
# UCL = 12.846484
#----------
# Clean up
#---------
rm(dat.1, censored.1, dat.2, censored.2, dat, censored)