If x contains any missing (NA), undefined (NaN), or infinite (Inf, -Inf) values, they will be removed prior to performing the estimation.
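For example, this screening step could be reproduced in base R as follows (a minimal sketch, not the function's internal code):

    # Drop missing (NA), undefined (NaN), and infinite (Inf, -Inf) values
    # before any further computation.
    x <- c(0, 3.2, NA, 1.7, NaN, Inf, 0, 5.4)
    x <- x[is.finite(x)]   # is.finite() is FALSE for NA, NaN, Inf, and -Inf
    x
    #> [1] 0.0 3.2 1.7 0.0 5.4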
Let \(\underline{x} = (x_1, x_2, \ldots, x_n)\) be a vector of \(n\) observations from a zero-modified lognormal distribution with parameters meanlog=\(\mu\), sdlog=\(\sigma\), and p.zero=\(p\). Alternatively, let \(\underline{x} = (x_1, x_2, \ldots, x_n)\) be a vector of \(n\) observations from a zero-modified lognormal distribution (alternative parameterization) with parameters mean=\(\theta\), cv=\(\tau\), and p.zero=\(p\).
Let \(r\) denote the number of observations in \(\underline{x}\) that are equal
to 0, and order the observations so that \(x_1, x_2, \ldots, x_r\) denote
the \(r\) zero observations and \(x_{r+1}, x_{r+2}, \ldots, x_n\) denote
the \(n-r\) non-zero observations.
Note that \(\theta\) is not the mean of the zero-modified lognormal
distribution; it is the mean of the lognormal part of the distribution. Similarly,
\(\tau\) is not the coefficient of variation of the zero-modified
lognormal distribution; it is the coefficient of variation of the lognormal
part of the distribution.
Let \(\gamma\), \(\delta\), and \(\phi\) denote the mean, standard deviation,
and coefficient of variation of the overall zero-modified lognormal (delta)
distribution. Let \(\eta\) denote the standard deviation of the lognormal
part of the distribution, so that \(\eta = \theta \tau\). Aitchison (1955)
shows that:
$$\gamma = (1 - p) \theta \;\;\;\; (1)$$
$$\delta^2 = (1 - p) \eta^2 + p (1 - p) \theta^2 \;\;\;\; (2)$$
so that
$$\phi = \frac{\delta}{\gamma} = \frac{\sqrt{\tau^2 + p}}{\sqrt{1-p}} \;\;\;\; (3)$$
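As a worked example of equations (1)-(3), the following sketch (in R; the function name zmlnorm.overall is illustrative and not part of any package) computes the overall mean, standard deviation, and coefficient of variation from the alternative parameterization:

    # Sketch (not package code): overall mean, sd, and cv of the
    # zero-modified lognormal (delta) distribution, given the parameters
    # of the alternative parameterization: mean, cv, and p.zero.
    zmlnorm.overall <- function(mean, cv, p.zero) {
      theta <- mean                # mean of the lognormal part
      tau   <- cv                  # cv of the lognormal part
      p     <- p.zero              # probability mass at zero
      eta   <- theta * tau         # sd of the lognormal part
      gamma <- (1 - p) * theta                                 # equation (1)
      delta <- sqrt((1 - p) * eta^2 + p * (1 - p) * theta^2)   # equation (2)
      phi   <- delta / gamma       # equation (3)
      c(mean = gamma, sd = delta, cv = phi)
    }

    zmlnorm.overall(mean = 10, cv = 0.5, p.zero = 0.2)

For mean=10, cv=0.5, and p.zero=0.2, this gives \(\gamma = 8\), \(\delta = 6\), and \(\phi = 0.75\); equation (3) gives the same value directly, since \(\sqrt{0.25 + 0.2}/\sqrt{0.8} = 0.75\).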
Estimation
Minimum Variance Unbiased Estimation (method="mvue")
Aitchison (1955) shows that the minimum variance unbiased estimators (mvue's) of
\(\gamma\) and \(\delta^2\) are:
$$\hat{\gamma}_{mvue} = \left\{ \begin{array}{ll} (1-\frac{r}{n}) e^{\bar{y}} g_{n-r-1}(\frac{s^2}{2}) & \mbox{if } r < n - 1, \\ x_n / n & \mbox{if } r = n - 1, \\ 0 & \mbox{if } r = n \end{array} \right. \;\;\;\; (4)$$
$$\hat{\delta}^2_{mvue} = \left\{ \begin{array}{ll} (1-\frac{r}{n}) e^{2\bar{y}} \{g_{n-r-1}(2s^2) - \frac{n-r-1}{n-1} g_{n-r-1}[\frac{(n-r-2)s^2}{n-r-1}]\} & \mbox{if } r < n - 1, \\ x_n^2 / n & \mbox{if } r = n - 1, \\ 0 & \mbox{if } r = n \end{array} \right. \;\;\;\; (5)$$
where
$$y_i = log(x_i), \; i = r+1, r+2, \ldots, n \;\;\;\; (6)$$
$$\bar{y} = \frac{1}{n-r} \sum_{i=r+1}^n y_i \;\;\;\; (7)$$
$$s^2 = \frac{1}{n-r-1} \sum_{i=r+1}^n (y_i - \bar{y})^2 \;\;\;\; (8)$$
$$g_m(z) = \sum_{i=0}^\infty \frac{m^i (m+2i)}{m(m+2) \cdots (m+2i)} (\frac{m}{m+1})^i (\frac{z^i}{i!}) \;\;\;\; (9)$$
Note that when \(r=n-1\) or \(r=n\), the estimator of \(\gamma\) is simply the
sample mean for all observations (including zero values), and the estimator for
\(\delta^2\) is simply the sample variance for all observations.
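To make equations (4)-(9) concrete, here is a minimal R sketch; the helper names g.fcn and zmlnorm.mvue are ours, and the infinite series in equation (9) is truncated after a fixed number of terms, so this is an illustration rather than the package's implementation:

    # Sketch (not package code): the series g_m(z) of equation (9),
    # summed term by term; the ratio of consecutive terms simplifies to
    # [m / (m + 2i - 2)] * [m / (m + 1)] * [z / i].
    g.fcn <- function(m, z, n.terms = 60) {
      total <- 1        # the i = 0 term equals 1
      term  <- 1
      for (i in 1:n.terms) {
        term  <- term * (m / (m + 2 * i - 2)) * (m / (m + 1)) * (z / i)
        total <- total + term
      }
      total
    }

    # Sketch (not package code): MVUE of the overall mean (gamma) and
    # variance (delta^2) of a zero-modified lognormal distribution.
    zmlnorm.mvue <- function(x) {
      x <- x[is.finite(x)]             # drop NA, NaN, Inf, -Inf as described above
      n <- length(x)
      r <- sum(x == 0)
      if (r == n) {
        gamma.hat  <- 0
        delta2.hat <- 0
      } else if (r == n - 1) {
        xn <- x[x > 0]
        gamma.hat  <- xn / n           # sample mean of all n observations
        delta2.hat <- xn^2 / n         # sample variance of all n observations
      } else {
        y    <- log(x[x > 0])          # equation (6)
        ybar <- mean(y)                # equation (7)
        s2   <- var(y)                 # equation (8), divisor n - r - 1
        m    <- n - r - 1
        gamma.hat  <- (1 - r/n) * exp(ybar) * g.fcn(m, s2 / 2)            # equation (4)
        delta2.hat <- (1 - r/n) * exp(2 * ybar) *
          (g.fcn(m, 2 * s2) - (m / (n - 1)) * g.fcn(m, (n - r - 2) * s2 / m))  # equation (5)
      }
      c(mean = gamma.hat, var = delta2.hat)
    }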
The expected value and asymptotic variance of the mvue of \(\gamma\) are
(Aitchison and Brown, 1957, p.99; Owen and DeRouen, 1980):
$$E(\hat{\gamma}_{mvue}) = \gamma \;\;\;\; (10)$$
$$AVar(\hat{\gamma}_{mvue}) = \frac{1}{n} exp(2\mu + \sigma^2) (1-p) (p + \frac{2\sigma^2 + \sigma^4}{2}) \;\;\;\; (11)$$
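Equation (11) translates directly into a small R function; the name avar.gamma.mvue is illustrative only:

    # Sketch (not package code): asymptotic variance of the MVUE of gamma,
    # equation (11), as a function of the distribution parameters and n.
    avar.gamma.mvue <- function(n, meanlog, sdlog, p.zero) {
      (1 / n) * exp(2 * meanlog + sdlog^2) * (1 - p.zero) *
        (p.zero + (2 * sdlog^2 + sdlog^4) / 2)
    }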
Confidence Intervals
Based on Normal Approximation (ci.method="normal.approx")
An approximate \((1-\alpha)100\%\) confidence interval for \(\gamma\) is
constructed based on the assumption that the estimator of \(\gamma\) is
approximately normally distributed. Thus, an approximate two-sided
\((1-\alpha)100\%\) confidence interval for \(\gamma\) is constructed as:
$$[ \hat{\gamma}_{mvue} - t_{n-2, 1-\alpha/2} \hat{\sigma}_{\hat{\gamma}}, \; \hat{\gamma}_{mvue} + t_{n-2, 1-\alpha/2} \hat{\sigma}_{\hat{\gamma}} ] \;\;\;\; (12)$$
where \(t_{\nu, p}\) is the \(p\)'th quantile of
Student's t-distribution with \(\nu\) degrees of freedom, and
\(\hat{\sigma}_{\hat{\gamma}}\) is the estimated standard deviation
of the mvue of \(\gamma\), computed by replacing the values of
\(\mu\), \(\sigma\), and \(p\) in equation (11) above with their estimated
values and taking the square root.
Note that there must be at least 3 non-missing observations (\(n \ge 3\)) and
at least one observation must be non-zero (\(r \le n-1\)) in order to construct
a confidence interval.
One-sided confidence intervals are computed in a similar fashion.
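Combining equations (11) and (12), a sketch of the two-sided interval might look as follows. It reuses the illustrative helpers zmlnorm.mvue and avar.gamma.mvue from above, and the particular plug-in estimates of \(\mu\), \(\sigma\), and \(p\) (the log-scale sample mean and standard deviation of the non-zero values, and \(r/n\)) are an assumption of this sketch, as is the requirement of at least two non-zero values so that \(\sigma\) can be estimated:

    # Sketch (not package code): two-sided (1 - alpha)100% confidence
    # interval for gamma based on the normal approximation, equation (12).
    zmlnorm.ci <- function(x, conf.level = 0.95) {
      x <- x[is.finite(x)]
      n <- length(x)
      r <- sum(x == 0)
      stopifnot(n >= 3, n - r >= 2)    # this sketch needs >= 2 non-zero values
      # Plug-in estimates of mu, sigma, and p used in equation (11)
      y         <- log(x[x > 0])
      mu.hat    <- mean(y)
      sigma.hat <- sd(y)
      p.hat     <- r / n
      gamma.hat <- unname(zmlnorm.mvue(x)["mean"])                      # equation (4)
      se.gamma  <- sqrt(avar.gamma.mvue(n, mu.hat, sigma.hat, p.hat))   # equation (11)
      t.crit    <- qt(1 - (1 - conf.level) / 2, df = n - 2)
      c(lcl = gamma.hat - t.crit * se.gamma,
        ucl = gamma.hat + t.crit * se.gamma)                            # equation (12)
    }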