If x contains any missing (NA), undefined (NaN), or infinite (Inf, -Inf) values, they will be removed prior to performing the estimation.
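For example, this screening step could be reproduced in base R as follows (a minimal sketch, not the function's internal code):

    # Drop missing (NA), undefined (NaN), and infinite (Inf, -Inf) values
    # before any further computation.
    x <- c(0, 3.2, NA, 1.7, NaN, Inf, 0, 5.4)
    x <- x[is.finite(x)]   # is.finite() is FALSE for NA, NaN, Inf, and -Inf
    x
    #> [1] 0.0 3.2 1.7 0.0 5.4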
Let \(\underline{x} = (x_1, x_2, \ldots, x_n)\) be a vector of \(n\) observations from a zero-modified lognormal distribution with parameters meanlog=\(\mu\), sdlog=\(\sigma\), and p.zero=\(p\). Alternatively, let \(\underline{x} = (x_1, x_2, \ldots, x_n)\) be a vector of \(n\) observations from a zero-modified lognormal distribution (alternative parameterization) with parameters mean=\(\theta\), cv=\(\tau\), and p.zero=\(p\).
Let \(r\) denote the number of observations in \(\underline{x}\) that are equal
to 0, and order the observations so that \(x_1, x_2, \ldots, x_r\) denote
the \(r\) zero observations and \(x_{r+1}, x_{r+2}, \ldots, x_n\) denote
the \(n-r\) non-zero observations.
Note that \(\theta\) is not the mean of the zero-modified lognormal
distribution; it is the mean of the lognormal part of the distribution. Similarly,
\(\tau\) is not the coefficient of variation of the zero-modified
lognormal distribution; it is the coefficient of variation of the lognormal
part of the distribution.
Let \(\gamma\), \(\delta\), and \(\phi\) denote the mean, standard deviation,
and coefficient of variation of the overall zero-modified lognormal (delta)
distribution. Let \(\eta\) denote the standard deviation of the lognormal
part of the distribution, so that \(\eta = \theta \tau\). Aitchison (1955)
shows that:
$$\gamma = (1 - p) \theta \;\;\;\; (1)$$
$$\delta^2 = (1 - p) \eta^2 + p (1 - p) \theta^2 \;\;\;\; (2)$$
so that
$$\phi = \frac{\delta}{\gamma} = \frac{\sqrt{\tau^2 + p}}{\sqrt{1-p}} \;\;\;\; (3)$$
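As a worked example of equations (1)-(3), the following sketch (in R; the function name zmlnorm.overall is illustrative and not part of any package) computes the overall mean, standard deviation, and coefficient of variation from the alternative parameterization:

    # Sketch (not package code): overall mean, sd, and cv of the
    # zero-modified lognormal (delta) distribution, given the parameters
    # of the alternative parameterization: mean, cv, and p.zero.
    zmlnorm.overall <- function(mean, cv, p.zero) {
      theta <- mean                # mean of the lognormal part
      tau   <- cv                  # cv of the lognormal part
      p     <- p.zero              # probability mass at zero
      eta   <- theta * tau         # sd of the lognormal part
      gamma <- (1 - p) * theta                                 # equation (1)
      delta <- sqrt((1 - p) * eta^2 + p * (1 - p) * theta^2)   # equation (2)
      phi   <- delta / gamma       # equation (3)
      c(mean = gamma, sd = delta, cv = phi)
    }

    zmlnorm.overall(mean = 10, cv = 0.5, p.zero = 0.2)

For mean=10, cv=0.5, and p.zero=0.2, this gives \(\gamma = 8\), \(\delta = 6\), and \(\phi = 0.75\); equation (3) gives the same value directly, since \(\sqrt{0.25 + 0.2}/\sqrt{0.8} = 0.75\).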
Estimation
Minimum Variance Unbiased Estimation (method="mvue")
Aitchison (1955) shows that the minimum variance unbiased estimators (mvue's) of
\(\gamma\) and \(\delta^2\) are:
$$\hat{\gamma}_{mvue} = \left\{ \begin{array}{ll} (1-\frac{r}{n}) e^{\bar{y}} g_{n-r-1}(\frac{s^2}{2}) & \mbox{if } r < n - 1, \\ x_n / n & \mbox{if } r = n - 1, \\ 0 & \mbox{if } r = n \end{array} \right. \;\;\;\; (4)$$
$$\hat{\delta}^2_{mvue} = \left\{ \begin{array}{ll} (1-\frac{r}{n}) e^{2\bar{y}} \{g_{n-r-1}(2s^2) - \frac{n-r-1}{n-1} g_{n-r-1}[\frac{(n-r-2)s^2}{n-r-1}]\} & \mbox{if } r < n - 1, \\ x_n^2 / n & \mbox{if } r = n - 1, \\ 0 & \mbox{if } r = n \end{array} \right. \;\;\;\; (5)$$
where
$$y_i = log(x_i), \; i = r+1, r+2, \ldots, n \;\;\;\; (6)$$
$$\bar{y} = \frac{1}{n-r} \sum_{i=r+1}^n y_i \;\;\;\; (7)$$
$$s^2 = \frac{1}{n-r-1} \sum_{i=r+1}^n (y_i - \bar{y})^2 \;\;\;\; (8)$$
$$g_m(z) = \sum_{i=0}^\infty \frac{m^i (m+2i)}{m(m+2) \cdots (m+2i)} (\frac{m}{m+1})^i (\frac{z^i}{i!}) \;\;\;\; (9)$$
Note that when \(r=n-1\) or \(r=n\), the estimator of \(\gamma\) is simply the
sample mean for all observations (including zero values), and the estimator for
\(\delta^2\) is simply the sample variance for all observations.
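To make equations (4)-(9) concrete, here is a minimal R sketch; the helper names g.fcn and zmlnorm.mvue are ours, and the infinite series in equation (9) is truncated after a fixed number of terms, so this is an illustration rather than the package's implementation:

    # Sketch (not package code): the series g_m(z) of equation (9),
    # summed term by term; the ratio of consecutive terms simplifies to
    # [m / (m + 2i - 2)] * [m / (m + 1)] * [z / i].
    g.fcn <- function(m, z, n.terms = 60) {
      total <- 1        # the i = 0 term equals 1
      term  <- 1
      for (i in 1:n.terms) {
        term  <- term * (m / (m + 2 * i - 2)) * (m / (m + 1)) * (z / i)
        total <- total + term
      }
      total
    }

    # Sketch (not package code): MVUE of the overall mean (gamma) and
    # variance (delta^2) of a zero-modified lognormal distribution.
    zmlnorm.mvue <- function(x) {
      x <- x[is.finite(x)]             # drop NA, NaN, Inf, -Inf as described above
      n <- length(x)
      r <- sum(x == 0)
      if (r == n) {
        gamma.hat  <- 0
        delta2.hat <- 0
      } else if (r == n - 1) {
        xn <- x[x > 0]
        gamma.hat  <- xn / n           # sample mean of all n observations
        delta2.hat <- xn^2 / n         # sample variance of all n observations
      } else {
        y    <- log(x[x > 0])          # equation (6)
        ybar <- mean(y)                # equation (7)
        s2   <- var(y)                 # equation (8), divisor n - r - 1
        m    <- n - r - 1
        gamma.hat  <- (1 - r/n) * exp(ybar) * g.fcn(m, s2 / 2)            # equation (4)
        delta2.hat <- (1 - r/n) * exp(2 * ybar) *
          (g.fcn(m, 2 * s2) - (m / (n - 1)) * g.fcn(m, (n - r - 2) * s2 / m))  # equation (5)
      }
      c(mean = gamma.hat, var = delta2.hat)
    }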
The expected value and asymptotic variance of the mvue of \(\gamma\) are
(Aitchison and Brown, 1957, p.99; Owen and DeRouen, 1980):
$$E(\hat{\gamma}_{mvue}) = \gamma \;\;\;\; (10)$$
$$AVar(\hat{\gamma}_{mvue}) = \frac{1}{n} exp(2\mu + \sigma^2) (1-p) (p + \frac{2\sigma^2 + \sigma^4}{2}) \;\;\;\; (11)$$
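Equation (11) translates directly into a small R function; the name avar.gamma.mvue is illustrative only:

    # Sketch (not package code): asymptotic variance of the MVUE of gamma,
    # equation (11), as a function of the distribution parameters and n.
    avar.gamma.mvue <- function(n, meanlog, sdlog, p.zero) {
      (1 / n) * exp(2 * meanlog + sdlog^2) * (1 - p.zero) *
        (p.zero + (2 * sdlog^2 + sdlog^4) / 2)
    }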
Confidence Intervals
Based on Normal Approximation (ci.method="normal.approx")
An approximate \((1-\alpha)100\%\) confidence interval for \(\gamma\) is
constructed based on the assumption that the estimator of \(\gamma\) is
approximately normally distributed. Thus, an approximate two-sided
\((1-\alpha)100\%\) confidence interval for \(\gamma\) is constructed as:
$$[ \hat{\gamma}_{mvue} - t_{n-2, 1-\alpha/2} \hat{\sigma}_{\hat{\gamma}}, \; \hat{\gamma}_{mvue} + t_{n-2, 1-\alpha/2} \hat{\sigma}_{\hat{\gamma}} ] \;\;\;\; (12)$$
where \(t_{\nu, p}\) is the \(p\)'th quantile of
Student's t-distribution with \(\nu\) degrees of freedom, and
\(\hat{\sigma}_{\hat{\gamma}}\) is the estimated standard deviation
of the mvue of \(\gamma\), computed by replacing the values of
\(\mu\), \(\sigma\), and \(p\) in equation (11) above with their estimated
values and taking the square root.
Note that there must be at least 3 non-missing observations (\(n \ge 3\)) and
at least one observation must be non-zero (\(r \le n-1\)) in order to construct
a confidence interval.
One-sided confidence intervals are computed in a similar fashion.
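Combining equations (11) and (12), a sketch of the two-sided interval might look as follows. It reuses the illustrative helpers zmlnorm.mvue and avar.gamma.mvue from above, and the particular plug-in estimates of \(\mu\), \(\sigma\), and \(p\) (the log-scale sample mean and standard deviation of the non-zero values, and \(r/n\)) are an assumption of this sketch, as is the requirement of at least two non-zero values so that \(\sigma\) can be estimated:

    # Sketch (not package code): two-sided (1 - alpha)100% confidence
    # interval for gamma based on the normal approximation, equation (12).
    zmlnorm.ci <- function(x, conf.level = 0.95) {
      x <- x[is.finite(x)]
      n <- length(x)
      r <- sum(x == 0)
      stopifnot(n >= 3, n - r >= 2)    # this sketch needs >= 2 non-zero values
      # Plug-in estimates of mu, sigma, and p used in equation (11)
      y         <- log(x[x > 0])
      mu.hat    <- mean(y)
      sigma.hat <- sd(y)
      p.hat     <- r / n
      gamma.hat <- unname(zmlnorm.mvue(x)["mean"])                      # equation (4)
      se.gamma  <- sqrt(avar.gamma.mvue(n, mu.hat, sigma.hat, p.hat))   # equation (11)
      t.crit    <- qt(1 - (1 - conf.level) / 2, df = n - 2)
      c(lcl = gamma.hat - t.crit * se.gamma,
        ucl = gamma.hat + t.crit * se.gamma)                            # equation (12)
    }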