zipoisson: Zero-Inflated Poisson Distribution Family Function

Description

Fits a zero-inflated Poisson distribution by full maximum likelihood estimation.

Usage

zipoissonff(llambda = "loge", lprobp = "logit",
            ilambda = NULL,   iprobp = NULL,
            imethod = 1, shrinkage.init = 0.8, zero = -2)
zipoisson(lpstr0 = "logit", llambda = "loge",
          ipstr0 = NULL,    ilambda = NULL,
          imethod = 1, shrinkage.init = 0.8, zero = NULL)

Arguments

lpstr0, llambda

Link function for the parameter $\phi$ and the usual $\lambda$ parameter. See Links for more choices; see CommonVGAMffArguments for more in

ipstr0, ilambda

Optional initial values for $\phi$, whose values must lie between 0 and 1. Optional initial values for $\lambda$, whose values must be positive. The defaults are to compute an initial value internally for each. If a vector then recycling is used

lprobp, iprobp

Corresponding arguments for the other parameterization. See details below.

imethod

An integer with value 1 or 2 which specifies the initialization method for $\lambda$. If failure to converge occurs try another value and/or else specify a value for shrinkage.init and/or else specify a value

shrinkage.init

How much shrinkage is used when initializing $\lambda$. The value must be between 0 and 1 inclusive, and a value of 0 means the individual response values are used, and a value of 1 means the median or mean is used. This argument is used in conju

zero

An integer specifying which linear/additive predictor is modelled as intercepts only. If given, the value must be either 1 or 2, and the default is none of them. Setting zero = 1 makes $\phi$ a single parameter. See

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm, rrvglm and vgam.

Warning

Numerical problems can occur, e.g., when the probability of zero is actually less than, not more than, the nominal probability of zero. For example, in the Angers and Biswas (2003) data below, replacing 182 by 1 results in nonconvergence. Half-stepping is not uncommon. If failure to converge occurs, try using combinations of imethod, shrinkage.init, ipstr0, and/or zipoisson(zero = 1) if there are explanatory variables. The default for zipoissonff() is to model the structural zero probability as an intercept-only.

Details

This model is a mixture of a Poisson distribution and the value 0; it has value 0 with probability $\phi$ else is Poisson($\lambda$) distributed. Thus there are two sources for zero values, and $\phi$ is the probability of a structural zero. The model for zipoisson() can be written $$P(Y = 0) = \phi + (1-\phi) \exp(-\lambda),$$ and for $y=1,2,\ldots$, $$P(Y = y) = (1-\phi) \exp(-\lambda) \lambda^y / y!.$$ Here, the parameter $\phi$ satisfies $0 < \phi < 1$. The mean of $Y$ is $(1-\phi) \lambda$ and these are returned as the fitted values. The variance of $Y$ is $(1-\phi) \lambda (1 + \phi \lambda)$. By default, the two linear/additive predictors are $(logit(\phi), \log(\lambda))^T$.

The VGAM family function zipoissonff() has a few changes compared to zipoisson(). These are: (i) the order of the linear/additive predictors is switched so the Poisson mean comes first; (ii) probp is now the probability of the Poisson component, i.e., probp is 1-pstr0; (iii) it can handle multiple responses; (iv) argument zero has a new default so that the probp is an intercept-only by default. Now zipoissonff() is generally recommended over zipoisson() (and definitely recommended over yip88). Both functions implement Fisher scoring and can handle multiple responses.

References

Thas, O. and Rayner, J. C. W. (2005) Smooth tests for the zero-inflated Poisson distribution. Biometrics, 61, 808--815.

Data: Angers, J-F. and Biswas, A. (2003) A Bayesian analysis of zero-inflated generalized Poisson model. Computational Statistics & Data Analysis, 42, 37--46.

Cameron, A. C. and Trivedi, P. K. (1998) Regression Analysis of Count Data. Cambridge University Press: Cambridge.

Yee, T. W. (2013) Reduced-rank vector generalized linear models with two linear predictors. Computational Statistics and Data Analysis.

Examples

Run this code

# Example 1: simulated ZIP data
set.seed(123)
zdata <- data.frame(x2 = runif(nn <- 2000))
zdata <- transform(zdata, pstr01  = logit(-0.5 + 1*x2, inverse = TRUE),
                          pstr02  = logit( 0.5 - 1*x2, inverse = TRUE),
                          Ps01    = logit(-0.5       , inverse = TRUE),
                          Ps02    = logit( 0.5       , inverse = TRUE),
                          lambda1 =  loge(-0.5 + 2*x2, inverse = TRUE),
                          lambda2 =  loge( 0.5 + 2*x2, inverse = TRUE))
zdata <- transform(zdata, y1 = rzipois(nn, lambda = lambda1, pstr0 = Ps01),
                          y2 = rzipois(nn, lambda = lambda2, pstr0 = Ps02))

with(zdata, table(y1)) # Eyeball the data
with(zdata, table(y2))
fit1 <- vglm(y1 ~ x2, zipoisson(zero = 1), zdata, crit = "coef")
fit2 <- vglm(y2 ~ x2, zipoisson(zero = 1), zdata, crit = "coef")
coef(fit1, matrix = TRUE) # These should agree with the above values
coef(fit2, matrix = TRUE) # These should agree with the above values

# Fit all two simultaneously, using a different parameterization:
fit12 <- vglm(cbind(y1, y2) ~ x2, zipoissonff, zdata, crit = "coef")
coef(fit12, matrix = TRUE) # These should agree with the above values

# For the first observation compute the probability that y1 is
# due to a structural zero.
head(zdata, 1)
pfit1 <- predict(fit1, zdata[1, ])
pstr0 <- logit(pfit1[1], inverse = TRUE)
lambda <- loge(pfit1[2], inverse = TRUE)
(prob.struc.0 <- pstr0 / dzipois(x = 0, lambda = lambda, pstr0 = pstr0))


# Example 2: McKendrick (1926). Data from 223 Indian village households
cholera <- data.frame(ncases = 0:4, # Number of cholera cases,
                      wfreq  = c(168, 32, 16, 6, 1)) # Frequencies
fit <- vglm(ncases ~ 1, zipoisson, wei = wfreq, cholera, trace = TRUE)
coef(fit, matrix = TRUE)
with(cholera, cbind(actual = wfreq,
                    fitted = round(dzipois(ncases, lambda = Coef(fit)[2],
                                           pstr0 = Coef(fit)[1]) *
                                   sum(wfreq), dig = 2)))

# Example 3: data from Angers and Biswas (2003)
abdata <- data.frame(y = 0:7, w = c(182, 41, 12, 2, 2, 0, 0, 1))
abdata <- subset(abdata, w > 0)
fit <- vglm(y ~ 1, zipoisson(lpstr0 = probit, ipstr0 = 0.8),
            abdata, weight = w, trace = TRUE)
fit@misc$pobs0  # Estimate of P(Y = 0)
coef(fit, matrix = TRUE)
Coef(fit)  # Estimate of pstr0 and lambda
fitted(fit)
with(abdata, weighted.mean(y, w)) # Compare this with fitted(fit)
summary(fit)

# Example 4: zero-deflated model for an intercept-only data
zdata <- transform(zdata, lambda3 = loge( 0.0       , inverse = TRUE))
zdata <- transform(zdata, deflat_limit = -1 / expm1(lambda3)) # Boundary
# The 'pstr0' parameter is negative and in parameter space:
zdata <- transform(zdata, usepstr0 = deflat_limit / 1.5)
zdata <- transform(zdata, y3 = rzipois(nn, lambda3, pstr0 = usepstr0))
head(zdata)
with(zdata, table(y3)) # A lot of deflation
fit3 <- vglm(y3 ~ 1, zipoisson(zero = -1, lpstr0 = identity),
             zdata, trace = TRUE, crit = "coef")
coef(fit3, matrix = TRUE)
# Check how accurate it was:
zdata[1, 'usepstr0'] # Answer
coef(fit3)[1]        # Estimate
Coef(fit3)

# Example 5: This RR-ZIP is known as a COZIGAM or COZIVGLM-ZIP
set.seed(123)
rrzip <- rrvglm(Alopacce ~ bs(WaterCon, df = 3), zipoisson(zero = NULL),
                hspider, trace = TRUE, Index.corner = 2)
coef(rrzip, matrix = TRUE)
Coef(rrzip)
summary(rrzip)
plotvgam(rrzip, lcol = "blue")

Run the code above in your browser using DataLab