VGAM (version 1.0-1)

negbinomial: Negative Binomial Distribution Family Function

Description

Maximum likelihood estimation of the two parameters of a negative binomial distribution.

Usage

negbinomial(zero = "size", parallel = FALSE, deviance.arg = FALSE,
            mds.min = 1e-04, nsimEIM = 500, cutoff.prob = 0.999, eps.trig = 1e-7,
            max.support = 4000, max.chunk.MB = 30,
            lmu = "loge", lsize = "loge",
            imethod = 1, imu = NULL, probs.y = 0.35,
            ishrinkage = 0.95, isize = NULL, gsize.mux = exp((-12:6)/2))
polya(zero = "size", type.fitted = c("mean", "prob"),
           mds.min = 1e-04, nsimEIM = 500, cutoff.prob = 0.999,
           eps.trig = 1e-7, max.support = 4000, max.chunk.MB = 30,
           lprob = "logit", lsize = "loge",
           imethod = 1, iprob = NULL,
           probs.y = 0.35, ishrinkage = 0.95,
           isize = NULL, gsize.mux = exp((-12:6)/2),
           imunb = NULL)
polyaR(zero = "size", type.fitted = c("mean", "prob"),
           mds.min = 1e-04, nsimEIM = 500,  cutoff.prob = 0.999,
           eps.trig = 1e-7, max.support = 4000, max.chunk.MB = 30,
           lsize = "loge", lprob = "logit", 
           imethod = 1, isize = NULL,
           iprob = NULL, probs.y = 0.35,
           ishrinkage = 0.95, gsize.mux = exp((-12:6)/2),
           imunb = NULL)

Arguments

lmu, lsize, lprob
Link functions applied to the $\mu$, $k$ and $p$ parameters. See Links for more choices. Note that the $\mu$, $k$ and $p$ parameters correspond to the mu, size and prob arguments of rnbinom respectively.
imu, imunb, isize, iprob
Optional initial values for the mean, $k$ and $p$. For $k$, if failure to converge occurs then try different values (and/or use imethod). For an $S$-column response, isize can be of length $S$. A value NULL means an initial value is computed internally.
nsimEIM
This argument is used for computing the diagonal element of the expected information matrix (EIM) corresponding to $k$ based on the simulated Fisher scoring (SFS) algorithm. See CommonVGAMffArguments for more information.
cutoff.prob
Fed into the p argument of qnbinom in order to obtain an upper limit for the approximate support of the distribution, called Qmax, say. Similarly, the value 1 - cutoff.prob is used to obtain a lower limit, called Qmin, say, so that the approximate support is Qmin:Qmax.
max.chunk.MB, max.support
max.support is used to describe the eligibility of individual observations to have their EIM computed by the exact method. Here, we are concerned about computing the EIM wrt $k$. The exact method algorithm operates separately on each response: observations whose approximate support exceeds max.support are handled by SFS instead, and the remaining observations are processed in chunks limited to max.chunk.MB megabytes of memory each.
mds.min
Numeric. Minimum value of the NBD mean divided by size parameter. The closer this ratio is to 0, the closer the distribution is to a Poisson. Iterations will stop when an estimate of $k$ is so large, relative to the mean, that this ratio falls below the threshold.
eps.trig
Numeric. A small positive value used in the computation of the EIMs. It focusses on the denominator of the terms of a series. Each term in the series (which is used to approximate an infinite series) has a value greater than size / sqrt(eps.trig); smaller terms are effectively ignored, so this argument controls where the series is truncated.
gsize.mux
Similar to gsigma in CommonVGAMffArguments. However, this grid is multiplied by the initial estimates of the NBD mean parameter. That is, it is on a relative scale rather than an absolute one.
deviance.arg
Logical. If TRUE, the deviance is computed after convergence. It only works in the NB-2 model. It is also necessary to set criterion = "coefficients" or half.step = FALSE since one cannot use that criterion properly for the minimization within the IRLS algorithm.
imethod
An integer with value 1 or 2 or 3 which specifies the initialization method for the $\mu$ parameter. If failure to converge occurs, try another value and/or specify a value for ishrinkage and/or imu.
parallel
See CommonVGAMffArguments for more information. Setting parallel = TRUE is useful in order to get something similar to quasipoissonff, i.e., an NB-1 model; in that case zero = NULL should also be set (see Example 4 below).
probs.y
Passed into the probs argument of quantile when imethod = 3 to obtain an initial value for the mean.
ishrinkage
How much shrinkage is used when initializing $\mu$. The value must be between 0 and 1 inclusive: a value of 0 means the individual response values are used, and a value of 1 means the median or mean is used. This argument is used in conjunction with imethod; see CommonVGAMffArguments for more information.
zero
Can be an integer-valued vector, usually assigned $-2$ or $2$ if used at all. Specifies which of the two linear/additive predictors are modelled as an intercept only. By default, the $k$ parameter (after lsize is applied) is modelled as an intercept only; setting zero = NULL allows both parameters to be modelled as functions of the explanatory variables (this is the NB-H model of Hilbe (2011)).
type.fitted
See CommonVGAMffArguments for details.
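
As a rough sketch of how several of these arguments fit together, the hypothetical call below sets an alternative link for $k$, an initial value, a different initialization method, and zero = NULL (the data and values are illustrative, not recommendations):

library(VGAM)
set.seed(1)
adata <- data.frame(x2 = runif(100))
adata <- transform(adata, y = rnbinom(100, mu = exp(1 + x2), size = 2))
fit.args <- vglm(y ~ x2,
                 negbinomial(lsize = "negloge",  # models -log(k); cf. alpha = 1/k
                             isize = 2,          # optional initial value for k
                             imethod = 2,        # alternative initialization for mu
                             zero = NULL),       # k also modelled against x2 (NB-H)
                 data = adata)
coef(fit.args, matrix = TRUE)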

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm, rrvglm and vgam.

Warning

Poisson regression corresponds to $k$ equalling infinity. If the data are Poisson or close to Poisson, numerical problems will occur. Some corrective measures are taken, e.g., $k$ is capped at some large value during estimation and a warning is issued. Note that dnbinom(0, mu, size = Inf) is currently NaN (a bug), so if the data contain some 0s then setting crit = "coef" avoids an undefined log-likelihood during the last stages of estimation; this setting is generally advisable because the log-likelihood is often NaN when the size value is very large. Setting stepsize = 0.5, for half stepping, is probably a good idea too.
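
A minimal sketch of this advice on simulated near-Poisson data (whether the capping warning actually appears depends on the data):

library(VGAM)
set.seed(2)
pdata <- data.frame(y = rpois(500, lambda = 2))  # effectively k = Inf
fit.pois <- vglm(y ~ 1, negbinomial, data = pdata,
                 crit = "coef",   # avoids NaN log-likelihoods when k is huge
                 stepsize = 0.5,  # half stepping, as suggested above
                 trace = TRUE)
coef(fit.pois, matrix = TRUE)  # expect a very large estimate of k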

These functions are fragile; maximum likelihood estimation of the index parameter is fraught with difficulties (see Lawless, 1987). In general, quasipoissonff is more robust. Other alternatives to negbinomial are to fit an NB-1 or RR-NB (a.k.a. NB-P) model; see Yee (2014). Also available are the NB-C, NB-H and NB-G. Assigning values to the isize argument may lead to a local solution, and smaller values are preferred over large values when using this argument.

If one wants to force SFS to be used on all observations, then set max.support = 0 or max.chunk.MB = 0. If one wants to force the exact method to be used for all observations, then set max.support = Inf. If the computer has much memory, then trying max.chunk.MB = Inf and max.support = Inf may provide a small speed increase. If SFS is used at all, then the @weights slot of the fitted object will be a matrix; otherwise that slot will be a 0 x 0 matrix.
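
For example, on hypothetical simulated data, the @weights slot reveals which method was used:

library(VGAM)
set.seed(3)
sdata <- data.frame(y = rnbinom(200, mu = 4, size = 1.5))
fit.sfs <- vglm(y ~ 1, negbinomial(max.support = 0), data = sdata)
dim(fit.sfs@weights)    # nonempty matrix: SFS was forced on all observations
fit.exact <- vglm(y ~ 1, negbinomial(max.support = Inf), data = sdata)
dim(fit.exact@weights)  # 0 x 0 matrix: the exact method was used throughout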

Yet to do: write a family function which uses the methods of moments estimator for $k$.

Details

The negative binomial distribution can be motivated in several ways, e.g., as a Poisson distribution with a mean that is gamma distributed. There are several common parametrizations of the negative binomial distribution. The one used by negbinomial() uses the mean $\mu$ and an index parameter $k$, both of which are positive. Specifically, the density of a random variable $Y$ is $$f(y;\mu,k) = {y + k - 1 \choose y} \, \left( \frac{\mu}{\mu+k} \right)^y\, \left( \frac{k}{k+\mu} \right)^k$$ where $y=0,1,2,\ldots$, and $\mu > 0$ and $k > 0$. Note that the dispersion parameter is $1/k$, so as $k$ approaches infinity the negative binomial distribution approaches a Poisson distribution. The response has variance $Var(Y)=\mu+\mu^2/k$. When fitted, the fitted.values slot of the object contains the estimated value of the $\mu$ parameter, i.e., of the mean $E(Y)$. Some authors use $\alpha=1/k$ as the ancillary or heterogeneity parameter, so common alternatives for lsize are negloge and reciprocal.
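
Since dnbinom() uses this same $(\mu, k)$ parametrization (with size playing the role of $k$), the density and moments above can be checked directly:

mu <- 3; k <- 2; y <- 0:10
dens <- choose(y + k - 1, y) * (mu / (mu + k))^y * (k / (k + mu))^k
max(abs(dens - dnbinom(y, mu = mu, size = k)))  # ~ 0, up to rounding error
set.seed(4)
ys <- rnbinom(1e6, mu = mu, size = k)
c(mean(ys), var(ys), mu + mu^2 / k)  # E(Y) = mu; Var(Y) = mu + mu^2/k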

For polya the density is $$f(y;p,k) = {y + k - 1 \choose y} \, \left( 1 - p \right)^y\, p^k$$ where $y=0,1,2,\ldots$, and $k > 0$ and $0 < p < 1$.
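
This parametrization relates to negbinomial()'s via $p = k/(k+\mu)$, and matches the prob argument of dnbinom(), e.g.:

mu <- 3; k <- 2; p <- k / (k + mu); y <- 0:10
dens.p <- choose(y + k - 1, y) * (1 - p)^y * p^k
max(abs(dens.p - dnbinom(y, size = k, prob = p)))  # ~ 0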

Family function polyaR() is the same as polya() except that the order of the two parameters is switched. The reason is that polyaR() tries to match rnbinom closely in terms of argument order, etc. If the probability parameter is of primary interest, users will probably prefer polya() over polyaR(). Possibly polyaR() will be decommissioned one day.

The negative binomial distribution can be coerced into the classical GLM framework with one of the parameters being of interest and the other treated as a nuisance/scale parameter (this is implemented in the MASS library). The VGAM family function negbinomial() treats both parameters on the same footing, and estimates them both by full maximum likelihood estimation.
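
A sketch comparing the two approaches on simulated data (glm.nb() from MASS estimates $k$, its theta, alongside the GLM fit):

library(VGAM)
library(MASS)
set.seed(5)
gdata <- data.frame(x2 = runif(300))
gdata <- transform(gdata, y = rnbinom(300, mu = exp(1 + 2 * x2), size = 2))
fit.mass <- glm.nb(y ~ x2, data = gdata)             # k treated as theta
fit.vgam <- vglm(y ~ x2, negbinomial, data = gdata)  # both parameters, full MLE
cmat <- coef(fit.vgam, matrix = TRUE)
c(theta.mass = fit.mass$theta,
  k.vgam = exp(cmat["(Intercept)", "loge(size)"]))   # similar estimates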

The parameters $\mu$ and $k$ are independent (diagonal EIM), and the confidence region for $k$ is extremely skewed so that its standard error is often of no practical use. The parameter $1/k$ has been used as a measure of aggregation.

These VGAM family functions handle multiple responses, so that a response matrix can be input. The number of columns is the number of species, say, and setting zero = -2 means that each species has a $k$ that is modelled as a (different) intercept only.
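
For instance, with two hypothetical species:

library(VGAM)
set.seed(6)
spdata <- data.frame(x2 = runif(200))
spdata <- transform(spdata,
                    y1 = rnbinom(200, mu = exp(2 + x2), size = 3),
                    y2 = rnbinom(200, mu = exp(1 - x2), size = 1))
fit.sp <- vglm(cbind(y1, y2) ~ x2, negbinomial(zero = -2), data = spdata)
coef(fit.sp, matrix = TRUE)  # the x2 row is 0 in both loge(size) columns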

References

Lawless, J. F. (1987) Negative binomial and mixed Poisson regression. The Canadian Journal of Statistics 15, 209--225.

Hilbe, J. M. (2011) Negative Binomial Regression, 2nd Edition. Cambridge: Cambridge University Press.

Bliss, C. and Fisher, R. A. (1953) Fitting the negative binomial distribution to biological data. Biometrics 9, 174--200.

Yee, T. W. (2014) Reduced-rank vector generalized linear models with two linear predictors. Computational Statistics and Data Analysis, 71, 889--902.

See Also

quasipoissonff, poissonff, zinegbinomial, negbinomial.size (e.g., NB-G), nbcanlink (NB-C), posnegbinomial, inv.binomial, rnbinom, nbolf, rrvglm, cao, cqo, CommonVGAMffArguments, simulate.vlm, qnbinom.

Examples

# Example 1: apple tree data (Bliss and Fisher, 1953)
appletree <- data.frame(y = 0:7, w = c(70, 38, 17, 10, 9, 3, 2, 1))
fit <- vglm(y ~ 1, negbinomial(deviance = TRUE), data = appletree,
            weights = w, crit = "coef")  # Obtain the deviance
fit <- vglm(y ~ 1, negbinomial(deviance = TRUE), data = appletree,
            weights = w, half.step = FALSE)  # Alternative method
summary(fit)
coef(fit, matrix = TRUE)
Coef(fit)  # For intercept-only models
deviance(fit)  # NB2 only; needs 'crit = "coef"' & 'deviance = TRUE' above

# Example 2: simulated data with multiple responses
ndata <- data.frame(x2 = runif(nn <- 200))
ndata <- transform(ndata, y1 = rnbinom(nn, mu = exp(3+x2), size = exp(1)),
                          y2 = rnbinom(nn, mu = exp(2-x2), size = exp(0)))
fit1 <- vglm(cbind(y1, y2) ~ x2, negbinomial, data = ndata, trace = TRUE)
coef(fit1, matrix = TRUE)

# Example 3: large counts implies SFS is used
ndata <- transform(ndata, y3 = rnbinom(nn, mu = exp(10+x2), size = exp(1)))
with(ndata, range(y3))  # Large counts
fit2 <- vglm(y3 ~ x2, negbinomial, data = ndata, trace = TRUE)
coef(fit2, matrix = TRUE)
head(fit2@weights)  # Non-empty; SFS was used

# Example 4: a NB-1 to estimate a negative binomial with Var(Y) = phi0 * mu
nn <- 200  # Number of observations
phi0 <- 10  # Specify this; should be greater than unity
delta0 <- 1 / (phi0 - 1)
mydata <- data.frame(x2 = runif(nn), x3 = runif(nn))
mydata <- transform(mydata, mu = exp(2 + 3 * x2 + 0 * x3))
mydata <- transform(mydata, y3 = rnbinom(nn, mu = mu, size = delta0 * mu))
plot(y3 ~ x2, data = mydata, pch = "+", col = 'blue',
     main = paste("Var(Y) = ", phi0, " * mu", sep = ""), las = 1)
nb1 <- vglm(y3 ~ x2 + x3, negbinomial(parallel = TRUE, zero = NULL),
            data = mydata, trace = TRUE)
# Extracting out some quantities:
cnb1 <- coef(nb1, matrix = TRUE)
mydiff <- (cnb1["(Intercept)", "loge(size)"] -
           cnb1["(Intercept)", "loge(mu)"])
delta0.hat <- exp(mydiff)
(phi.hat <- 1 + 1 / delta0.hat)  # MLE of phi
summary(nb1)
# Obtain a 95 percent confidence interval for phi0:
myvec <- rbind(-1, 1, 0, 0)
(se.mydiff <- sqrt(t(myvec) %*% vcov(nb1) %*% myvec))
ci.mydiff <- mydiff + c(-1.96, 1.96) * se.mydiff
ci.delta0 <- ci.exp.mydiff <- exp(ci.mydiff)
(ci.phi0 <- 1 + 1 / rev(ci.delta0))  # The 95 percent conf. interval for phi0

Confint.nb1(nb1)  # Quick way to get it

summary(glm(y3 ~ x2 + x3, quasipoisson, mydata))$disper  # cf. moment estimator
