negbinomial: Negative Binomial Distribution Family Function

Description

Maximum likelihood estimation of the two parameters of a negative binomial distribution.

Usage

negbinomial(lmu = "loge", lk = "loge", emu = list(), ek = list(),
            imu = NULL, ik = NULL, quantile.probs = 0.75,
            nsimEIM = 100, cutoff = 0.995,
            Maxiter = 5000, deviance.arg = FALSE, method.init = 1,
            shrinkage.init = 0.95, zero = -2)

Arguments

lmu, lk

Link functions applied to the $\mu$ and $k$ parameters. See Links for more choices. Note that the $k$ parameter is the size argument of

emu, ek

List. Extra argument for each of the links. See earg in Links for general information.

imu, ik

Optional initial values for the mean and $k$. For the latter, if failure to converge occurs then try different values (and/or use method.init). For a $S$-column response, ik can be of length $S$. A value NULL

quantile.probs

Passed into the probs argument of quantile when method.init = 3 to obtain an initial value for the mean.

nsimEIM

This argument is used for computing the diagonal element of the expected information matrix (EIM) corresponding to $k$. See CommonVGAMffArguments for more information and the

cutoff

Used in the finite series approximation. A numeric which is close to 1 but never exactly 1. Used to specify how many terms of the infinite series for computing the second diagonal element of the EIM are actually used. The sum of the probabilites

Maxiter

Used in the finite series approximation. Integer. The maximum number of terms allowed when computing the second diagonal element of the EIM. In theory, the value involves an infinite series. If this argument is too small then the value may be inac

deviance.arg

Logical. If TRUE, the deviance function is attached to the object. Under ordinary circumstances, it should be left alone because it really assumes the index parameter is at the maximum likelihood estimate. Consequently, one cannot use t

method.init

An integer with value 1 or 2 or 3 which specifies the initialization method for the $\mu$ parameter. If failure to converge occurs try another value and/or else specify a value for shrinkage.init

shrinkage.init

How much shrinkage is used when initializing $\mu$. The value must be between 0 and 1 inclusive, and a value of 0 means the individual response values are used, and a value of 1 means the median or mean is used. This argument is used in conjunctio

zero

Integer valued vector, usually assigned $-2$ or $2$ if used at all. Specifies which of the two linear/additive predictors are modelled as an intercept only. By default, the $k$ parameter (after lk is applied) is modelled as a single un

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm and vgam.

Warning

The Poisson model corresponds to $k$ equalling infinity. If the data is Poisson or close to Poisson, numerical problems will occur. Possibly choosing a log-log link may help in such cases, otherwise use poissonff.

This function is fragile; the maximum likelihood estimate of the index parameter is fraught (see Lawless, 1987). In general, the quasipoissonff is more robust than this function. Assigning values to the ik argument may lead to a local solution, and smaller values are preferred over large values when using this argument.

Yet to do: write a family function which uses the methods of moments estimator for $k$.

Details

The negative binomial distribution can be motivated in several ways, e.g., as a Poisson distribution with a mean that is gamma distributed. There are several common parametrizations of the negative binomial distribution. The one used here uses the mean $\mu$ and an index parameter $k$, both which are positive. Specifically, the density of a random variable $Y$ is $$f(y;\mu,k) ~=~ {y + k - 1 \choose y} \, \left( \frac{\mu}{\mu+k} \right)^y\, \left( \frac{k}{k+\mu} \right)^k$$ where $y=0,1,2,\ldots$, and $\mu > 0$ and $k > 0$. Note that the dispersion parameter is $1/k$, so that as $k$ approaches infinity the negative binomial distribution approaches a Poisson distribution. The response has variance $Var(Y)=\mu+\mu^2/k$. When fitted, the fitted.values slot of the object contains the estimated value of the $\mu$ parameter, i.e., of the mean $E(Y)$.

The negative binomial distribution can be coerced into the classical GLM framework, with one of the parameters being of interest and the other treated as a nuisance/scale parameter (and implemented in the MASS library). This VGAM family function negbinomial treats both parameters on the same footing, and estimates them both by full maximum likelihood estimation. Simulated Fisher scoring is employed as the default (see the nsimEIM argument).

The parameters $\mu$ and $k$ are independent (diagonal EIM), and the confidence region for $k$ is extremely skewed so that its standard error is often of no practical use. The parameter $1/k$ has been used as a measure of aggregation.

This VGAM function handles multivariate responses, so that a matrix can be used as the response. The number of columns is the number of species, say, and setting zero=-2 means that all species have a $k$ equalling a (different) intercept only.

References

Lawless, J. F. (1987) Negative binomial and mixed Poisson regression. The Canadian Journal of Statistics 15, 209--225.

Hilbe, J. M. (2007) Negative Binomial Regression. Cambridge: Cambridge University Press.

Bliss, C. and Fisher, R. A. (1953) Fitting the negative binomial distribution to biological data. Biometrics 9, 174--200.

Examples

Run this code

# Example 1: apple tree data
appletree = data.frame(y = 0:7, w = c(70, 38, 17, 10, 9, 3, 2, 1))
fit = vglm(y ~ 1, negbinomial, appletree, weights = w)
summary(fit)
coef(fit, matrix = TRUE)
Coef(fit)

# Example 2: simulated data with multivariate response
ndata = data.frame(x = runif(nn <- 500))
ndata = transform(ndata, y1 = rnbinom(nn, mu=exp(3+x), size = exp(1)),
                         y2 = rnbinom(nn, mu=exp(2-x), size = exp(0)))
fit1 = vglm(cbind(y1,y2) ~ x, negbinomial, ndata, trace = TRUE)
coef(fit1, matrix = TRUE)

# Example 3: large counts so definitely use the nsimEIM argument
ndata = transform(ndata, y3 = rnbinom(nn, mu=exp(12+x), size = exp(1)))
with(ndata, range(y3))  # Large counts
fit2 = vglm(y3 ~ x, negbinomial(nsimEIM=100), ndata, trace = TRUE)
coef(fit2, matrix = TRUE)

Run the code above in your browser using DataLab