
Maximum likelihood estimation of the two parameters of a negative binomial distribution.
negbinomial(zero = "size", parallel = FALSE, deviance.arg = FALSE,
            type.fitted = c("mean", "quantiles"),
            percentiles = c(25, 50, 75),
            mds.min = 1e-3, nsimEIM = 500, cutoff.prob = 0.999,
            eps.trig = 1e-7, max.support = 4000, max.chunk.MB = 30,
            lmu = "loglink", lsize = "loglink",
            imethod = 1, imu = NULL, iprobs.y = NULL,
            gprobs.y = ppoints(6), isize = NULL,
            gsize.mux = exp(c(-30, -20, -15, -10, -6:3)))
polya(zero = "size", type.fitted = c("mean", "prob"),
      mds.min = 1e-3, nsimEIM = 500, cutoff.prob = 0.999,
      eps.trig = 1e-7, max.support = 4000, max.chunk.MB = 30,
      lprob = "logitlink", lsize = "loglink", imethod = 1, iprob = NULL,
      iprobs.y = NULL, gprobs.y = ppoints(6), isize = NULL,
      gsize.mux = exp(c(-30, -20, -15, -10, -6:3)), imunb = NULL)
polyaR(zero = "size", type.fitted = c("mean", "prob"),
       mds.min = 1e-3, nsimEIM = 500, cutoff.prob = 0.999,
       eps.trig = 1e-7, max.support = 4000, max.chunk.MB = 30,
       lsize = "loglink", lprob = "logitlink", imethod = 1, iprob = NULL,
       iprobs.y = NULL, gprobs.y = ppoints(6), isize = NULL,
       gsize.mux = exp(c(-30, -20, -15, -10, -6:3)), imunb = NULL)
zero: Can be an integer-valued vector, and if so, then it is usually assigned -2 or 2.
By default, the size parameter (after lsize
is applied) is modelled as a single unknown number that
is estimated. It can be modelled as a function of the
explanatory variables by setting zero = NULL; this
has been called an NB-H model by Hilbe (2011). A negative
value means that the value is recycled, so setting zero = -2 means
that the size parameter of every response is intercept-only.
See CommonVGAMffArguments
for more information.
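For instance, here is a minimal sketch (assuming a hypothetical data frame mydat with a count response y and a covariate x2) contrasting the default with an NB-H style fit:

# Default: the size parameter is intercept-only
fit.nb2 <- vglm(y ~ x2, negbinomial, data = mydat)
# zero = NULL also regresses the size parameter on x2 (NB-H)
fit.nbh <- vglm(y ~ x2, negbinomial(zero = NULL), data = mydat)
coef(fit.nbh, matrix = TRUE)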
lmu, lsize, lprob: Link functions applied to the mean, size and probability parameters.
See Links for more choices.
Note that these correspond to the mu, size and prob
arguments of rnbinom respectively.
Common alternatives for lsize are
negloglink and reciprocallink,
and also logloglink (if the size parameter is greater than unity).
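For illustration only (again with the hypothetical mydat), an alternative link for the size parameter can be requested directly:

fit.alt <- vglm(y ~ x2, negbinomial(lsize = "negloglink"), data = mydat)
coef(fit.alt, matrix = TRUE)  # The size column now uses negloglink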
imu, imunb, isize: Optional initial values for the mean and size parameters.
For the size, if convergence failure occurs then try
different values (and/or use imethod).
For a multiple-response matrix, isize can be of length equal to the
number of responses. The default, NULL,
means an initial value for each response is
computed internally using a gridsearch based on gsize.mux.
The last argument is ignored if used within cqo; see
the iKvector argument of qrrvglm.control instead.
In the future isize and iprob might be deprecated.
nsimEIM: This argument is used
for computing the diagonal element of the
expected information matrix (EIM) corresponding to the size parameter
when simulated Fisher scoring (SFS) is used.
See CommonVGAMffArguments for more information
and the notes below.
SFS is one of two algorithms for computing the EIM elements
(so that both algorithms may be used on a given data set).
SFS is faster than the exact method when Qmax
is large.
cutoff.prob: Fed into the p
argument
of qnbinom
in order to obtain an upper limit for the approximate
support of the distribution, called Qmax
, say.
Similarly, the value 1-p
is
fed into the p
argument
of qnbinom
in order to obtain a lower limit for the approximate
support of the distribution, called Qmin
, say.
Hence the approximate support is Qmin:Qmax
.
This argument should be
a numeric and close to 1 but never exactly 1.
Used to specify how many terms of the infinite series
for computing the second diagonal element of the
EIM are actually used.
The closer this argument is to 1, the more accurate the
standard errors of the regression coefficients will be.
If this argument is too small, convergence will take longer.
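As a concrete illustration (the mean, size and cutoff values below are arbitrary), the approximate support implied by cutoff.prob can be computed directly with qnbinom:

mu <- 5; size <- 2; cutoff.prob <- 0.999
Qmax <- qnbinom(cutoff.prob, mu = mu, size = size)      # Upper limit
Qmin <- qnbinom(1 - cutoff.prob, mu = mu, size = size)  # Lower limit
c(Qmin, Qmax)  # The approximate support is Qmin:Qmax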
max.chunk.MB, max.support: max.support
is used to describe the eligibility of
individual observations
to have their EIM computed by the exact method.
Here, we are concerned with
computing the EIM with respect to the size parameter.
If the exact method is used, then the computations are done in chunks, so
that no more than about max.chunk.MB megabytes
of memory is used at a time (actually, it is proportional to this amount).
Regarding eligibility of this algorithm, each observation must
have the length of the vector, starting from
the 1-cutoff.prob
quantile
and finishing up at the cutoff.prob
quantile,
less than max.support
(as its approximate support).
If you have abundant memory then you might try setting
max.chunk.MB = Inf
, but then the computations might take
a very long time.
Setting max.chunk.MB = 0
or max.support = 0
will force the EIM to be computed using the SFS algorithm only
(this used to be the default method for all the observations).
When the fitted values of the model are large, the approximate support
can be very wide, so max.support
limits the cost in terms of computing time.
For intercept-only models max.support
is multiplied by
a number (such as 10) because only one inner product needs to be computed.
Note: max.support
is an upper bound and limits the number of
terms dictated by the eps.trig
argument.
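The following sketch (reusing the hypothetical ndata and y3 of Example 3 below) shows how the choice of algorithm can be forced:

fit.sfs <- vglm(y3 ~ x2, negbinomial(max.support = 0), data = ndata)      # SFS for all observations
fit.exact <- vglm(y3 ~ x2, negbinomial(max.support = Inf), data = ndata)  # Exact method for all observations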
mds.min: Numeric.
Minimum value of the NBD mean divided by the size parameter.
The closer this ratio is to 0, the closer the distribution is
to a Poisson.
Iterations will stop when an estimate of this ratio falls below this value.
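To illustrate the limiting behaviour (the values below are arbitrary), the NBD probabilities approach Poisson probabilities as this ratio shrinks towards 0:

mu <- 4
round(dnbinom(0:4, mu = mu, size = 1e6), 6)  # mean/size is essentially 0
round(dpois(0:4, lambda = mu), 6)            # Nearly identical probabilities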
eps.trig: Numeric.
A small positive value used in the computation of the EIMs.
It focusses on the denominator of the terms of a series.
Each term in the series (which is used to approximate an infinite series)
has a value greater than size / sqrt(eps.trig),
thus very small terms are ignored.
Setting a smaller value will result in more accuracy,
but it will require greater computing time (and an increase in max.support
may be needed).
In particular, the quantity computed by special means involves
the trigamma function, and it forms part of the calculation of the EIM with
respect to the size parameter.
gsize.mux: Similar to gsigma in CommonVGAMffArguments.
However, this grid is multiplied by the initial
estimates of the NBD mean parameter.
That is, it is on a relative scale rather than on an
absolute scale.
If the counts are very large in value then convergence failure might
occur; if so, then try a smaller value such as
gsize.mux = exp(-40).
See CommonVGAMffArguments for more information.
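As a rough sketch of the relative scaling (the initial mean below is made up), the candidate size values searched over are the grid times the initial mean estimate:

init.mean <- 10                                # Hypothetical initial estimate of the mean
gsize.mux <- exp(c(-30, -20, -15, -10, -6:3))  # The default grid
gsize.mux * init.mean                          # Candidate size values (relative scale)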
deviance.arg: Logical.
If TRUE
, the deviance is computed after convergence.
It only works in the NB-2 model.
It is also necessary to set criterion = "coefficients"
or half.step = FALSE
since
one cannot use that criterion properly for the minimization
within the IRLS algorithm.
It should be set TRUE
when
used with cqo
under the fast algorithm.
imethod: An integer with value 1 or 2 etc. which
specifies the initialization method for the mean parameter.
If failure to converge occurs then try another value
and/or else specify a value for iprobs.y
and/or else specify a value for isize.
See CommonVGAMffArguments for more information.
parallel: Setting parallel = TRUE
is useful in order to get
something similar to quasipoisson
or
what is known as NB-1.
If parallel = TRUE
then the parallelism constraint
does not apply to any intercept term.
You should set zero = NULL
too if parallel = TRUE
to
avoid a conflict.
gprobs.y: A vector representing a grid;
passed into the probs
argument
of quantile
when imethod = 1
to obtain an initial value for the mean
of each response. Is overwritten by any value of iprobs.y
.
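For illustration (with a hypothetical count vector y), the default grid and the corresponding candidate initial means are:

ppoints(6)                       # The default gprobs.y
quantile(y, probs = ppoints(6))  # Candidate initial values for the mean of y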
iprobs.y: Passed into the probs
argument
of quantile
when imethod = 1
to obtain an initial value for the mean
of each response. Overwrites any value of gprobs.y
.
This argument might be deleted in the future.
An object of class "vglmff"
(see vglmff-class
).
The object is used by modelling functions such as vglm
,
rrvglm
and vgam
.
Poisson regression corresponds to the size parameter being infinite, so
data that is close to Poisson can cause numerical problems;
setting stepsize = 0.5 for half stepping is
probably a good idea too when the data is extreme.
The NBD is a strictly unimodal
distribution. Any data set that does not exhibit a mode (somewhere
in the middle) makes the estimation problem difficult.
Set trace = TRUE
to monitor convergence.
These functions are fragile; the maximum likelihood
estimate of the index parameter is fraught with difficulties (see Lawless, 1987).
Other alternatives to negbinomial are
to fit an NB-1 or RR-NB (a.k.a. NB-P) model; see Yee (2014).
Also available are the NB-C, NB-H and NB-G variants.
Assigning values to the isize
argument may lead
to a local solution, and smaller values are preferred
over large values when using this argument.
If one wants to force SFS
to be used on all observations, then
set max.support = 0
or max.chunk.MB = 0
.
If one wants to force the exact method
to be used for all observations, then
set max.support = Inf
.
If the computer has much memory, then trying
max.chunk.MB = Inf
and
max.support = Inf
may provide a small speed increase.
If SFS is used at all, then the @weights
slot of the
fitted object will be a matrix;
otherwise that slot will be a 0 x 0
matrix.
Yet to do: write a family function which uses the methods
of moments estimator for the size parameter.
The negative binomial distribution (NBD)
can be motivated in several ways,
e.g., as a Poisson distribution with a mean that is gamma
distributed.
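A quick simulation sketch of the Poisson-gamma mixture motivation (the sample size and parameter values are arbitrary):

set.seed(1)
mu <- 4; k <- 2; N <- 1e5
y.mix <- rpois(N, lambda = rgamma(N, shape = k, rate = k / mu))  # Poisson with gamma-distributed mean
y.nbd <- rnbinom(N, mu = mu, size = k)                           # Direct NBD draws
c(mean(y.mix), mean(y.nbd))  # Both approximately mu
c(var(y.mix), var(y.nbd))    # Both approximately mu + mu^2 / k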
There are several common parametrizations of the NBD.
The one used by negbinomial() uses the mean mu and an index
parameter size, both of which are positive.
Specifically, the density of a random variable Y is

f(y; mu, size) = choose(y + size - 1, y) * [size / (size + mu)]^size * [mu / (size + mu)]^y

for y = 0, 1, 2, ..., where mu > 0 and size > 0.
By default the fitted.values slot of the object contains
the estimated value of the mu parameter.
Common alternatives for lsize are
negloglink and reciprocallink.

For polya the density is

f(y; prob, size) = choose(y + size - 1, y) * prob^size * (1 - prob)^y

for y = 0, 1, 2, ..., where size > 0 and 0 < prob < 1.
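The two parametrizations are related through prob = size / (size + mu), which can be checked numerically (the parameter values here are arbitrary):

mu <- 4; size <- 2
prob <- size / (size + mu)
all.equal(dnbinom(0:5, mu = mu, size = size),
          dnbinom(0:5, prob = prob, size = size))  # TRUE
size * (1 - prob) / prob                           # Recovers the mean mu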
Family function polyaR() is the same as polya() except
that the order of the two parameters is switched. The reason is that
polyaR() tries to match rnbinom closely
in terms of argument order, etc.
Should the probability parameter be of primary interest,
users will probably prefer polya() rather than polyaR().
Possibly polyaR() will be decommissioned one day.
The NBD can be coerced into the
classical GLM framework with one of the parameters being
of interest and the other treated as a nuisance/scale
parameter (this is implemented in the MASS library). The
VGAM family function negbinomial()
treats both
parameters on the same footing, and estimates them both
by full maximum likelihood estimation.
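For comparison, a minimal sketch of the two approaches (assuming the simulated ndata of Example 2 below, and that the MASS package is installed):

library(MASS)
fit.mass <- glm.nb(y1 ~ x2, data = ndata)             # size treated as a scalar nuisance parameter
fit.vgam <- vglm(y1 ~ x2, negbinomial, data = ndata)  # Both parameters get a linear predictor
coef(fit.vgam, matrix = TRUE)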
These VGAM family functions handle
multiple responses, so that a response matrix can be
inputted. The number of columns is the number
of species, say, and setting zero = -2 means that
all species have a size parameter that is modelled as intercept-only.
Lawless, J. F. (1987) Negative binomial and mixed Poisson regression. The Canadian Journal of Statistics 15, 209--225.
Hilbe, J. M. (2011) Negative Binomial Regression, 2nd Edition. Cambridge: Cambridge University Press.
Bliss, C. and Fisher, R. A. (1953) Fitting the negative binomial distribution to biological data. Biometrics 9, 174--200.
Yee, T. W. (2014) Reduced-rank vector generalized linear models with two linear predictors. Computational Statistics and Data Analysis, 71, 889--902.
quasipoisson, poissonff, zinegbinomial, negbinomial.size (e.g., NB-G),
nbcanlink (NB-C), posnegbinomial, inv.binomial, rnbinom, nbordlink,
rrvglm, cao, cqo, CommonVGAMffArguments, simulate.vlm, ppoints, qnbinom.
# Example 1: apple tree data (Bliss and Fisher, 1953)
appletree <- data.frame(y = 0:7, w = c(70, 38, 17, 10, 9, 3, 2, 1))
fit <- vglm(y ~ 1, negbinomial(deviance = TRUE), data = appletree,
            weights = w, crit = "coef") # Obtain the deviance
fit <- vglm(y ~ 1, negbinomial(deviance = TRUE), data = appletree,
            weights = w, half.step = FALSE) # Alternative method
summary(fit)
coef(fit, matrix = TRUE)
Coef(fit) # For intercept-only models
deviance(fit) # NB2 only; needs 'crit = "coef"' & 'deviance = TRUE' above
# Example 2: simulated data with multiple responses
ndata <- data.frame(x2 = runif(nn <- 200))
ndata <- transform(ndata, y1 = rnbinom(nn, mu = exp(3+x2), size = exp(1)),
                          y2 = rnbinom(nn, mu = exp(2-x2), size = exp(0)))
fit1 <- vglm(cbind(y1, y2) ~ x2, negbinomial, data = ndata, trace = TRUE)
coef(fit1, matrix = TRUE)
# Example 3: large counts imply that SFS is used
ndata <- transform(ndata, y3 = rnbinom(nn, mu = exp(10+x2), size = exp(1)))
with(ndata, range(y3)) # Large counts
fit2 <- vglm(y3 ~ x2, negbinomial, data = ndata, trace = TRUE)
coef(fit2, matrix = TRUE)
head(fit2@weights) # Non-empty; SFS was used
# Example 4: an NB-1 model to estimate a negative binomial with Var(Y) = phi0 * mu
nn <- 200 # Number of observations
phi0 <- 10 # Specify this; should be greater than unity
delta0 <- 1 / (phi0 - 1)
mydata <- data.frame(x2 = runif(nn), x3 = runif(nn))
mydata <- transform(mydata, mu = exp(2 + 3 * x2 + 0 * x3))
mydata <- transform(mydata, y3 = rnbinom(nn, mu = mu, size = delta0 * mu))
plot(y3 ~ x2, data = mydata, pch = "+", col = "blue",
     main = paste("Var(Y) = ", phi0, " * mu", sep = ""), las = 1)
nb1 <- vglm(y3 ~ x2 + x3, negbinomial(parallel = TRUE, zero = NULL),
            data = mydata, trace = TRUE)
# Extracting out some quantities:
cnb1 <- coef(nb1, matrix = TRUE)
mydiff <- (cnb1["(Intercept)", "loglink(size)"] -
cnb1["(Intercept)", "loglink(mu)"])
delta0.hat <- exp(mydiff)
(phi.hat <- 1 + 1 / delta0.hat) # MLE of phi
summary(nb1)
# Obtain a 95 percent confidence interval for phi0:
myvec <- rbind(-1, 1, 0, 0)
(se.mydiff <- sqrt(t(myvec) %*% vcov(nb1) %*% myvec))
ci.mydiff <- mydiff + c(-1.96, 1.96) * se.mydiff
ci.delta0 <- ci.exp.mydiff <- exp(ci.mydiff)
(ci.phi0 <- 1 + 1 / rev(ci.delta0)) # The 95 percent conf. interval for phi0
Confint.nb1(nb1) # Quick way to get it
summary(glm(y3 ~ x2 + x3, quasipoisson, mydata))$disper # cf. moment estimator