gaitdnbinomial: Generally--Altered, --Inflated, --Truncated and Deflated Negative Binomial Regression

Description

Fits a generally--altered, --inflated --truncated and deflated negative binomial regression by MLE. The GAITD combo model having 7 types of special values is implemented. This allows mixtures of negative binomial distributions on nested and/or partitioned support as well as a multinomial logit model for (nonparametric) altered, inflated and deflated values.

Usage

gaitdnbinomial(a.mix = NULL, i.mix = NULL, d.mix = NULL,
     a.mlm = NULL, i.mlm = NULL, d.mlm = NULL,
     truncate = NULL, zero = c("size", "pobs", "pstr", "pdip"),
     eq.ap = TRUE, eq.ip = TRUE, eq.dp = TRUE,
     parallel.a = FALSE, parallel.i = FALSE, parallel.d = FALSE,
     lmunb.p = "loglink",
     lmunb.a = lmunb.p, lmunb.i = lmunb.p, lmunb.d = lmunb.p,
     lsize.p = "loglink",
     lsize.a = lsize.p, lsize.i = lsize.p, lsize.d = lsize.p,
     type.fitted = c("mean", "munbs", "sizes", "pobs.mlm",
     "pstr.mlm", "pdip.mlm", "pobs.mix", "pstr.mix", "pdip.mix",
     "Pobs.mix", "Pstr.mix", "Pdip.mix", "nonspecial", "Numer",
     "Denom.p", "sum.mlm.i", "sum.mix.i",
     "sum.mlm.d", "sum.mix.d", "ptrunc.p", "cdf.max.s"),
     gpstr.mix = ppoints(7) / 3,
     gpstr.mlm = ppoints(7) / (3 + length(i.mlm)),
     imethod = 1, mux.init = c(0.75, 0.5, 0.75, 0.5),
     imunb.p = NULL, imunb.a = imunb.p,
     imunb.i = imunb.p, imunb.d = imunb.p,
     isize.p = NULL,  isize.a = isize.p,
     isize.i = isize.p, isize.d = isize.p,
     ipobs.mix = NULL, ipstr.mix = NULL,
     ipdip.mix = NULL, ipobs.mlm = NULL,
     ipstr.mlm = NULL, ipdip.mlm = NULL,
     byrow.aid = FALSE, ishrinkage = 0.95, probs.y = 0.35,
     nsimEIM = 500, cutoff.prob = 0.999, eps.trig = 1e-7,
     nbd.max.support = 4000, max.chunk.MB = 30)

Value

An object of class "vglmff"

(see vglmff-class). The object is used by modelling functions such as vglm,

rrvglm

and vgam.

The fitted.values slot of the fitted object, which should be extracted by the generic function fitted, returns the mean \(\mu\) by default. See the information above on type.fitted.

Arguments

truncate: See gaitdpoisson.
a.mix, i.mix, d.mix: See gaitdpoisson.
a.mlm, i.mlm, d.mlm: See gaitdpoisson.
lmunb.p, lmunb.a, lmunb.i, lmunb.d: Link functions pertaining to the mean parameters. See gaitdpoisson where llambda.p etc. are the equivalent.
lsize.p, lsize.a, lsize.i, lsize.d: Link functions pertaining to the size parameters. See NegBinomial.
eq.ap, eq.ip, eq.dp: See gaitdpoisson. These apply to both munb and size parameters simultaneously. See NegBinomial also.
parallel.a, parallel.i, parallel.d: See gaitdpoisson.
type.fitted: See gaitdpoisson.
gpstr.mix, gpstr.mlm: See gaitdpoisson.

imethod, ipobs.mix, ipstr.mix, ipdip.mix: See gaitdpoisson and CommonVGAMffArguments.
ipobs.mlm, ipstr.mlm, ipdip.mlm: See gaitdpoisson.
mux.init: Numeric, of length 4. General downward multiplier for initial values for the sample proportions (MLEs actually). See gaitdpoisson. The fourth value corresponds to size.
imunb.p, imunb.a, imunb.i, imunb.d: See gaitdpoisson; imunb.p is similar to ilambda.p, etc.
isize.p, isize.a, isize.i, isize.d: See gaitdpoisson; isize.p is similar to ilambda.p, etc.
probs.y, ishrinkage: See CommonVGAMffArguments for information.
byrow.aid: Details are at Gaitdpois.
zero: See gaitdpoisson and CommonVGAMffArguments.
nsimEIM, cutoff.prob, eps.trig: See negbinomial.
nbd.max.support, max.chunk.MB: See negbinomial.

Warning

See gaitdpoisson. Also, having eq.ap = TRUE, eq.ip = TRUE and eq.dp = TRUE is often needed to obtain initial values that are good enough because they borrow strength across the different operators. It is usually easy to relax these assumptions later.

This family function is under constant development and future changes will occur.

Author

T. W. Yee

Details

The GAITD--NB combo model is the pinnacle of GAITD regression for counts because it potentially handles underdispersion, equidispersion and overdispersion relative to the Poisson, as well as alteration, inflation, deflation and truncation at arbitrary support points. In contrast, gaitdpoisson cannot handle overdispersion so well. The GAITD--NB is so flexible that it can accommodate up to seven modes.

The full GAITD--NB--NB--MLM--NB-MLM--NB-MLM combo model may be fitted with this family function. There are seven types of special values and all arguments for these may be used in a single model. Here, the MLM represents the nonparametric while the NB refers to the negative binomial mixtures. The defaults for this function correspond to an ordinary negative binomial regression so that negbinomial is called instead.

While much of the documentation here draws upon gaitdpoisson, there are additional details here because the NBD is a two parameter distribution that handles overdispersion relative to the Possion. Consequently, this family function is exceeding flexible and there are many more pitfalls to avoid.

The order of the linear/additive predictors is best explained by an example. Suppose a combo model has length(a.mix) > 3 and length(i.mix) > 3, length(d.mix) > 3, a.mlm = 3:5, i.mlm = 6:9 and d.mlm = 10:12, say. Then loglink(munb.p) and loglink(size.p) are the first two. The third is multilogitlink(pobs.mix) followed by loglink(munb.a) and loglink(size.a) because a.mix is long enough. The sixth is multilogitlink(pstr.mix) followed by loglink(munb.i) and loglink(size.i) because i.mix is long enough. The ninth is multilogitlink(pdip.mix) followed by loglink(munb.d) and loglink(size.d) because d.mix is long enough. Next are the probabilities for the a.mlm values. Then are the probabilities for the i.mlm values. Lastly are the probabilities for the d.mlm values. All the probabilities are estimated by one big MLM and effectively the "(Others)" column of left over probabilities is associated with the nonspecial values. These might be called the nonspecial baseline probabilities (NBP) or reserve probabilities. The dimension of the vector of linear/additive predictors here is \(M=21\).

Apart from the order of the linear/additive predictors, the following are (or should be) equivalent: gaitdnbinomial() and negbinomial(), gaitdnbinomial(a.mix = 0) and zanegbinomial(zero = "pobs0"), gaitdnbinomial(i.mix = 0) and zinegbinomial(zero = "pstr0"), gaitdnbinomial(truncate = 0) and posnegbinomial(). Likewise, if a.mix and i.mix are assigned a scalar then it effectively moves that scalar to a.mlm and i.mlm because there is no parameters such as munb.i being estimated. Thus gaitdnbinomial(a.mix = 0) and gaitdnbinomial(a.mlm = 0) are the effectively same, and ditto for gaitdnbinomial(i.mix = 0) and gaitdnbinomial(i.mlm = 0).

References

Yee, T. W. and Ma, C. (2022). Generally--altered, --inflated, --truncated and --deflated regression, with application to heaped and seeped data. In preparation.

Examples

Run this code

i.mix <- c(5, 10, 12, 16)  # Inflate these values parametrically
i.mlm <- c(14, 15)  # Inflate these values
a.mix <- c(1, 6, 13, 20)  # Alter these values
tvec <- c(3, 11)   # Truncate these values
pstr.mlm <- 0.1  # So parallel.i = TRUE
pobs.mix <- pstr.mix <- 0.1; set.seed(1)
gdata <- data.frame(x2 = runif(nn <- 1000))
gdata <- transform(gdata, munb.p = exp(2 + 0.0 * x2),
                   size.p = exp(1))
gdata <- transform(gdata,
  y1 = rgaitdnbinom(nn, size.p, munb.p, a.mix = a.mix,
                    i.mix = i.mix,
                    pobs.mix = pobs.mix, pstr.mix = pstr.mix,
                    i.mlm = i.mlm, pstr.mlm = pstr.mlm,
                    truncate = tvec))
gaitdnbinomial(a.mix = a.mix, i.mix = i.mix, i.mlm = i.mlm)
with(gdata, table(y1))
fit1 <- vglm(y1 ~ 1, crit = "coef", trace = TRUE, data = gdata,
             gaitdnbinomial(a.mix = a.mix, i.mix = i.mix,
                            i.mlm = i.mlm,
                            parallel.i = TRUE, eq.ap = TRUE,
                            eq.ip = TRUE, truncate = tvec))
head(fitted(fit1, type.fitted = "Pstr.mix"))
head(predict(fit1))
t(coef(fit1, matrix = TRUE))  # Easier to see with t()
summary(fit1)
if (FALSE)  spikeplot(with(gdata, y1), lwd = 2)
plotdgaitd(fit1, new.plot = FALSE, offset.x = 0.2, all.lwd = 2)

Run the code above in your browser using DataLab