gaitpoisson.mlm: Generally--Altered, --Inflated and --Truncated Poisson Regression Family Function (GAIT--Pois--MLM--MLM Variant)

Description

Fits a generally--altered, --inflated and --truncated Poisson regression (using a multinomial logit model for the altered or inflated values). The truncation may include values in the upper tail. Currently only one of altered or inflated are allowed in a single model.

Usage

gaitpoisson.mlm(alter = NULL, inflate = NULL,
    truncate = NULL, max.support = Inf,
    zero = c("pobs", "pstr"), parallel.ap = FALSE, parallel.ip = FALSE,
    llambda = "loglink", type.fitted = c("mean", "pobs.a", "pstr.i",
    "Pobs.a", "Pstr.i", "prob.a", "prob.i", "prob.t", "lhs.prob"),
    imethod = 1, mux.inflate = 0.5, ilambda = NULL,
    ishrinkage = 0.95, probs.y = 0.35)

Arguments

alter, inflate, truncate

Vector of altered, inflated and truncated values, i.e., nonnegative integers. A NULL stands for an empty set so the default is effectively equivalent to poissonff. Currently alter and inflate cannot both be used at the same time. If length(alter) is 1 or more then the multinomial logit model (MLM) probabilities of observing those values are modelled---see multinomial. If length(alter) is 1 then effectively a logistic regression is estimated as a special case. Likewise, if length(inflate) is 1 or more then the MLM structural probabilities are modelled---see multinomial. And if length(inflate) is 1 then effectively a logistic regression is estimated as a special case.

See gaitpoisson.mix for more information about these arguments.

llambda

Link function for the parent distribution. See Links for more choices and information.

parallel.ap, parallel.ip

Single logical each. Constrain the MLM probabilities to be equal? If so then this applies to all length(alter) pobs probabilities or all length(inflate) pstr probabilities. See CommonVGAMffArguments for information. The default means that the probabilities are generally unconstrained and unstructured.

type.fitted

See CommonVGAMffArguments and below for information. Choosing an irrelevant value may result in an obscure error message, e.g., "Pstr.i" for a GAT model.

The choice "lhs.prob" is the 1 minus the probability of value greater than "max.support", using the parent distribution.

max.support

See Gaitpois for information. This enables RHS-truncation, i.e., something equivalent to truncate = (U+1):Inf for some upper support point U.

imethod, ilambda, mux.inflate

See CommonVGAMffArguments for information.

probs.y, ishrinkage

See CommonVGAMffArguments for information.

zero

See CommonVGAMffArguments for information. For the GIT--Pois--MLM submodel, having zero = "pstr" will model the mixing probabilities as simple as possible (intercept-only), hence should be more numerically stable than NULL; and zero = "pstr" is recommended for many analyses especially when there are many explanatory variables. Likewise, for the GAT--Pois--MLM submodel, setting zero = "pobs" will model the multinomial probabilities as simple as possible (intercept-only), hence should be more numerically stable than the default, and this is recommended for many analyses especially when there are many explanatory variables. The default vector is pruned of irrelevant values.

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm, and vgam.

The fitted.values slot of the fitted object, which should be extracted by the generic function fitted, is similar to gaitpoisson.mix. The default is to return the mean \(\mu\). The choice type.fitted = "pobs.a" returns a matrix of all the altered probabilities (Greek symbol omega); each column is an omega, and the last column is labelled "(Others)". The choice type.fitted = "Pobs.a" returns the same.

The choice "pstr.i" returns a matrix of all the inflated probabilities (Greek symbol phi); each column is a phi, and the last column is labelled "(Others)". The choice "Pstr.i" returns a length(inflate)-column matrix with probabilities coming from the two sources.

The choice "prob.a" returns the sum of the probability of having an altered value, given the estimate of lambda from the parent distribution; i.e., it is summed over alter and evaluated at the parent PMF. The choice "prob.i" is similar to "prob.a" but summed over inflate. Likewise, the choice "prob.t" summed over truncate.

Warning

Using inflate requires more caution than alter because gross inflation is ideally needed for it to work safely. Deflation or no inflation will produce numerical problems, hence set trace = TRUE to monitor convergence.

See gaitpoisson.mix for other general details and warnings.

Details

The full distribution being considered can be abbreviated GAIT--Pois--MLM--MLM, which is where the inner distribution for ordinary values is the Poisson distribution, and the outer distributions for the altered and inflated values are unstructured probabilities. Thus the distribution being fitted is a mixture of a Poisson and two MLMs with the support of one of the MLMs being equal to the set of altered values and the other for the inflated values. Hence the probability for each inflated value comes from two sources: the parent distribution and a MLM.

The two submodels that may be fitted can be abbreviated GAT--Pois--MLM or GIT--Pois--MLM. An alternative variant of this distribution, more structured in nature, replaces the MLMs by Poisson distributions with potentially different means---see gaitpoisson.mix. This family function may be described as nonparametric while gaitpoisson.mix is parametric.

For the GIT model, the ordering of the linear/additive predictors corresponds to length(inflate) equalling 0 and more than 0; the dimension grows accordingly. The same idea holds for the GAT model.

This function currently does not handle multiple responses. Further details are at Gaitpois.

Apart from the order of the linear/additive predictors, the following are (or should be) equivalent: gaitpoisson.mlm() and poissonff(), gaitpoisson.mlm(alter = 0) and zapoisson(zero = "pobs0"), gaitpoisson.mlm(inflate = 0) and zipoisson(zero = "pstr0"), gaitpoisson.mlm(truncate = 0) and pospoisson().

References

Yee, T. W. and Ma, C. (2020). Generally--altered, --inflated and --truncated regression, with application to heaped and seeped count data. In preparation.

Examples

Run this code

# NOT RUN {
ivec <- c(3, 4)  # Inflate these values
tvec <- c(5, 7)  # Truncate these values
max.support <- 20  # Values beyond here are truncated
pstr.i <- 0.1  # Common value for all values of ivec
gdata <- data.frame(x2 = runif(nn <- 1000))
gdata <- transform(gdata, lambda = exp(1.5 + 0.5 * x2))
gdata <- transform(gdata,
  y1 = rgaitpois(nn, lambda, pstr.mlm.i = pstr.i, inflate.mlm = ivec,
                 truncate = tvec, max.support = max.support))
gaitpoisson.mlm(inflate = ivec)
with(gdata, table(y1))
fit1 <- vglm(y1 ~ x2, crit = "coef", trace = TRUE, data = gdata,
             gaitpoisson.mlm(inflate = ivec, truncate = tvec,
                             parallel.ip = TRUE,  # Good idea here
                             max.support = max.support))
head(fitted(fit1, type.fitted = "Pstr.i"))
head(predict(fit1))
log(pstr.i / (1 - length(ivec) * pstr.i))  # Value of some of the etas
coef(fit1, matrix = TRUE)
summary(fit1)
# }

Run the code above in your browser using DataLab