gaitpoisson.mix: Generally--Altered, --Inflated and --Truncated Poisson Regression Family Function (GAIT--Pois--Pois--Pois Mixture Variant)

Description

Fits a generally--altered, --inflated and --truncated Poisson regression (mixtures of Poissons on nested and/or partitioned supports). The truncation may include values in the upper tail.

Usage

gaitpoisson.mix(alter = NULL, inflate = NULL, truncate = NULL,
    max.support = Inf, zero = c("pobs.a", "pstr.i"),
    eq.ap = FALSE, eq.ip = FALSE, llambda.p = "loglink",
    lpobs.a = "logitlink", llambda.a = "loglink",
    lpstr.i = "logitlink", llambda.i = "loglink",
    type.fitted = c("mean", "pobs.a", "pstr.i", "Pobs.a", "Pstr.i",
    "prob.a", "prob.i", "prob.t", "lhs.prob"),
    imethod = 1, ilambda.p = NULL, ilambda.a = NULL, ilambda.i = NULL,
    ipobs.a = NULL, ipstr.i = NULL, ishrinkage = 0.95, probs.y = 0.35)

Arguments

alter, inflate, truncate

Vector of altered, inflated and truncated values, i.e., nonnegative integers. A NULL stands for an empty set so the default is effectively equivalent to poissonff. The parameter lambda.p is always estimated. If length(alter) is 1 then the parameter pobs.a is estimated too. If length(inflate) is 1 then the parameter pstr.i is estimated too. If length(alter) is 2 or more then the parameter lambda.a is estimated too, corresponding to an outer distribution. If length(inflate) is 2 or more then the parameter lambda.i is estimated too, corresponding to an outer distribution.

Due to its flexibility, it is easy to misuse this function, and ideally the values of these arguments should be well justified by the application at hand. Adding unnecessary values to these arguments willy-nilly is a recipe for disaster, especially for inflate. Using alter effectively removes a subset of the data from the main analysis, and therefore may result in a substantial loss of efficiency. For seeped values, alter should be used rather than inflate. Heaped values can be handled by alter and inflate.

llambda.p, llambda.a, llambda.i

Link functions; the suffixes .p, .a and .i refer to the parent, altered and inflated distributions respectively. See Links for more choices and information.

lpobs.a, lpstr.i

Link functions for the probabilities pobs.a and pstr.i. See Links for more choices and information.

eq.ap, eq.ip

Single logical each. Constrain the rate parameters to be equal? See CommonVGAMffArguments for information. For the GIT--Pois--Pois, after plotting the responses, if the distribution of the spikes above the nominal probabilities has roughly the same shape as the ordinary values, then setting eq.ip = TRUE would be a good idea (so that lambda.i == lambda.p). And if inflate is of length 2 or thereabouts, then TRUE should definitely be entertained. Likewise, for heaped or seeped data, setting eq.ap = TRUE (so that lambda.a == lambda.p) would be a good idea for the GAT--Pois--Pois if the shape of the altered probabilities is roughly the same as the (inner) parent distribution.

type.fitted, max.support

See CommonVGAMffArguments and gaitpoisson.mlm for information.

The choice "lhs.prob" is 1 minus the probability of a value greater than max.support, computed using the parent distribution.
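
As a rough illustration of what "lhs.prob" measures, the quantity can be computed in base R for a single Poisson rate. This is a sketch only: the rate of 8 is an arbitrary value assumed for illustration, and the calculation ignores any altering, inflation or truncation of individual values.

```r
# Hypothetical parent rate, chosen only for illustration
lambda.p <- 8
max.support <- 20

# 1 minus the probability of a value greater than max.support
lhs.prob <- 1 - ppois(max.support, lambda.p, lower.tail = FALSE)

# Equivalently, the parent Poisson CDF evaluated at max.support
all.equal(lhs.prob, ppois(max.support, lambda.p))
```

With a finite max.support this probability is less than 1, and it approaches 1 as max.support grows relative to the rate.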

imethod, ipobs.a, ipstr.i

See CommonVGAMffArguments for information.

ilambda.p, ilambda.a, ilambda.i

See CommonVGAMffArguments for information.

probs.y, ishrinkage

See CommonVGAMffArguments for information.

zero

See CommonVGAMffArguments for information. For the GIT--Pois--Pois, having zero = "pstr.i" will model the mixing probability as simply as possible (intercept-only), and hence should be more numerically stable than NULL; zero = "pstr.i" is recommended for many analyses, especially when there are many explanatory variables. Likewise, for the GAT--Pois--Pois, having zero = "pobs.a" will model that probability as simply as possible. The default vector is pruned of any irrelevant values.

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm and vgam.

The fitted.values slot of the fitted object, which should be extracted by the generic function fitted, is similar to that of gaitpoisson.mlm.

Warning

Amateurs tend to be overzealous in fitting zero-inflated models when the fitted mean is low; the warning of ziP should be heeded, and it applies here to all inflated values.

Fitting a GIT model requires more caution than for the GAT hurdle model because ideally gross inflation is needed in the data for it to work properly. Deflation or no inflation will produce numerical problems such as extreme coefficient values, hence set trace = TRUE to monitor convergence. It is often a good idea to set eq.ip = TRUE, especially when length(inflate) is low or the values of inflate are not spread over the range of the response. That is, if the inflate values form a single small cluster then this can easily create estimation difficulties---the idea is somewhat similar to multicollinearity.

Details

Although the full GAIT--Pois--Pois--Pois model may be fitted, the two submodels that may be fitted are abbreviated GAT--Pois--Pois and GIT--Pois--Pois. In both, the inner distribution for ordinary values is the Poisson distribution, and the outer distribution for the altered or inflated values is another Poisson distribution that, by default, has a different rate parameter. Thus for the GAT model the distribution being fitted is a (spliced) mixture of two Poissons with differing (partitioned) support. Likewise, for the GIT model the distribution being fitted is a mixture of two Poissons with nested support. The two rate parameters may be constrained to be equal using eq.ap or eq.ip.
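
The two mixtures described in this paragraph can be sketched in base R. This is only an illustrative sketch under stated assumptions: the helper names dgat.sketch and dgit.sketch are hypothetical (they are not VGAM functions), truncation is ignored, and the normalization shown is one natural reading of "partitioned" and "nested" support rather than a statement of how VGAM implements the model. All numeric values are arbitrary.

```r
# Hypothetical helper: GAT-type spliced mixture of two Poissons.
# The altered values and the remaining values partition the support.
dgat.sketch <- function(y, lambda.p, lambda.a, alter, pobs.a) {
  is.alt <- y %in% alter
  # Outer Poisson renormalized over the altered values only
  outer <- dpois(y, lambda.a) / sum(dpois(alter, lambda.a))
  # Inner (parent) Poisson renormalized over the non-altered values
  inner <- dpois(y, lambda.p) / (1 - sum(dpois(alter, lambda.p)))
  ifelse(is.alt, pobs.a * outer, (1 - pobs.a) * inner)
}

# Hypothetical helper: GIT-type mixture of two Poissons.
# The inflated values are nested inside the parent's support.
dgit.sketch <- function(y, lambda.p, lambda.i, inflate, pstr.i) {
  is.inf <- y %in% inflate
  # Outer Poisson renormalized over the inflated values only
  outer <- dpois(y, lambda.i) / sum(dpois(inflate, lambda.i))
  pstr.i * is.inf * outer + (1 - pstr.i) * dpois(y, lambda.p)
}

# Both sketches should sum to (numerically) 1 over a wide enough range
y <- 0:200
sum(dgat.sketch(y, lambda.p = 8, lambda.a = 6, alter = c(5, 10), pobs.a = 0.27))
sum(dgit.sketch(y, lambda.p = 8, lambda.i = 6, inflate = c(3, 15), pstr.i = 0.27))
```

In the GAT sketch the altered values receive total probability pobs.a and the parent contributes nothing there, whereas in the GIT sketch the inflated values receive the parent probability plus an extra share of pstr.i.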

For the GIT model, by default, a logistic regression models the (structural) probability pstr.i that the response is inflated.

This function currently does not handle multiple responses. Further details are at Gaitpois. An alternative variant of this distribution, more unstructured in nature, is based on the multinomial logit model---see gaitpoisson.mlm.

For the GIT model, the ordering of the linear/additive predictors corresponds to length(inflate) equalling 0, 1, and more than 1; the dimension grows accordingly. The same idea holds for the GAT model.

Apart from the order of the linear/additive predictors, the following are (or should be) equivalent: gaitpoisson.mix() and poissonff(), gaitpoisson.mix(alter = 0) and zapoisson(zero = "pobs0"), gaitpoisson.mix(inflate = 0) and zipoisson(zero = "pstr0"), gaitpoisson.mix(truncate = 0) and pospoisson().

References

Yee, T. W. and Ma, C. (2020). Generally--altered, --inflated and --truncated regression, with application to heaped and seeped count data. In preparation.

Examples

# NOT RUN {
avec <- c(5, 10)  # Alter these values
ivec <- c(3, 15)  # Inflate these values
tvec <- c(6, 7)   # Truncate these values
pobs.a <- logitlink(-1, inverse = TRUE)  # About 0.27
pstr.i <- logitlink(-1, inverse = TRUE)  # About 0.27
max.support <- 20; set.seed(1)
gdata <- data.frame(x2 = runif(nn <- 1000))
gdata <- transform(gdata, lambda.p = exp(2 + 0.5 * x2))
gdata <- transform(gdata,
  y1 = rgaitpois(nn, lambda.p, alter.mix = avec, pobs.mix.a = pobs.a,
                 inflate.mix = ivec, pstr.mix.i = pstr.i,
                 truncate = tvec, max.support = max.support))
# Print the family object to inspect the model specification
gaitpoisson.mix(alter = avec, inflate = ivec)
# Tabulate the simulated response
with(gdata, table(y1))
# Fit the model with the rate parameters constrained to be equal
gaitpxfit <- vglm(y1 ~ x2, crit = "coef", trace = TRUE, data = gdata,
                  gaitpoisson.mix(alter = avec, inflate = ivec,
                                  truncate = tvec, eq.ap = TRUE,
                                  eq.ip = TRUE, max.support = max.support))
head(fitted(gaitpxfit, type.fitted = "Pstr.i"))
head(predict(gaitpxfit))  # Linear predictors
coef(gaitpxfit, matrix = TRUE)
summary(gaitpxfit)
# }
