Fits a generally--altered, --inflated and --truncated Poisson regression (using a multinomial logit model for the altered or inflated values). The truncation may include values in the upper tail. Currently only one of altered or inflated are allowed in a single model.
gaitpoisson.mlm(alter = NULL, inflate = NULL,
truncate = NULL, max.support = Inf,
zero = c("pobs", "pstr"), parallel.ap = FALSE, parallel.ip = FALSE,
llambda = "loglink", type.fitted = c("mean", "pobs.a", "pstr.i",
"Pobs.a", "Pstr.i", "prob.a", "prob.i", "prob.t", "lhs.prob"),
imethod = 1, mux.inflate = 0.5, ilambda = NULL,
ishrinkage = 0.95, probs.y = 0.35)
Vector of altered, inflated and truncated values,
i.e., nonnegative integers.
A NULL
stands for an empty set so the default is
effectively equivalent to poissonff
.
Currently alter
and inflate
cannot both be used at the same time.
If length(alter)
is 1 or more then the multinomial
logit model (MLM) probabilities of observing those
values are modelled---see
multinomial
.
If length(alter)
is 1 then effectively a logistic
regression is estimated as a special case.
Likewise,
if length(inflate)
is 1 or more then the
MLM structural probabilities are modelled---see
multinomial
.
And if length(inflate)
is 1 then effectively a logistic
regression is estimated as a special case.
See gaitpoisson.mix
for more information about
these arguments.
Link function for the parent distribution.
See Links
for more choices and information.
Single logical each.
Constrain the MLM probabilities to be equal?
If so then this applies to all
length(alter)
pobs
probabilities
or all
length(inflate)
pstr
probabilities.
See CommonVGAMffArguments
for information.
The default means that the probabilities are generally
unconstrained and unstructured.
See CommonVGAMffArguments
and
below
for information.
Choosing an irrelevant value may result in
an obscure error message, e.g.,
"Pstr.i"
for a GAT model.
The choice "lhs.prob"
is the 1 minus the
probability of value greater than "max.support"
,
using the parent distribution.
See Gaitpois
for information.
This enables RHS-truncation,
i.e., something equivalent to
truncate = (U+1):Inf
for some upper support point U
.
See CommonVGAMffArguments
for information.
See CommonVGAMffArguments
for information.
See CommonVGAMffArguments
for information.
For the GIT--Pois--MLM submodel,
having zero = "pstr"
will model the mixing
probabilities as simple as possible (intercept-only), hence
should be more numerically stable than NULL
; and
zero = "pstr"
is recommended for many analyses
especially when there are many explanatory variables.
Likewise, for the GAT--Pois--MLM submodel,
setting zero = "pobs"
will model the multinomial
probabilities as simple as possible (intercept-only), hence
should be more numerically stable than the default, and
this is recommended for many analyses especially when there
are many explanatory variables.
The default vector is pruned of irrelevant values.
An object of class "vglmff"
(see vglmff-class
).
The object is used by modelling functions such as vglm
,
and vgam
.
The fitted.values
slot of the fitted object,
which should be extracted by the generic function fitted
,
is similar to gaitpoisson.mix
.
The default is to
return the mean \(\mu\).
The choice type.fitted = "pobs.a"
returns
a matrix of all
the altered probabilities (Greek symbol omega);
each column is an omega, and the last column is labelled
"(Others)"
.
The choice type.fitted = "Pobs.a"
returns the same.
The choice "pstr.i"
returns
a matrix of all
the inflated probabilities (Greek symbol phi);
each column is a phi, and the last column is labelled
"(Others)"
.
The choice "Pstr.i"
returns a length(inflate)
-column
matrix with probabilities coming from the two sources.
The choice "prob.a"
returns the sum of the
probability of having
an altered value, given the estimate of lambda
from
the parent distribution; i.e., it is summed over alter
and evaluated at the parent PMF.
The choice "prob.i"
is similar to "prob.a"
but
summed over inflate
.
Likewise,
the choice "prob.t"
summed over truncate
.
Using inflate
requires more caution than alter
because gross inflation is ideally needed for it to work safely.
Deflation or no inflation will produce numerical problems,
hence set trace = TRUE
to monitor convergence.
See gaitpoisson.mix
for other general details
and warnings.
The full distribution being considered can be abbreviated GAIT--Pois--MLM--MLM, which is where the inner distribution for ordinary values is the Poisson distribution, and the outer distributions for the altered and inflated values are unstructured probabilities. Thus the distribution being fitted is a mixture of a Poisson and two MLMs with the support of one of the MLMs being equal to the set of altered values and the other for the inflated values. Hence the probability for each inflated value comes from two sources: the parent distribution and a MLM.
The two submodels that may be fitted can be abbreviated
GAT--Pois--MLM or
GIT--Pois--MLM.
An alternative variant of this distribution,
more structured in nature,
replaces the MLMs by Poisson distributions with potentially
different means---see gaitpoisson.mix
.
This family function may be described as nonparametric while
gaitpoisson.mix
is parametric.
For the GIT model,
the ordering of the linear/additive predictors corresponds to
length(inflate)
equalling 0 and more than 0;
the dimension grows accordingly.
The same idea holds for the GAT model.
This function currently does not handle multiple responses.
Further details are at Gaitpois
.
Apart from the order of the linear/additive predictors,
the following are (or should be) equivalent:
gaitpoisson.mlm()
and poissonff()
,
gaitpoisson.mlm(alter = 0)
and zapoisson(zero = "pobs0")
,
gaitpoisson.mlm(inflate = 0)
and zipoisson(zero = "pstr0")
,
gaitpoisson.mlm(truncate = 0)
and pospoisson()
.
Yee, T. W. and Ma, C. (2020). Generally--altered, --inflated and --truncated regression, with application to heaped and seeped count data. In preparation.
Gaitpois
,
gaitpoisson.mix
,
gatnbinomial.mlm
,
zipoisson
,
pospoisson
,
CommonVGAMffArguments
,
rootogram4
,
simulate.vlm
.
# NOT RUN {
ivec <- c(3, 4) # Inflate these values
tvec <- c(5, 7) # Truncate these values
max.support <- 20 # Values beyond here are truncated
pstr.i <- 0.1 # Common value for all values of ivec
gdata <- data.frame(x2 = runif(nn <- 1000))
gdata <- transform(gdata, lambda = exp(1.5 + 0.5 * x2))
gdata <- transform(gdata,
y1 = rgaitpois(nn, lambda, pstr.mlm.i = pstr.i, inflate.mlm = ivec,
truncate = tvec, max.support = max.support))
gaitpoisson.mlm(inflate = ivec)
with(gdata, table(y1))
fit1 <- vglm(y1 ~ x2, crit = "coef", trace = TRUE, data = gdata,
gaitpoisson.mlm(inflate = ivec, truncate = tvec,
parallel.ip = TRUE, # Good idea here
max.support = max.support))
head(fitted(fit1, type.fitted = "Pstr.i"))
head(predict(fit1))
log(pstr.i / (1 - length(ivec) * pstr.i)) # Value of some of the etas
coef(fit1, matrix = TRUE)
summary(fit1)
# }
Run the code above in your browser using DataLab