COMPoisson: Conway-Maxwell-Poisson (COM-Poisson) GLM family

Description

The COM-Poisson family is a generalization of the Poisson family that can describe over-dispersed as well as under-dispersed count data. It is indexed by a parameter nu that quantifies such dispersion. For nu>1, the distribution is under-dispersed relative to the Poisson distribution with the same mean; for nu<1, it is over-dispersed. It includes the Poisson, geometric and Bernoulli distributions as special (or limit) cases (see Details). The COM-Poisson family is implemented here as a family object, so that it can be fitted by glm, and further used to model conditional responses in mixed models fitted by this package's functions (see Examples). nu is distinct from the dispersion parameter \(\nu=1/\phi\) considered elsewhere in this package and in the GLM literature, as \(\nu\) affects the log-likelihood in a more specific way.

Several links are now allowed for this family, corresponding to different versions of the COMPoisson described in the literature (e.g., Sellers & Shmueli 2010; Huang 2017).

Usage

COMPoisson(nu =  stop("COMPoisson's 'nu' must be specified"), 
           link = "loglambda")

Value

A family object.

Arguments

link: GLM link function. The default is the canonical link "loglambda" (see Details), but other links are allowed (currently log, sqrt or identity links as commonly handled for the Poisson family).
nu: Under-dispersion parameter. The fitme and corrHLfit functions called with family=COMPoisson() (no given nu value) will estimate this parameter. In other usage of this family, nu must be specified. COMPoisson(nu=1) is the Poisson family.

Details

The \(i\)th term of the distribution can be written \(q_i/Z\) where \(q_i=\lambda^i / (i!)^\nu\) and \(Z=\sum_{i=0}^\infty q_i\), for \(\lambda=\lambda(\mu)\) implied by its inverse relationship, the expectation formula \(\mu=\mu(\lambda)=\sum_{i=0}^\infty i q_i(\lambda)/Z\). The case nu=0 is the geometric distribution with parameter \(\lambda\) (which requires \(\lambda<1\) for \(Z\) to be finite); nu=1 is the Poisson distribution with mean \(\lambda\); and the limit as nu \(\to\infty\) is the Bernoulli distribution with expectation \(\lambda/(1+\lambda)\).
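The definition above can be checked numerically with a short base-R sketch. It does not use spaMM's internal code: the function name dcmp_sketch and the truncation point imax are assumptions introduced here for illustration, and the infinite sum \(Z\) is simply cut off at imax terms.

```r
# Hypothetical helper (not part of spaMM): COM-Poisson probabilities q_i/Z,
# with the infinite sum Z truncated at 'imax' terms.
dcmp_sketch <- function(i, lambda, nu, imax = 1000L) {
  k <- 0:imax
  # log(q_k) = k*log(lambda) - nu*log(k!)
  logq <- k * log(lambda) - nu * lgamma(k + 1)
  m <- max(logq)
  logZ <- m + log(sum(exp(logq - m)))  # log-sum-exp for numerical stability
  exp(i * log(lambda) - nu * lgamma(i + 1) - logZ)
}

x <- 0:10
# nu = 1: Poisson distribution with mean lambda
all.equal(dcmp_sketch(x, lambda = 2.5, nu = 1), dpois(x, 2.5))
# nu = 0 (lambda < 1): geometric distribution with success probability 1 - lambda
all.equal(dcmp_sketch(x, lambda = 0.4, nu = 0), dgeom(x, prob = 1 - 0.4))
# large nu: P(1) approaches the Bernoulli expectation lambda/(1+lambda)
c(dcmp_sketch(1, lambda = 2.5, nu = 50), 2.5 / (1 + 2.5))
```

The truncation is harmless in these three cases because the terms \(q_i\) decay quickly; it becomes a poor approximation for small nu with \(\lambda\) near or above 1.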

From this definition, this is an exponential family model with canonical parameters \(\log(\lambda)\) and \(\nu\). When the linear predictor \(\eta\) specifies \(\log(\lambda(\mu))\), the canonical link is used (e.g., Sellers & Shmueli 2010). It is here nicknamed "loglambda" and does not have a known expression in terms of elementary functions. To obtain \(\mu\) as the link inverse of the linear predictor \(\eta\), one then first computes \(\lambda=e^\eta\) and then \(\mu(\lambda)\) by the expectation formula. For other links (Huang 2017), one directly computes \(\mu\) by the link inverse (e.g., \(\mu=e^\eta\) for link "log"), and then one may solve for \(\lambda= \lambda(\mu)\) to obtain other features of the distribution.
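The two routes from \(\eta\) to \(\mu\) and \(\lambda\) can be sketched in base R as follows. This is an illustration under simplifying assumptions, not spaMM's implementation: the names cmp_mean, linkinv_loglambda and lambda_from_mu are hypothetical, the sums are truncated at imax terms, and \(\lambda(\mu)\) is found by a generic root search with uniroot over an arbitrarily chosen interval for \(\log\lambda\).

```r
# Hypothetical helper: mu(lambda) by the expectation formula, truncated at 'imax'.
cmp_mean <- function(lambda, nu, imax = 1000L) {
  k <- 0:imax
  logq <- k * log(lambda) - nu * lgamma(k + 1)
  w <- exp(logq - max(logq))  # weights proportional to q_k
  sum(k * w) / sum(w)
}

# "loglambda" link: eta -> lambda = exp(eta) -> mu(lambda)
linkinv_loglambda <- function(eta, nu) cmp_mean(exp(eta), nu)

# Other links (e.g. "log"): mu is given by the link inverse, mu = exp(eta);
# lambda is then obtained by solving mu(lambda) = mu on the log(lambda) scale.
lambda_from_mu <- function(mu, nu) {
  f <- function(loglam) cmp_mean(exp(loglam), nu) - mu
  exp(uniroot(f, interval = c(-20, 20), tol = 1e-12)$root)
}

eta <- log(4)
linkinv_loglambda(eta, nu = 2)   # mean implied by lambda = 4 under "loglambda"
lam <- lambda_from_mu(exp(eta), nu = 2)  # lambda implied by mu = 4 under "log"
cmp_mean(lam, nu = 2)            # recovers mu = 4 up to numerical tolerance
```

With nu=1 both routes coincide, since \(\mu=\lambda\) for the Poisson distribution.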

The relationships between \(\lambda\) and \(\mu\) or other moments of the distribution involve infinite summations. These sums can be easily approximated by a finite number of terms for large nu but not when nu approaches zero. For this reason, the code may fail to fit distributions with nu approaching 0 (strong residual over-dispersion). The case nu=0 (the geometric distribution) is fitted by an ad hoc algorithm devoid of such problems. Otherwise, spaMM truncates the sum, and uses numerical integrals to approximate the missing terms (which slows down the fitting operation). In addition, it applies an ad hoc continuity correction to ensure continuity of the result at nu=1 (Poisson case). These corrections affect numerical results in the case of residual over-dispersion but are negligible in the case of residual under-dispersion. Alternatively, spaMM uses Gaunt et al.'s (2017) approximations when the condition defined by spaMM.getOption("CMP_asympto_cond") is satisfied. All approximations reduce the accuracy of computations, in a way that can impede the extended Levenberg-Marquardt algorithm sometimes needed by spaMM.
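Why small nu is numerically harder can be seen by counting how many terms of the sum are needed before they become negligible. The base-R sketch below is only an illustration: terms_needed, tol and imax are hypothetical names, and the stopping rule (relative weight of a term below tol) is an assumption made here, not spaMM's truncation rule.

```r
# Hypothetical helper: smallest index n past the mode such that q_n / max(q) < tol,
# searching among the terms q_0..q_imax. Returns NA if no such term is found.
terms_needed <- function(lambda, nu, tol = 1e-12, imax = 1e6) {
  k <- 0:imax
  logq <- k * log(lambda) - nu * lgamma(k + 1)
  mode <- which.max(logq)  # 1-based position of the largest term
  below <- which(logq[mode:length(logq)] - logq[mode] < log(tol))
  if (length(below) == 0L) return(NA_integer_)
  as.integer(mode + below[1L] - 2L)  # convert 1-based positions back to the index n
}

# For fixed lambda, the number of terms grows as nu decreases
sapply(c(2, 1, 0.5, 0.2), function(nu) terms_needed(lambda = 2, nu = nu))
# For nu close to 0 the terms may still be increasing at the truncation point
terms_needed(lambda = 2, nu = 0.05, imax = 1e4)
```

In the last call the largest term lies beyond the truncation point, so a plain finite sum would miss most of the mass, which is the situation the corrections described above are meant to address.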

The name COMP_nu should be used to set initial values or bounds on nu in control arguments of the fitting functions (e.g., fitme(.,init=list(COMP_nu=1))). Fixed values should be set by the family argument (COMPoisson(nu=.)).

References

Gaunt RE, Iyengar S, Olde Daalhuis AB, Simsek B (2017) An asymptotic expansion for the normalizing constant of the Conway-Maxwell-Poisson distribution. Ann. Inst. Stat. Math. doi:10.1007/s10463-017-0629-6

Huang A (2017) Mean-parametrized Conway-Maxwell-Poisson regression models for dispersed counts. Stat. Modelling doi:10.1177/1471082X17697749

Shmueli G, Minka TP, Kadane JB, Borle S, Boatwright P (2005) A useful distribution for fitting discrete data: revival of the Conway-Maxwell-Poisson distribution. Appl. Statist. 54: 127-142.

Sellers KF, Shmueli G (2010) A Flexible Regression Model for Count Data. Ann. Appl. Stat. 4: 943–961.

Examples


if (spaMM.getOption("example_maxtime")>0.9) {
  # Fitting COMPoisson model with estimated nu parameter:
  #
  data("freight") ## example from Sellers & Shmueli, Ann. Appl. Stat. 4: 943–961 (2010)
  fitme(broken ~ transfers, data=freight, family = COMPoisson())
  fitme(broken ~ transfers, data=freight, family = COMPoisson(link="log"))

  # glm(), HLCor() and HLfit() handle spaMM::COMPoisson() with fixed nu:
  #
  glm(broken ~ transfers, data=freight, family = COMPoisson(nu=10))
  HLfit(broken ~ transfers+(1|id), data=freight, family = COMPoisson(nu=10),method="ML")
  
  # Equivalence of poisson() and COMPoisson(nu=1):
  #
  COMPglm <- glm(broken ~ transfers, data=freight, family = poisson())
  coef(COMPglm)
  logLik(COMPglm)
  COMPglm <- glm(broken ~ transfers, data=freight, family = COMPoisson(nu=1))
  coef(COMPglm)
  logLik(COMPglm)
  HLfit(broken ~ transfers, data=freight, family = COMPoisson(nu=1))

}
