
statmod (version 1.5.0)

tweedie: Tweedie Generalized Linear Models

Description

Produces a generalized linear model family object with any power variance function and any power link. Includes the Gaussian, Poisson, gamma and inverse-Gaussian families as special cases.

Usage

tweedie(var.power = 0, link.power = 1 - var.power)

Value

A family object, which is a list of functions and expressions used by glm and gam in their iteratively reweighted least-squares algorithms. See family and glm in the R stats package help for details.

Arguments

var.power

index of power variance function

link.power

index of power link function. link.power=0 produces a log-link. Defaults to the canonical link, which is 1-var.power.

Author

Gordon Smyth

Details

This function provides access to a range of generalized linear model (GLM) response distributions that are not otherwise provided by R. It is also useful for accessing distribution/link combinations that are disallowed by the R glm function. The variance function for the GLM is assumed to be V(mu) = mu^var.power, where mu is the expected value of the distribution. The link function of the GLM is assumed to be mu^link.power for non-zero values of link.power, or log(mu) for link.power = 0. For example, link.power = 1 produces the identity link. The canonical link for each Tweedie family is link.power = 1 - var.power.
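These conventions can be checked directly by building a few family objects and evaluating their variance and link functions. This is a minimal sketch, assuming statmod is installed; it uses only the standard family components variance, linkfun and linkinv, and the mu values are arbitrary.

```r
library(statmod)

mu <- c(0.5, 1, 2, 4)   # arbitrary positive means

# var.power = 1.5 with link.power = 0: V(mu) = mu^1.5 and a log link
fam_log <- tweedie(var.power = 1.5, link.power = 0)
fam_log$variance(mu)    # equals mu^1.5
fam_log$linkfun(mu)     # equals log(mu)

# link.power = 1 gives the identity link, whatever var.power is
fam_id <- tweedie(var.power = 1.5, link.power = 1)
fam_id$linkfun(mu)      # equals mu

# The default link.power is the canonical 1 - var.power, here -0.5
fam_can <- tweedie(var.power = 1.5)
fam_can$linkfun(mu)     # equals mu^(-0.5)
```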

The Tweedie family of GLMs is discussed in detail by Dunn and Smyth (2018). Each value of var.power corresponds to a particular type of response distribution. The values 0, 1, 2 and 3 correspond to the normal distribution, the Poisson distribution, the gamma distribution and the inverse-Gaussian distribution respectively. For these choices of var.power, the Tweedie family is exactly equivalent to the usual GLM family, except with a greater choice of link powers. For example, tweedie(var.power = 1, link.power = 0) is exactly equivalent to poisson(link = "log").
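The stated equivalence can be illustrated by fitting the same count data with both family objects and comparing the estimates. This sketch assumes statmod is installed; the simulated Poisson data are purely illustrative. Both fits run IRLS with the same working weights and working response, so the coefficients should agree up to convergence tolerance.

```r
library(statmod)

set.seed(1)
x <- 1:20
y <- rpois(20, lambda = exp(0.1 * x))   # illustrative count data

fit_tw <- glm(y ~ x, family = tweedie(var.power = 1, link.power = 0))
fit_po <- glm(y ~ x, family = poisson(link = "log"))

# Side-by-side coefficient estimates from the two parameterizations
cbind(tweedie = coef(fit_tw), poisson = coef(fit_po))
```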

The most interesting Tweedie families occur for var.power between 1 and 2. For these GLMs, the response distribution has mass at zero (i.e., it has exact zeros) but is otherwise continuous on the positive real numbers (Smyth, 1996; Hasan and Dunn, 2012). These GLMs have been used to model rainfall, for example: on many days there is no rain at all (an exact zero), but if there is any rain, the amount of rain is continuous and positive.
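For 1 < var.power < 2 the Tweedie distribution can be written as a Poisson number of gamma-distributed amounts: with index p, mean mu and dispersion phi, the Poisson rate is lambda = mu^(2-p) / (phi (2-p)) and each amount is gamma with shape (2-p)/(p-1) and scale phi (p-1) mu^(p-1). A zero count gives an exact zero, which is where the point mass comes from. The sketch below simulates from this representation using a hypothetical helper, rtweedie_cp, written only for this illustration (it is not part of statmod).

```r
# Sketch: draw Tweedie(mu, phi, p) variates for 1 < p < 2 as a Poisson sum of gammas.
# rtweedie_cp is a hypothetical helper written for this illustration.
rtweedie_cp <- function(n, mu, phi, p) {
  lambda <- mu^(2 - p) / (phi * (2 - p))   # Poisson rate: expected number of events
  shape  <- (2 - p) / (p - 1)              # gamma shape of each event's amount
  scale  <- phi * (p - 1) * mu^(p - 1)     # gamma scale of each event's amount
  N <- rpois(n, lambda)
  y <- numeric(n)                          # N = 0 leaves an exact zero
  pos <- N > 0
  # A sum of N independent gamma(shape, scale) amounts is gamma(N * shape, scale)
  y[pos] <- rgamma(sum(pos), shape = N[pos] * shape, scale = scale)
  y
}

set.seed(2)
y <- rtweedie_cp(10000, mu = 1, phi = 1, p = 1.5)
mean(y == 0)   # theoretical P(Y = 0) = exp(-lambda) = exp(-2)
mean(y)        # should be close to mu = 1
var(y)         # should be close to phi * mu^p = 1
```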

Generally speaking, var.power should be chosen so that the theoretical response distribution matches the type of response data being modeled. Hence var.power should be chosen between 1 and 2 only if the response observations are continuous and positive except for exact zeros, and var.power should be chosen greater than or equal to 2 only if the response observations are continuous and strictly positive.
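As an illustration of this guidance, a response that is continuous and positive apart from exact zeros points to a var.power between 1 and 2. This sketch assumes statmod is installed; the data are simulated only to have that shape (a mixture of exact zeros and gamma amounts), and var.power = 1.5 with a log link is an arbitrary choice within the range, not a recommendation.

```r
library(statmod)

set.seed(3)
x <- 1:50
# Illustrative data only: about 30% exact zeros, otherwise positive and continuous
y <- ifelse(runif(50) < 0.3, 0, rgamma(50, shape = 2, rate = 2 / exp(0.02 * x)))
mean(y == 0)   # proportion of exact zeros in the sample

# Exact zeros rule out var.power >= 2, so choose a value between 1 and 2
fit <- glm(y ~ x, family = tweedie(var.power = 1.5, link.power = 0))
coef(fit)
```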

There are no theoretical Tweedie GLMs with var.power between 0 and 1 (Joergensen, 1987). The tweedie function will work for those values, but the family should be interpreted in a quasi-likelihood sense.

Theoretical Tweedie GLMs do exist for negative values of var.power, but they are of little practical application. The tweedie function will work for those values, but the family should be interpreted in a quasi-likelihood sense.
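In the quasi-likelihood reading, only the mean-variance relationship V(mu) = mu^var.power is used, and the dispersion is estimated from the data rather than fixed. This sketch assumes statmod is installed and uses strictly positive simulated responses; the log link is chosen here only so that the fitted means stay positive.

```r
library(statmod)

set.seed(4)
x <- 1:20
y <- rgamma(20, shape = 5)   # strictly positive illustrative responses

# var.power = 0.5: no Tweedie distribution exists, so read the fit as quasi-likelihood
fit_half <- glm(y ~ x, family = tweedie(var.power = 0.5, link.power = 0))

# var.power = -1: a theoretical distribution exists but is rarely useful
fit_neg <- glm(y ~ x, family = tweedie(var.power = -1, link.power = 0))

# summary.glm estimates the dispersion phi from the Pearson residuals
summary(fit_half)$dispersion
summary(fit_neg)$dispersion
```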

The name Tweedie was given to this family by Joergensen (1987) in honour of M. C. K. Tweedie (Tweedie, 1984). Joergensen (1987) gives a mathematical derivation of the Tweedie distributions, proving that no distributions exist for var.power between 0 and 1.

Mathematically, a Tweedie GLM assumes the following. Let \(\mu_i = E(y_i)\) be the expectation of the \(i\)th response. We assume that $$\mu_i^q = x_i^T b, \qquad \operatorname{var}(y_i) = \phi\,\mu_i^p,$$

where \(x_i\) is a vector of covariates, \(b\) is a vector of regression coefficients, and \(\phi > 0\) (the dispersion parameter), \(p\) and \(q\) are constants. This family is specified by var.power = p and link.power = q. A value of zero for \(q\) is interpreted as \(\log(\mu_i) = x_i^T b\).
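The two assumptions can be read off a fitted model: raising the fitted means to the power q should reproduce the linear predictor, and the family's variance function should return mu^p. This sketch assumes statmod is installed; the choice p = 2, q = 0.5 (gamma-type variance with a square-root link) and the simulated data are arbitrary.

```r
library(statmod)

set.seed(5)
x <- 1:20
y <- rgamma(20, shape = 5)   # strictly positive illustrative responses

p <- 2     # var.power
q <- 0.5   # link.power (square-root link)
fit <- glm(y ~ x, family = tweedie(var.power = p, link.power = q))

# mu_i^q should match the linear predictor x_i^T b
eta_hat <- as.vector(model.matrix(fit) %*% coef(fit))
mu_hat  <- as.vector(fitted(fit))
cbind(mu_q = mu_hat^q, xb = eta_hat)[1:5, ]

# The variance function is V(mu) = mu^p; var(y_i) is phi times this
fit$family$variance(mu_hat)[1:5]
```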

The following table summarizes the possible Tweedie response distributions:

var.power   Response distribution
0           Normal
1           Poisson
(1, 2)      Compound Poisson, non-negative with mass at zero
2           Gamma
3           Inverse-Gaussian
> 2         Stable, with support on the positive reals
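For the rows with a base R counterpart, the Tweedie variance function coincides with that of the standard family. This sketch assumes statmod is installed and compares variance functions only; the compound Poisson row (1, 2) has no base R family to compare against.

```r
library(statmod)

mu <- c(0.5, 1, 2, 4)   # arbitrary positive means

# Base R families whose variance functions are powers of mu, keyed by var.power
base_families <- list(
  "0" = gaussian(),          # V(mu) = 1
  "1" = poisson(),           # V(mu) = mu
  "2" = Gamma(),             # V(mu) = mu^2
  "3" = inverse.gaussian()   # V(mu) = mu^3
)

for (p in names(base_families)) {
  v_tweedie <- tweedie(var.power = as.numeric(p))$variance(mu)
  v_base    <- base_families[[p]]$variance(mu)
  cat("var.power =", p, " same variance function:",
      isTRUE(all.equal(v_tweedie, v_base)), "\n")
}
```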

References

Dunn, P. K., and Smyth, G. K. (2018). Generalized linear models with examples in R. Springer, New York, NY. doi:10.1007/978-1-4419-0118-7 (Chapter 12 gives an overall discussion of Tweedie GLMs with R code and case studies.)

Hasan, M. M., and Dunn, P. K. (2012). Understanding the effect of climatology on monthly rainfall amounts in Australia using Tweedie GLMs. International Journal of Climatology, 32(7), 1006-1017. (An example with var.power between 1 and 2)

Joergensen, B. (1987). Exponential dispersion models. J. R. Statist. Soc. B 49, 127-162. (Mathematical derivation of Tweedie response distributions)

Tweedie, M. C. K. (1984). An index which distinguishes between some important exponential families. In Statistics: Applications and New Directions. Proceedings of the Indian Statistical Institute Golden Jubilee International Conference. (Eds. J. K. Ghosh and J. Roy), pp. 579-604. Calcutta: Indian Statistical Institute. (The original mathematical paper from which the family is named)

Smyth, G. K. (1996). Regression modelling of quantity data with exact zeroes. Proceedings of the Second Australia-Japan Workshop on Stochastic Models in Engineering, Technology and Management. Technology Management Centre, University of Queensland, pp. 572-580. http://www.statsci.org/smyth/pubs/RegressionWithExactZerosPreprint.pdf (Derivation and examples of Tweedie GLMs with var.power between 1 and 2)

Smyth, G. K., and Verbyla, A. P. (1999). Adjusted likelihood methods for modelling dispersion in generalized linear models. Environmetrics 10, 695-709. http://www.statsci.org/smyth/pubs/Ties98-Preprint.pdf (Includes examples of Tweedie GLMs with var.power=2 and var.power=4)

See Also

glm, family, dtweedie (in the tweedie package)

Examples

library(statmod)
set.seed(1)
y <- rgamma(20, shape = 5)
x <- 1:20
# Fit a Poisson generalized linear model with identity link
glm(y ~ x, family = tweedie(var.power = 1, link.power = 1))

# Fit an inverse-Gaussian GLM with log link
glm(y ~ x, family = tweedie(var.power = 3, link.power = 0))
