Produces a generalized linear model family object with any power variance function and any power link. Includes the Gaussian, Poisson, gamma and inverse-Gaussian families as special cases.
tweedie(var.power = 0, link.power = 1 - var.power)
var.power: index of power variance function.
link.power: index of power link function. link.power=0 produces a log link. Defaults to the canonical link, which is 1-var.power.
A family object, which is a list of functions and expressions used by glm and gam in their iteratively reweighted least-squares algorithms. See family and glm in the R base help for details.
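To show concretely what the family object feeds into, here is a minimal base-R sketch of the iteratively reweighted least-squares (IRLS) iteration for a GLM with variance function mu^p and link mu^q (log link when q = 0). The function name tweedie_irls is hypothetical and is not part of statmod or base R; it assumes a positive linear predictor whenever q is non-zero. Because base R's quasi family supports only a few variance functions, the reference fit uses quasi(variance = "mu^2"), i.e. p = 2.

```r
# Hypothetical sketch (not statmod code): IRLS for a GLM with variance
# function V(mu) = mu^p and link mu^q (log link when q == 0).
tweedie_irls <- function(X, y, p, q, maxit = 100, tol = 1e-10) {
  if (q == 0) {
    linkfun <- function(mu) log(mu)
    linkinv <- function(eta) exp(eta)
    mu.eta  <- function(eta) exp(eta)               # d mu / d eta for the log link
  } else {
    linkfun <- function(mu) mu^q
    linkinv <- function(eta) eta^(1 / q)            # assumes eta > 0
    mu.eta  <- function(eta) eta^(1 / q - 1) / q    # d mu / d eta for the power link
  }
  mu   <- y + 0.1 * (y == 0)                        # starting values kept away from 0
  eta  <- linkfun(mu)
  beta <- rep(0, ncol(X))
  for (iter in seq_len(maxit)) {
    d <- mu.eta(eta)
    z <- eta + (y - mu) / d                         # working response
    w <- d^2 / mu^p                                 # working weights (dmu/deta)^2 / V(mu)
    beta.new <- lm.wfit(X, z, w)$coefficients       # weighted least-squares step
    eta <- drop(X %*% beta.new)
    mu  <- linkinv(eta)
    converged <- max(abs(beta.new - beta)) < tol
    beta <- beta.new
    if (converged) break
  }
  unname(beta)
}

y <- c(1.2, 2.3, 2.9, 4.1, 5.5, 6.0, 7.8, 8.1, 9.9, 12.3)  # made-up positive data
x <- 1:10
X <- cbind(1, x)                                            # design matrix with intercept
b.irls <- tweedie_irls(X, y, p = 2, q = 0)
# Reference fit: var.power = 2 with a log link, via base R's quasi family
ctrl  <- glm.control(epsilon = 1e-12, maxit = 100)
b.glm <- unname(coef(glm(y ~ x, family = quasi(link = "log", variance = "mu^2"),
                         control = ctrl)))
all.equal(b.irls, b.glm, tolerance = 1e-5)
```

The loop touches the family only through the link, its inverse, d mu / d eta and the variance function, which is why a single tweedie() object parameterized by var.power and link.power is enough for glm.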
This function provides access to a range of generalized linear model (GLM) response distributions that are not otherwise provided by R.
It is also useful for accessing distribution/link combinations that are disallowed by the R glm function.
The variance function for the GLM is assumed to be V(mu) = mu^var.power, where mu is the expected value of the distribution.
The link function of the GLM is assumed to be mu^link.power for non-zero values of link.power, or log(mu) for link.power=0.
For example, link.power=1 produces the identity link.
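The two assumptions can be written as one-line R functions. The names tweedie_variance and tweedie_link are hypothetical helpers for illustration only; the comparison uses base R's stats::power(), which builds the same mu^lambda link for positive powers.

```r
# Hypothetical helpers (not statmod code) for the assumed variance and link.
tweedie_variance <- function(mu, var.power) mu^var.power
tweedie_link <- function(mu, link.power) {
  if (link.power == 0) log(mu) else mu^link.power   # link.power = 0 means log link
}

mu <- c(0.5, 1, 2, 4)
tweedie_variance(mu, var.power = 2)   # 0.25 1 4 16
tweedie_link(mu, link.power = 1)      # identity link: 0.5 1 2 4
tweedie_link(mu, link.power = 0.5)    # square-root link
power(0.5)$linkfun(mu)                # base R's power link, same values
```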
The canonical link for each Tweedie family is link.power = 1 - var.power.
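As a quick check of the rule link.power = 1 - var.power, the sketch below computes the canonical link power for the four classical families and sets it beside the default link of the matching base R family; the defaults identity, log, inverse and 1/mu^2 are the powers 1, 0, -1 and -2.

```r
# Canonical link power from the rule link.power = 1 - var.power
var.power <- c(0, 1, 2, 3)
canonical.link.power <- 1 - var.power
# Default links of the corresponding base R families, for comparison
base.default <- c(gaussian()$link, poisson()$link, Gamma()$link,
                  inverse.gaussian()$link)
data.frame(var.power, canonical.link.power, base.default)
```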
The Tweedie family of GLMs is discussed in detail by Dunn and Smyth (2018).
Each value of var.power corresponds to a particular type of response distribution.
The values 0, 1, 2 and 3 correspond to the normal, Poisson, gamma and inverse-Gaussian distributions respectively.
For these choices of var.power, the Tweedie family is exactly equivalent to the usual GLM family, except with a greater choice of link powers.
For example, tweedie(var.power = 1, link.power = 0) is exactly equivalent to poisson(link = "log").
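The equivalence can be illustrated without loading statmod. The coefficient estimates solve score equations that involve the family only through its variance function and link, so base R's quasi family with variance = "mu" and link = "log" (used here as a stand-in for tweedie(var.power = 1, link.power = 0)) reproduces the poisson(link = "log") coefficients. The count data are made up for illustration.

```r
y <- c(2, 3, 6, 7, 8, 9, 10, 12, 15)   # made-up counts
x <- 1:9
ctrl <- glm.control(epsilon = 1e-12, maxit = 100)
fit.pois  <- glm(y ~ x, family = poisson(link = "log"), control = ctrl)
# Same variance function (mu^1) and link (log), via base R's quasi family
fit.quasi <- glm(y ~ x, family = quasi(link = "log", variance = "mu"), control = ctrl)
all.equal(coef(fit.pois), coef(fit.quasi))
```

Only the coefficient fit is shared: summary() fixes the dispersion at 1 for the Poisson family but estimates it for the quasi family, so the reported standard errors differ.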
The most interesting Tweedie families occur for var.power between 1 and 2.
For these GLMs, the response distribution has mass at zero (i.e., it has exact zeros) but is otherwise continuous on the positive real numbers (Smyth, 1996; Hasan and Dunn, 2012).
These GLMs have been used, for example, to model rainfall: on many days there is no rain at all (an exact zero), but when there is rain, the amount is continuous and positive.
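The exact zeros come from the standard compound Poisson-gamma representation of a Tweedie distribution with 1 < p < 2: the response is a sum of N ~ Poisson(lambda) independent gamma amounts, and it is exactly zero whenever N = 0. The simulator rtweedie_cp below is a hypothetical sketch, not a statmod function; it maps (mu, phi, p) to lambda = mu^(2-p) / (phi (2-p)), gamma shape (2-p)/(p-1) and gamma scale phi (p-1) mu^(p-1), which gives mean mu and variance phi mu^p.

```r
# Hypothetical simulator (not statmod code) for a Tweedie response with 1 < p < 2,
# using the compound Poisson-gamma representation.
rtweedie_cp <- function(n, mu, phi, p) {
  stopifnot(p > 1, p < 2)
  lambda    <- mu^(2 - p) / (phi * (2 - p))  # Poisson rate of the number of amounts
  alpha     <- (2 - p) / (p - 1)             # gamma shape of each amount
  gam.scale <- phi * (p - 1) * mu^(p - 1)    # gamma scale of each amount
  N <- rpois(n, lambda)
  y <- numeric(n)                            # exact zeros wherever N == 0
  pos <- N > 0
  # a sum of N iid Gamma(alpha, gam.scale) variables is Gamma(N * alpha, gam.scale)
  y[pos] <- rgamma(sum(pos), shape = N[pos] * alpha, scale = gam.scale)
  y
}

set.seed(2024)
y <- rtweedie_cp(1e5, mu = 2, phi = 1, p = 1.5)
mean(y == 0)         # near exp(-lambda) = exp(-2^0.5 / 0.5), about 0.059
c(mean(y), var(y))   # near mu = 2 and phi * mu^p = 2^1.5
```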
Generally speaking, var.power should be chosen so that the theoretical response distribution matches the type of response data being modeled.
Hence var.power should be between 1 and 2 only if the response observations are continuous and positive except for exact zeros, and var.power should be greater than or equal to 2 only if the response observations are continuous and strictly positive.
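The guidance above can be encoded as a rough first screen of the response data. The helper var_power_guide is hypothetical and deliberately crude (it is not part of statmod); it only looks at signs, zeros and integer values, and it adds the Poisson case for counts from the table of special cases.

```r
# Hypothetical, deliberately crude helper (not statmod code) encoding the
# guidance on matching var.power to the type of response data.
var_power_guide <- function(y) {
  if (any(y < 0)) {
    "0 (normal): Tweedie families with var.power >= 1 need non-negative responses"
  } else if (all(y == round(y))) {
    "1 (Poisson): non-negative integer counts"
  } else if (any(y == 0)) {
    "between 1 and 2: exact zeros plus continuous positive values"
  } else {
    ">= 2 (e.g. 2 = gamma, 3 = inverse-Gaussian): continuous, strictly positive"
  }
}

var_power_guide(c(0, 0, 1.7, 0.3, 5.2))   # rainfall-like data
var_power_guide(c(1.5, 2.7, 4.1))         # strictly positive continuous data
```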
There are no theoretical Tweedie GLMs with var.power between 0 and 1 (Joergensen, 1987).
The tweedie function will work for those values, but the family should then be interpreted in a quasi-likelihood sense.
Theoretical Tweedie GLMs do exist for negative values of var.power, but they are of little practical use.
These distributions assume …
The name Tweedie has been associated with this family by Joergensen (1987) in honour of M. C. K. Tweedie. Joergensen (1987) gives a mathematical derivation of the Tweedie distributions proving that no distributions exist for var.power between 0 and 1.
Mathematically, a Tweedie GLM assumes the following. Let \(\mu_i = E(y_i)\) be the expectation of the \(i\)th response. We assume that
$$\mu_i^q = x_i^T b, \qquad \operatorname{var}(y_i) = \phi\,\mu_i^p,$$
where \(x_i\) is a vector of covariates, \(b\) is a vector of regression coefficients, \(\phi\) is a dispersion parameter, and \(p\) and \(q\) are fixed powers.
This family is specified by var.power = p and link.power = q.
A value of zero for \(q\) is interpreted as \(\log(\mu_i) = x_i^T b\).
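Solving the link equation for \(\mu_i\) expresses the mean and variance of each response directly in terms of the linear predictor (assuming \(x_i^T b > 0\) when \(q \neq 0\)). The second line works through the case var.power = 2, link.power = 0, i.e. a gamma GLM with log link:
$$\mu_i = \begin{cases} (x_i^T b)^{1/q}, & q \neq 0, \\ \exp(x_i^T b), & q = 0, \end{cases} \qquad \operatorname{var}(y_i) = \phi\,\mu_i^{p}.$$
$$p = 2,\ q = 0: \qquad \mu_i = \exp(x_i^T b), \qquad \operatorname{var}(y_i) = \phi\,\exp(2\,x_i^T b).$$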
The following table summarizes the possible Tweedie response distributions:
| var.power | Response distribution |
|---|---|
| 0 | Normal |
| 1 | Poisson |
| (1, 2) | Compound Poisson, non-negative with mass at zero |
| 2 | Gamma |
| 3 | Inverse-Gaussian |
Dunn, P. K., and Smyth, G. K. (2018). Generalized linear models with examples in R. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-0118-7 (Chapter 12 gives an overall discussion of Tweedie GLMs with R code and case studies.)
Hasan, M. M., and Dunn, P. K. (2012). Understanding the effect of climatology on monthly rainfall amounts in Australia using Tweedie GLMs. International Journal of Climatology, 32(7), 1006-1017. (An example with var.power between 1 and 2.)
Joergensen, B. (1987). Exponential dispersion models. J. R. Statist. Soc. B 49, 127-162. (Mathematical derivation of Tweedie response distributions)
Tweedie, M. C. K. (1984). An index which distinguishes between some important exponential families. In Statistics: Applications and New Directions. Proceedings of the Indian Statistical Institute Golden Jubilee International Conference. (Eds. J. K. Ghosh and J. Roy), pp. 579-604. Calcutta: Indian Statistical Institute. (The original mathematical paper from which the family is named)
Smyth, G. K. (1996). Regression modelling of quantity data with exact zeroes. Proceedings of the Second Australia-Japan Workshop on Stochastic Models in Engineering, Technology and Management. Technology Management Centre, University of Queensland, pp. 572-580. http://www.statsci.org/smyth/pubs/RegressionWithExactZerosPreprint.pdf (Derivation and examples of Tweedie GLMs with var.power between 1 and 2.)
Smyth, G. K., and Verbyla, A. P. (1999). Adjusted likelihood methods for modelling dispersion in generalized linear models. Environmetrics 10, 695-709. http://www.statsci.org/smyth/pubs/Ties98-Preprint.pdf (Includes examples of Tweedie GLMs with var.power=2 and var.power=4.)
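library(statmod)   # tweedie() is provided by the statmod package
set.seed(1)        # makes the simulated response reproducible
y <- rgamma(20, shape = 5)
x <- 1:20
# Fit a Poisson generalized linear model with identity link
glm(y ~ x, family = tweedie(var.power = 1, link.power = 1))
# Fit an inverse-Gaussian GLM with log link
glm(y ~ x, family = tweedie(var.power = 3, link.power = 0))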