
mgcv (version 1.8-40)

betar: GAM beta regression family

Description

Family for use with gam or bam, implementing regression for beta-distributed data on (0,1). A linear predictor controls the mean, \(\mu\), of the beta distribution, while the variance is then \(\mu(1-\mu)/(1+\phi)\), with the parameter \(\phi\) estimated during fitting, alongside the smoothing parameters.
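The variance formula follows if, as the simulation in the Examples assumes, the beta shape parameters are \(a=\mu\phi\) and \(b=(1-\mu)\phi\). Below is a quick numerical check using the standard Beta(a, b) variance \(ab/((a+b)^2(a+b+1))\); beta_var is a hypothetical helper, and the values of mu and phi are illustrative only.

```r
## beta_var is a hypothetical helper: variance of Beta(a, b) from its shapes
beta_var <- function(a, b) a * b / ((a + b)^2 * (a + b + 1))

mu  <- 0.3            # illustrative mean
phi <- 5              # illustrative value of the extra parameter
a <- mu * phi         # shape1 = mu * phi
b <- (1 - mu) * phi   # shape2 = (1 - mu) * phi

v_shapes <- beta_var(a, b)             # variance from the shape parameters
v_mean   <- mu * (1 - mu) / (1 + phi)  # variance as stated for betar
print(c(v_shapes = v_shapes, v_mean = v_mean))
```

Both evaluate to 0.035 here, since \(a+b=\phi\) and \(ab=\mu(1-\mu)\phi^2\).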

Usage

betar(theta = NULL, link = "logit", eps = .Machine$double.eps*100)

Value

An object of class extended.family.
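A minimal check, assuming mgcv is installed, that the constructor returns an object carrying the extended.family class; fam is an illustrative name.

```r
library(mgcv)

fam <- betar()        # defaults: theta = NULL, logit link
print(class(fam))     # class vector of the returned family object
```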

Arguments

theta

The extra parameter (\(\phi\) above).

link

The link function: one of "logit", "probit", "cloglog" or "cauchit".
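The sketch below, assuming mgcv is installed, builds a betar family for each of the four links and uses base R's make.link() to show that each inverse link maps any linear predictor value into (0,1), which is what keeps \(\mu\) a valid beta mean. The links are passed as literal strings; the eta values and object names are illustrative only.

```r
library(mgcv)

## each documented link is accepted by betar(); building the families checks this
fams <- list(betar(link = "logit"), betar(link = "probit"),
             betar(link = "cloglog"), betar(link = "cauchit"))

## inverse links (via base R's make.link) map the linear predictor into (0, 1)
links <- c("logit", "probit", "cloglog", "cauchit")
eta <- c(-3, 0, 3)   # illustrative linear predictor values
mu_by_link <- sapply(links, function(l) make.link(l)$linkinv(eta))
print(round(mu_by_link, 3))
```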

eps

The response variable is truncated to the interval [eps, 1-eps] if any of its values lie outside this range. This truncation is not entirely benign, but too small a value of eps will cause numerical stability problems if the response contains 0s or 1s.
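The truncation rule can be sketched in a few lines of base R. This illustrates the described behaviour only and is not mgcv's internal code; y is an illustrative response.

```r
## sketch of the eps truncation rule (not mgcv's internal code):
## values below eps become eps, values above 1 - eps become 1 - eps
eps <- .Machine$double.eps * 100   # the default used by betar()
y <- c(0, 0.25, 0.5, 1)            # illustrative response with a 0 and a 1
y_trunc <- pmin(pmax(y, eps), 1 - eps)
print(y_trunc)
```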

Author

Natalya Pya (nat.pya@gmail.com) and Simon Wood (s.wood@r-project.org)

WARNINGS

Do read the Details section if your data contain 0s and/or 1s.

Details

These models are useful for proportion data that cannot be modelled as binomial. Note the assumption that data lie in (0,1), even though for some parameter values 0 and 1 are perfectly legitimate observations. The restriction is needed to keep the log likelihood bounded for all parameter values. Any data exactly at 0 or 1 are reset to be just above 0 or just below 1 using the eps argument (in fact any observation < eps is reset to eps and any observation > 1-eps is reset to 1-eps).

Note the effect of this resetting. With shape parameters \(\mu\phi\) and \((1-\mu)\phi\), the beta density behaves like \(y^{\mu\phi-1}\) near 0. So if \(\mu\phi>1\), impossible 0s are replaced with highly improbable eps values. If the inequality is reversed, 0s with infinite probability density are replaced with eps values having high but finite probability density. The equivalent condition for 1s is \((1-\mu)\phi>1\). Clearly all types of resetting are somewhat unsatisfactory, and care is needed if data contain 0s or 1s (often it makes sense to manually reset the 0s and 1s in a manner that somehow reflects the sampling setup).
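The two regimes for 0s can be seen directly with base R's dbeta(), using shapes \(\mu\phi\) and \((1-\mu)\phi\). The values of mu and phi below are illustrative only: the first gives \(\mu\phi=2.5>1\), the second \(\mu\phi=0.5<1\).

```r
## density at and near 0 for the two regimes, shapes a = mu*phi, b = (1 - mu)*phi
eps <- .Machine$double.eps * 100   # default eps of betar()
phi <- 5                           # illustrative value
mu  <- c(0.5, 0.1)                 # mu*phi = 2.5 (> 1) and 0.5 (< 1)
a <- mu * phi
b <- (1 - mu) * phi

dens_at_zero <- dbeta(0,   a, b)   # 0 when a > 1, Inf when a < 1
dens_at_eps  <- dbeta(eps, a, b)   # both finite once 0 is reset to eps
print(rbind(dens_at_zero, dens_at_eps))
```

In the first case an impossible 0 becomes a highly improbable eps; in the second, an infinite density becomes a large finite one.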

Examples

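library(mgcv)
## Simulate some beta data...
set.seed(3); n <- 400
dat <- gamSim(1, n = n)
## mean in (0,1) via the inverse logit of a rescaled true function
mu <- binomial()$linkinv(dat$f/4 - 2)
phi <- .5
## beta shape parameters: a = mu*phi, b = (1-mu)*phi
a <- mu*phi; b <- phi - a
dat$y <- rbeta(n, a, b)

## fit a beta regression GAM with a logit link
bm <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), family = betar(link = "logit"), data = dat)

bm
plot(bm, pages = 1)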
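After fitting, the estimate of \(\phi\) can be compared with the value 0.5 used to simulate the data. This sketch assumes the getTheta() accessor carried by mgcv's extended families, with getTheta(TRUE) returning the parameter on its natural scale (\(\phi\) rather than \(\log\phi\)); phi_hat is an illustrative name.

```r
library(mgcv)

## re-create the simulated data and fit from the example
set.seed(3); n <- 400
dat <- gamSim(1, n = n)
mu <- binomial()$linkinv(dat$f/4 - 2)
phi <- .5
dat$y <- rbeta(n, mu * phi, (1 - mu) * phi)
bm <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3),
          family = betar(link = "logit"), data = dat)

## assumption: getTheta(TRUE) returns phi on its natural (not log) scale
phi_hat <- bm$family$getTheta(TRUE)
print(c(true_phi = phi, estimated_phi = phi_hat))
```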
