
The gam
modelling function is designed to be able to use
the negbin
family (a modification of MASS library negative.binomial
family
by Venables and Ripley), or the nb
function designed for integrated estimation of
parameter theta
.
Two approaches to estimating theta
are available (with gam
only):
With negbin
then if `performance iteration' is used for smoothing parameter estimation
(see gam
), then smoothing parameters are chosen by GCV and
theta
is chosen in order to ensure that the Pearson estimate of the scale
parameter is as close as possible to 1, the value that the scale parameter should have.
If `outer iteration' is used for smoothing parameter selection with the nb
family then
theta
is estimated alongside the smoothing parameters by ML or REML.
To use the first option, set the optimizer
argument of gam
to "perf"
(it can sometimes fail to converge).
negbin(theta = stop("'theta' must be specified"), link = "log")
nb(theta = NULL, link = "log")
Either i) a single value known value of theta or ii) two values of theta specifying the
endpoints of an interval over which to search for theta (this is an option only for negbin
, and is deprecated).
For nb
then a positive supplied theta
is treated as a fixed known parameter, otherwise
it is estimated (the absolute value of a negative theta
is taken as a starting value).
The link function: one of "log"
, "identity"
or "sqrt"
For negbin
an object inheriting from class family
, with additional elements
the function giving the first derivative of the variance function w.r.t. mu
.
the function giving the second derivative of the variance function w.r.t. mu
.
A function for retrieving the value(s) of theta. This also useful for retriving the
estimate of theta
after fitting (see example).
For nb an object inheriting from class extended.family.
gamm
does not support theta
estimation
The negative binomial functions from the MASS library are no longer supported.
nb
allows estimation of the theta
parameter alongside the model smoothing parameters, but is only usable with gam
or bam
(not gamm
).
For negbin
, if a single value of theta
is supplied then it is always taken as the known fixed value and this is useable with bam
and gamm
. If theta
is two
numbers (theta[2]>theta[1]
) then they are taken as specifying the range of values over which to search for
the optimal theta. This option is deprecated and should only be used with performance iteration estimation (see gam
argument optimizer
), in which case the method
of estimation is to choose
Venables, B. and B.R. Ripley (2002) Modern Applied Statistics in S, Springer.
Wood, S.N., N. Pya and B. Saefken (2016), Smoothing parameter and model selection for general smooth models. Journal of the American Statistical Association 111, 1548-1575 http://dx.doi.org/10.1080/01621459.2016.1180986
# NOT RUN {
library(mgcv)
set.seed(3)
n<-400
dat <- gamSim(1,n=n)
g <- exp(dat$f/5)
## negative binomial data...
dat$y <- rnbinom(g,size=3,mu=g)
## known theta fit ...
b0 <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=negbin(3),data=dat)
plot(b0,pages=1)
print(b0)
## same with theta estimation...
b <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=nb(),data=dat)
plot(b,pages=1)
print(b)
b$family$getTheta(TRUE) ## extract final theta estimate
## another example...
set.seed(1)
f <- dat$f
f <- f - min(f)+5;g <- f^2/10
dat$y <- rnbinom(g,size=3,mu=g)
b2 <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=nb(link="sqrt"),
data=dat,method="REML")
plot(b2,pages=1)
print(b2)
rm(dat)
# }
Run the code above in your browser using DataLab