gam() is not a clone of what Splus provides. Smooth terms are represented using penalized regression splines, with smoothing parameters selected by GCV/UBRE, or by regression splines with fixed degrees of freedom (mixtures of the two are permitted). Multi-dimensional smooths are available using penalized thin plate regression splines, but the user must make sure that covariates are sensibly scaled relative to each other when using such terms. For a general overview see Wood (2001). For more on specifying models see gam.models. For more on model selection see gam.selection.
Usage:

gam(formula, family=gaussian(), data=list(), weights=NULL, subset=NULL,
    na.action, control=gam.control(), scale=0, knots=NULL, sp=NULL,
    min.sp=NULL, H=NULL, gamma=1, ...)
Arguments:

formula: A GAM formula (see also gam.models). This is exactly like the formula for a glm except that smooth terms can be added to the right hand side of the formula (and a formula of the form y ~ . is not allowed).

data: A data frame containing the model response variable and covariates required by the formula. If variables are not found here they are taken from environment(formula), typically the environment from which gam is called.

control: A list of fit control parameters returned by gam.control.

knots: An optional list of user supplied knot values to be used for basis construction. For the "cr" basis the user simply supplies the knots to be used, and there must be the same number as the basis dimension, k, for the term concerned.

sp: A vector of smoothing parameters can be supplied here. With fit method "magic" (see gam.control and magic) negative elements indicate that the corresponding parameter should be estimated, so that fixed and estimated smoothing parameters can be mixed (see the examples).

min.sp: With "magic" only, lower bounds can be supplied for the smoothing parameters. Note that if this option is used then the smoothing parameters, sp, in the returned object will need to be added to what is supplied here to get the smoothing parameters actually used.

H: With "magic", a user supplied fixed quadratic penalty on the parameters of the GAM can be supplied, with this as its coefficient matrix. A common use of this term is to add a ridge penalty to the parameters of the GAM in circumstances in which the model is otherwise poorly identified (see the examples).

...: further arguments for passing on to gam.fit.
Value:

The function returns an object of class "gam" which has the following elements:

by: the by variables (i.e. covariates that multiply a smooth term): by[i,j] is the jth value for the ith by variable. There are only as many rows of this array as there are by variables in the model (often 0). The rownames of by give the by variable names.

by.exists: by.exists[i] is TRUE if the ith smooth has a by variable associated with it, FALSE otherwise.

call: the matched call (allows update to be used with gam objects, for example).
For each smooth term the object also records the basis dimension used: in the "cr" basis case this is the number of knots used, while for the "tp" basis it is the rank of the smooth (one greater than the maximum degrees of freedom, because of the centering constraint). The fitting method actually used is also recorded: "magic" for the new, more stable method, "mgcv" for the Wood (2000) method.

A list of smoothing parameter estimation convergence diagnostics is also returned; its content differs between "magic" and "mgcv". The "mgcv" version includes:
the leading diagonal of the Hessian (corresponding to g above);
an indicator that is TRUE if the second smoothing parameter guess improved the GCV/UBRE score;
an indicator that is TRUE if the algorithm terminated by failing to improve the GCV/UBRE score rather than by "converging". Not necessarily a problem, but check the above derivative information quite carefully.

In the case of "magic" the items include an indicator that is TRUE if the multiple GCV/UBRE optimization converged by meeting the convergence criteria, and FALSE if the method stopped with a steepest descent step failure.

xp: xp[i,] + covariate.shift[i] are the locations for the ith smooth.

Warnings:

The "mgcv" code does not check for rank deficiency of the model matrix that may result from lack of identifiability between the parametric and smooth components of the model. You must have more unique combinations of covariates than the model has total parameters. (Total parameters is the sum of the basis dimensions, plus the sum of non-spline terms, less the number of spline terms.) For example, counting the intercept among the non-spline terms, y ~ s(x0,k=10) + s(x1,k=12) + x2 has 10 + 12 + 2 - 2 = 22 total parameters.
Automatic smoothing parameter selection is not likely to work well when fitting models to very few response data.
Relative scaling of the covariates of a multi-dimensional smooth term has an effect on the results: make sure that relative scalings are sensible. For example, measuring one spatial co-ordinate in millimetres and the other in lightyears will usually produce poor results.
With large datasets (more than a few thousand data) the "tp"
basis gets very slow to use: use the knots
argument as discussed above and
shown in the examples. Alternatively, for 1-d smooths you can use the "cr"
basis.
For data with many zeroes clustered together in the covariate space it is quite easy to set up
GAMs which suffer from identifiability problems, particularly when using poisson or binomial
families. The problem is that with e.g. log or logit links, mean value zero corresponds to
an infinite range on the linear predictor scale. Some regularization is possible in such cases: see
gam.control
for details.
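For illustration, one such regularization is a small ridge penalty applied within the IRLS iteration. A minimal sketch (the binary response y and covariates x0, x1 are made up for the example, and irls.reg is the relevant gam.control argument, assumed available in your version of mgcv):

bs<-gam(y~s(x0)+s(x1),family=binomial,control=gam.control(irls.reg=0.1))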
Details:

In a GAM the linear predictor depends in part on smooth functions of covariates, $f_1$, $f_2$, etc. If absolutely any smooth functions were allowed in model fitting then maximum likelihood estimation of such models would invariably result in complex, over-fitted estimates of $f_1$ and $f_2$. For this reason the models are usually fit by penalized likelihood maximization, in which the model (negative log) likelihood is modified by the addition of a penalty for each smooth function, penalizing its `wiggliness'. To control the trade-off between penalizing wiggliness and penalizing badness of fit, each penalty is multiplied by an associated smoothing parameter: how to estimate these parameters, and how to represent the smooth functions in practice, are the main statistical questions introduced by moving from GLMs to GAMs.
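One standard way to write this down (a sketch of the usual formulation, not text taken from this help page): if each smooth is represented as a sum of basis functions, $f_j(x) = \sum_k \beta_{jk} b_{jk}(x)$, with a wiggliness penalty that is quadratic in the coefficients, then fitting maximizes the penalized log likelihood
$$ l_p(\beta) = l(\beta) - \frac{1}{2}\sum_j \lambda_j \beta^\top S_j \beta , $$
where $l$ is the model log likelihood, $S_j$ is the fixed penalty coefficient matrix for the $j$th smooth and $\lambda_j$ is its smoothing parameter: $\lambda_j = 0$ leaves the term unpenalized, while $\lambda_j \to \infty$ forces it towards the smoothest functions in its span.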
The mgcv
implementation of gam
represents the smooth functions using
penalized regression splines, and by default uses basis functions for these splines that
are designed to be optimal, given the number of basis functions used. The smooth terms can be
functions of any number of covariates and the user has some control over how smoothness of
the functions is measured.
gam
in mgcv
solves the smoothing parameter estimation problem by using the
Generalized Cross Validation (GCV) criterion or an Un-Biased Risk Estimator criterion
(UBRE) which is in practice an approximation to AIC. Smoothing parameters are chosen to
minimize the GCV or UBRE score for the model, and the main computational challenge solved
by the mgcv
package is to do this efficiently and reliably. Two alternative
numerical methods are provided, see mgcv
, magic
and
gam.control
.
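For orientation, the two criteria have roughly the following forms (a sketch of the standard definitions from the references, not text from this page), where $D$ is the model deviance, $n$ the number of data, $\mathbf{A}$ the influence (hat) matrix of the fitted model and $\sigma^2$ the scale parameter:
$$ V_g = \frac{n\,D(\hat{\beta})}{[n-\mathrm{tr}(\mathbf{A})]^2} \quad \mathrm{(GCV)}, \qquad V_u = \frac{D(\hat{\beta})}{n} + \frac{2\sigma^2\,\mathrm{tr}(\mathbf{A})}{n} - \sigma^2 \quad \mathrm{(UBRE)}. $$
GCV is used when the scale parameter is unknown; UBRE requires $\sigma^2$ to be known (or assumed, e.g. 1 for Poisson and binomial models).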
Broadly gam
works by first constructing basis functions and a quadratic penalty
coefficient matrix for each smooth term in the model formula, obtaining a model matrix for
the strictly parametric part of the model formula, and combining these to obtain a
complete model matrix (or design matrix) and a set of penalty matrices for the smooth terms.
Some linear identifiability constraints are also obtained at this point. The model is
fit using gam.fit
, a modification of glm.fit
. The GAM
penalized likelihood maximization problem is solved by penalized Iteratively
Reweighted Least Squares (IRLS) (see e.g. Wood 2000). At each iteration a penalized
weighted least squares problem is solved, and the smoothing parameters of that problem are
estimated by GCV or UBRE. Eventually both model parameter estimates and smoothing
parameter estimates converge.
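To make the penalized IRLS step concrete, here is a minimal sketch for a Poisson response with log link and a single penalized term, with the smoothing parameter held fixed (X, S and lambda are illustrative inputs, not objects produced by gam; in gam itself the smoothing parameters of each working penalized least squares problem are re-estimated by GCV/UBRE rather than held fixed):

pirls<-function(y,X,S,lambda,n.iter=25)
{ # penalized IRLS sketch: Poisson likelihood, log link, fixed lambda
  mu<-y+0.1; eta<-log(mu)            # crude starting values
  for (i in 1:n.iter)
  { z<-eta+(y-mu)/mu                 # working response for the log link
    w<-as.numeric(mu)                # IRLS weights: (dmu/deta)^2/V(mu) = mu
    # penalized weighted least squares: beta = (X'WX + lambda*S)^{-1} X'Wz
    beta<-solve(t(X)%*%(w*X)+lambda*S,t(X)%*%(w*z))
    eta<-as.numeric(X%*%beta); mu<-exp(eta)
  }
  beta
}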
The fitting approach just described, in which the smoothing parameters are estimated for each approximating linear model of the IRLS process, was suggested by Chong Gu (see, e.g. Gu 2002), and is very computationally efficient. However, because the approach neglects the dependence of the iterative weights on the smoothing parameters, it is usually possible to find smoothing parameters which actually yield slightly lower GCV/UBRE score estimates than those resulting from this `performance iteration'. gam therefore also allows the user to `improve' the smoothing parameter estimates, by using O'Sullivan's (1986) suggested method, in which for each trial set of smoothing parameters the IRLS is iterated to convergence before the UBRE/GCV score is evaluated. This outer minimisation, based on nlm, is much less efficient than the performance iteration, and is therefore quite slow.
Two alternative bases are available for representing model terms. Univariate smooth terms can be represented using conventional cubic regression splines - which are very efficient computationally - or thin plate regression splines. Multivariate terms must be represented using thin plate regression splines. For either basis the user specifies the dimension of the basis for each smooth term. The dimension of the basis is one more than the maximum degrees of freedom that the term can have, but usually the term will be fitted by penalized maximum likelihood estimation and the actual degrees of freedom will be chosen by GCV. However, the user can choose to fix the degrees of freedom of a term, in which case the actual degrees of freedom will be one less than the basis dimension.
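For illustration, the basis and its dimension are selected via s() in the model formula; the response y and covariates x, z below are placeholders (see also the examples at the end of this page):

b.cr<-gam(y~s(x,bs="cr",k=20))   # rank 20 cubic regression spline of x
b.tp<-gam(y~s(x,z,k=30))         # rank 30 thin plate regression spline of x,z
b.fx<-gam(y~s(x,k=5,fx=TRUE))    # unpenalized regression spline with 4 (=k-1) df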
Thin plate regression splines are constructed by starting with the
basis for a full thin plate spline and then truncating this basis in
an optimal manner, to obtain a low rank smoother. Details are given in
Wood (2003). One key advantage of the approach is that it avoids
the knot placement problems of conventional regression spline
modelling, but it also has the advantage that smooths of lower rank
are nested within smooths of higher rank, so that it is legitimate to
use conventional hypothesis testing methods to compare models based on
pure regression splines. The t.p.r.s. basis can become expensive to
calculate for large datasets. In this case the user can supply a reduced
set of knots to use in basis construction (see knots, in the argument list).
In the case of the cubic regression spline basis, knots of the spline are placed evenly throughout the covariate values to which the term refers: for example, if fitting 101 data with an 11 knot spline of x, then there would be a knot at every 10th (ordered) x value. The parameterization used represents the spline in terms of its values at the knots. The values at neighbouring knots are connected by sections of cubic polynomial constrained to be continuous up to and including second derivative at the knots. The resulting curve is a natural cubic spline through the values at the knots (given two extra conditions specifying that the second derivative of the curve should be zero at the two end knots). This parameterization gives the parameters a nice interpretability.
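As a small illustration of this knot placement rule (x, k and xk are made-up names for the sketch, not quantities returned by gam):

x<-runif(101)                      # 101 covariate values
k<-11                              # number of knots for a "cr" term
xk<-sort(x)[seq(1,101,length=k)]   # a knot at every 10th ordered x value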
Details of the "mgcv"
GCV/UBRE minimization method are given in Wood (2000):
the basis of the approach
is to alternate efficient global optimization with respect to one overall smoothing
parameter with Newton updates of a set of relative smoothing parameters for each smooth term. The Newton updates are
backed up by steepest descent, since the GCV/UBRE score functions are not positive definite everywhere.
References:

Wood, S.N. (2000) Modelling and smoothing parameter estimation with multiple quadratic penalties. J. R. Statist. Soc. B 62(2):413-428.

Wood, S.N. (2003) Thin plate regression splines. J. R. Statist. Soc. B 65(1):95-114.

Background References:

Green and Silverman (1994) Nonparametric Regression and Generalized Linear Models. Chapman and Hall.

Gu and Wahba (1991) Minimizing GCV/GML scores with multiple smoothing parameters via the Newton method. SIAM J. Sci. Statist. Comput. 12:383-398.

Gu (2002) Smoothing Spline ANOVA Models. Springer.

Hastie and Tibshirani (1990) Generalized Additive Models. Chapman and Hall.

O'Sullivan, Yandell and Raynor (1986) Automatic smoothing of regression functions in generalized linear models. J. Am. Statist. Ass. 81:96-103.

Wahba (1990) Spline Models of Observational Data. SIAM.

Wood (2001) mgcv: GAMs and Generalized Ridge Regression for R. R News 1(2):20-25.

Wood and Augustin (2002) GAMs with integrated model selection using penalized regression splines and applications to environmental modelling. Ecological Modelling 157:157-177.
See also: gam.models, s, predict.gam, plot.gam, summary.gam, gam.side.conditions, gam.selection, mgcv, gam.control, gam.check, gam.neg.bin, magic, vis.gam
Examples:

library(mgcv)
set.seed(0)
n<-400
sig2<-4
x0 <- runif(n, 0, 1)
x1 <- runif(n, 0, 1)
x2 <- runif(n, 0, 1)
x3 <- runif(n, 0, 1)
pi <- asin(1) * 2
f <- 2 * sin(pi * x0)
f <- f + exp(2 * x1) - 3.75887
f <- f+0.2*x2^11*(10*(1-x2))^6+10*(10*x2)^3*(1-x2)^10-1.396
e <- rnorm(n, 0, sqrt(abs(sig2)))
y <- f + e
b<-gam(y~s(x0)+s(x1)+s(x2)+s(x3))
summary(b)
plot(b,pages=1)
# an extra ridge penalty (useful with convergence problems) ....
bp<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),H=diag(0.5,41))
print(b);print(bp);rm(bp)
# set the smoothing parameter for the first term, estimate rest ...
bp<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),sp=c(0.01,-1,-1,-1))
plot(bp,pages=1);rm(bp)
# set lower bounds on smoothing parameters ....
bp<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),min.sp=c(0.001,0.01,0,10))
print(b);print(bp);rm(bp)
# now a GAM with 3df regression spline term & 2 penalized terms
b0<-gam(y~s(x0,k=4,fx=TRUE,bs="tp")+s(x1,k=12)+s(x2,15))
plot(b0,pages=1)
# now fit a 2-d term to x0,x1
b1<-gam(y~s(x0,x1)+s(x2)+s(x3))
par(mfrow=c(2,2))
plot(b1)
par(mfrow=c(1,1))
# now simulate poisson data
g<-exp(f/5)
y<-rpois(rep(1,n),g)
b2<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=poisson)
plot(b2,pages=1)
# and a pretty 2-d smoothing example....
test1<-function(x,z,sx=0.3,sz=0.4)
{ (pi**sx*sz)*(1.2*exp(-(x-0.2)^2/sx^2-(z-0.3)^2/sz^2)+
0.8*exp(-(x-0.7)^2/sx^2-(z-0.8)^2/sz^2))
}
n<-500
old.par<-par(mfrow=c(2,2))
x<-runif(n);z<-runif(n);
xs<-seq(0,1,length=30);zs<-seq(0,1,length=30)
pr<-data.frame(x=rep(xs,30),z=rep(zs,rep(30,30)))
truth<-matrix(test1(pr$x,pr$z),30,30)
contour(xs,zs,truth)
y<-test1(x,z)+rnorm(n)*0.1
b4<-gam(y~s(x,z))
fit1<-matrix(predict.gam(b4,pr,se=FALSE),30,30)
contour(xs,zs,fit1)
persp(xs,zs,truth)
vis.gam(b4)
par(old.par)
# very large dataset example using knots
n<-10000
x<-runif(n);z<-runif(n);
y<-test1(x,z)+rnorm(n)
ind<-sample(1:n,1000,replace=FALSE)
b5<-gam(y~s(x,z,k=50),knots=list(x=x[ind],z=z[ind]))
vis.gam(b5)
# and a pure "knot based" spline of the same data
b6<-gam(y~s(x,z,k=100),knots=list(x= rep((1:10-0.5)/10,10),
z=rep((1:10-0.5)/10,rep(10,10))))
vis.gam(b6,color="heat")