gam is used to fit generalized additive models, specified by giving a symbolic description of the additive predictor and a description of the error distribution. gam uses the backfitting algorithm to combine different smoothing or fitting methods. The methods currently supported are local regression and smoothing splines.
gam(
formula,
family = gaussian,
data,
weights,
subset,
na.action,
start = NULL,
etastart,
mustart,
control = gam.control(...),
model = TRUE,
method = "glm.fit",
x = FALSE,
y = TRUE,
...
)

gam.fit(
x,
y,
smooth.frame,
weights = rep(1, nobs),
start = NULL,
etastart = NULL,
mustart = NULL,
offset = rep(0, nobs),
family = gaussian(),
control = gam.control()
)
gam returns an object of class Gam, which inherits from both glm and lm.

Gam objects can be examined by print, summary, plot, and anova. Components can be extracted using the extractor functions predict, fitted, residuals, deviance, formula, and family, and the object can be modified using update. It has all the components of a glm object, with a few more. This also means it can be queried, summarized, etc. by methods for glm and lm objects. Other generic functions that have methods for Gam objects are step and preplot.
The following components must be included in a legitimate Gam object. The residuals, fitted values, coefficients and effects should be extracted by the generic functions of the same name, rather than by the "$" operator. The family function returns the entire family object used in the fitting, and deviance can be used to extract the deviance of the fit.
coefficients: the coefficients of the parametric part of the additive.predictors, which multiply the columns of the model matrix. The names of the coefficients are the names of the single-degree-of-freedom effects (the columns of the model matrix). If the model is overdetermined there will be missing values in the coefficients corresponding to inestimable coefficients.

additive.predictors: the additive fit, given by the product of the model matrix and the coefficients, plus the columns of the $smooth component.

fitted.values: the fitted mean values, obtained by transforming the component additive.predictors using the inverse link function.

smooth, nl.df, nl.chisq, var: these four characterize the nonparametric aspect of the fit. smooth is a matrix of smooth terms, with a column corresponding to each smooth term in the model; if no smooth terms are in the Gam model, all these components will be missing. Each column corresponds to the strictly nonparametric part of the term, while the parametric part is obtained from the model matrix. nl.df is a vector giving the approximate degrees of freedom for each column of smooth. For smoothing splines specified by s(x), the approximate df will be the trace of the implicit smoother matrix minus 2. nl.chisq is a vector containing a type of score test for the removal of each of the columns of smooth. var is a matrix like smooth, containing the approximate pointwise variances for the columns of smooth.

smooth.frame: essentially a subset of the model frame corresponding to the smooth terms; it has the ingredients needed for making predictions from a Gam object.

residuals: the residuals from the final weighted additive fit; also known as working residuals, these are typically not interpretable without rescaling by the weights.

deviance: up to a constant, minus twice the maximized log-likelihood. Similar to the residual sum of squares. Where sensible, the constant is chosen so that a saturated model has deviance zero.

null.deviance: the deviance for the null model, comparable with deviance. The null model will include the offset, and an intercept if there is one in the model.

iter: the number of local scoring iterations used to compute the estimates.

bf.iter: a vector of length iter giving the number of backfitting iterations used at each inner loop.

family: a three-element character vector giving the name of the family, the link, and the variance function; mainly for printing purposes.

weights: the working weights, that is, the weights in the final iteration of the local scoring fit.

prior.weights: the case weights initially supplied.

df.residual: the residual degrees of freedom.

df.null: the residual degrees of freedom for the null model.
The object will also have the components of an lm object: coefficients, residuals, fitted.values, call, terms, and some others involving the numerical fit. See lm.object.
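As a hedged illustration of the extractor functions described above, the sketch below fits the kyphosis model from the examples and pulls out a few components. It assumes the gam package is installed; the object name fit is only illustrative. The last check relies on the documented relationship between fitted.values and additive.predictors through the inverse link.

```r
library(gam)
data(kyphosis)

# Logistic additive model: smoothing spline in Age plus a linear term in Number
fit <- gam(Kyphosis ~ s(Age, 4) + Number, family = binomial, data = kyphosis)

# A Gam object also inherits from glm and lm
inherits(fit, "Gam")
inherits(fit, "glm")

# Prefer extractor functions over "$" for these components
coef(fit)              # coefficients of the parametric part
head(fitted(fit))      # fitted mean values
head(residuals(fit))   # residuals via the generic
deviance(fit)          # deviance of the fit
family(fit)            # the family object used in fitting

# Fitted means are the inverse link applied to the additive predictor
link_inverse <- family(fit)$linkinv
all.equal(as.numeric(fitted(fit)),
          as.numeric(link_inverse(fit$additive.predictors)))
```

The direct access to fit$additive.predictors is shown only to make the inverse-link relationship explicit; for routine use the extractor functions are the safer interface.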
formula: a formula expression as for other regression models, of the form response ~ predictors. See the documentation of lm and formula for details. Built-in nonparametric smoothing terms are indicated by s for smoothing splines or lo for loess smooth terms. See the documentation for s and lo for their arguments. Additional smoothers can be added by creating the appropriate interface functions. Interactions with nonparametric smooth terms are not fully supported, but will not produce errors; they will simply produce the usual parametric interaction.

family: a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function or the result of a call to a family function. (See family for details of family functions.)

data: an optional data frame containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which gam is called.

weights: an optional vector of weights to be used in the fitting process.

subset: an optional vector specifying a subset of observations to be used in the fitting process.

na.action: a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The "factory-fresh" default is na.omit. A special method na.gam.replace allows for mean-imputation of missing values (assumes missing at random), and works gracefully with gam.

start: starting values for the parameters in the additive predictor.

etastart: starting values for the additive predictor.

mustart: starting values for the vector of means.

control: a list of parameters for controlling the fitting process. See the documentation for gam.control for details. These can also be set as arguments to gam() itself.

model: a logical value indicating whether the model frame should be included as a component of the returned value. Needed if gam is called and predicted from inside a user function. Default is TRUE.

method: the method to be used in fitting the parametric part of the model. The default method "glm.fit" uses iteratively reweighted least squares (IWLS). The only current alternative is "model.frame", which returns the model frame and does no fitting.

x, y: For gam: logical values indicating whether the response vector and model matrix used in the fitting process should be returned as components of the returned value. For gam.fit: x is a model matrix of dimension n * p, and y is a vector of observations of length n.

...: further arguments passed to or from other methods.

smooth.frame: for gam.fit only. This is essentially a subset of the model frame corresponding to the smooth terms, and has the ingredients needed for smoothing each variable in the backfitting algorithm. The elements of this frame are produced by the formula functions lo and s.

offset: this can be used to specify an a priori known component to be included in the additive predictor during fitting.
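The sketch below illustrates two points from the argument descriptions: the three accepted forms of family, and the two ways of supplying control parameters. It assumes the gam package is installed; the object names are illustrative. The trace setting is the one used in the examples on this page, and the full set of control parameters is described in the documentation for gam.control.

```r
library(gam)
data(kyphosis)

# family as a family function, a character string, and the result of a call
fit_fun  <- gam(Kyphosis ~ s(Age, 4) + Number, family = binomial,   data = kyphosis)
fit_chr  <- gam(Kyphosis ~ s(Age, 4) + Number, family = "binomial", data = kyphosis)
fit_call <- gam(Kyphosis ~ s(Age, 4) + Number, family = binomial(), data = kyphosis)

# Control parameters supplied through gam.control() ...
fit_ctl <- gam(Kyphosis ~ s(Age, 4) + Number, family = binomial, data = kyphosis,
               control = gam.control(trace = TRUE))

# ... or passed directly as arguments to gam()
fit_dir <- gam(Kyphosis ~ s(Age, 4) + Number, family = binomial, data = kyphosis,
               trace = TRUE)

# All five calls describe the same model, so the deviances should agree
c(deviance(fit_fun), deviance(fit_chr), deviance(fit_call),
  deviance(fit_ctl), deviance(fit_dir))
```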
Written by Trevor Hastie, following closely the design in the "Generalized Additive Models" chapter (Hastie, 1992) in Chambers and Hastie (1992), and the philosophy in Hastie and Tibshirani (1990). This version of gam is adapted from the S version to match the glm and lm functions in R.
Note that this version of gam is different from the function with the same name in the R library mgcv, which uses only smoothing splines with a focus on automatic smoothing parameter selection via GCV. To avoid issues with S3 method handling when both packages are loaded, the object class in package "gam" is now "Gam".
The gam model is fit using the local scoring algorithm, which iteratively
fits weighted additive models by backfitting. The backfitting algorithm is a
Gauss-Seidel method for fitting additive models, by iteratively smoothing
partial residuals. The algorithm separates the parametric from the
nonparametric part of the fit, and fits the parametric part using weighted
linear least squares within the backfitting algorithm. This version of gam remains faithful to the philosophy of generalized additive models as outlined in the references below.
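The backfitting idea described above can be sketched in base R for the Gaussian case. This is a minimal illustration under stated assumptions, not the package's internal implementation: it uses stats::smooth.spline as the smoother, simulated data, two smooth terms, and a fixed df of 4 per term. All variable names are hypothetical.

```r
# Backfitting sketch for y = alpha + f1(x1) + f2(x2) + error.
# Assumption: smooth.spline stands in for the smoothers used by gam.
set.seed(1)
n  <- 200
x1 <- runif(n)
x2 <- runif(n)
y  <- sin(2 * pi * x1) + (x2 - 0.5)^2 + rnorm(n, sd = 0.1)

X     <- cbind(x1, x2)
alpha <- mean(y)              # intercept estimate
f     <- matrix(0, n, 2)      # current estimates of f1 and f2 at the data points

for (iter in 1:20) {
  f_old <- f
  for (j in 1:2) {
    # Partial residuals: remove the intercept and the other smooth term
    partial <- y - alpha - rowSums(f[, -j, drop = FALSE])
    sm      <- smooth.spline(X[, j], partial, df = 4)
    fj      <- predict(sm, X[, j])$y
    f[, j]  <- fj - mean(fj)  # center each term so the intercept stays identifiable
  }
  # Gauss-Seidel sweep converged when the smooth terms stop changing
  if (max(abs(f - f_old)) < 1e-6) break
}

fitted_vals <- alpha + rowSums(f)
```

For non-Gaussian families, local scoring wraps a loop like this inside an outer iteration that recomputes a working response and working weights, so each inner pass becomes a weighted additive fit.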
An object gam.slist (currently set to c("lo","s","random")) lists the smoothers supported by gam. Corresponding to each of these is a smoothing function gam.lo, gam.s, etc. that take particular arguments and produce particular output, custom built to serve as building blocks in the backfitting algorithm. This allows users to add their own smoothing methods. See the documentation for these methods for further information. In addition, the object gam.wlist (currently set to c("s","lo")) lists the smoothers for which efficient backfitters are provided. These are invoked if all the smoothing methods are of one kind (either all "lo" or all "s").
Hastie, T. J. (1992) Generalized additive models. Chapter 7 of Statistical Models in S, eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.
Hastie, T. and Tibshirani, R. (1990) Generalized Additive Models. London: Chapman and Hall.
Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. New York: Springer.
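library(gam)

# Logistic additive model: smoothing spline in Age (4 df) plus a linear term in Number
data(kyphosis)
gam(Kyphosis ~ s(Age, 4) + Number, family = binomial, data = kyphosis,
    trace = TRUE)

# Loess smooths, including a bivariate term; missing values are mean-imputed
data(airquality)
gam(Ozone^(1/3) ~ lo(Solar.R) + lo(Wind, Temp), data = airquality,
    na.action = na.gam.replace)

# Parametric polynomial plus a smooth term, fitted on a subset of the data
gam(Kyphosis ~ poly(Age, 2) + s(Start), data = kyphosis, family = binomial,
    subset = Number > 2)

# Fit, summarize, plot with standard-error bands, then predict term contributions
data(gam.data)
Gam.object <- gam(y ~ s(x, 6) + z, data = gam.data)
summary(Gam.object)
plot(Gam.object, se = TRUE)
data(gam.newdata)
predict(Gam.object, type = "terms", newdata = gam.newdata)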