Description

Takes a fitted gam object produced by gam() and produces various useful
summaries from it. (See sink to divert output to a file.)

Usage

## S3 method for class 'gam'
summary(object, dispersion = NULL, freq = FALSE, alpha = 0, ...)

## S3 method for class 'summary.gam'
print(x, digits = max(3, getOption("digits") - 3),
      signif.stars = getOption("show.signif.stars"), ...)
Arguments

object: a fitted gam object as produced by gam().

x: a summary.gam object produced by summary.gam().

dispersion: a known dispersion parameter, or NULL to use the estimate or the
default (e.g. 1 for the Poisson).

freq: TRUE to use the frequentist approximation for smooth term p-values
described below; FALSE (the default) for the Bayesian p-values.

alpha: adjustment to the reference degrees of freedom of the smooth term
tests, per estimated smoothing parameter (see Details).

...: other arguments.

Value

summary.gam produces a list of summary information for a fitted gam
object. Its components include (among others):

p.coeff: estimates of the strictly parametric model coefficients.

p.t: the p.coeff's divided by their standard errors.

chi.sq: test statistics for assessing the significance of the model smooth
terms (computed as described below if freq=TRUE), divided by the scale
parameter.

s.pv: approximate p-values for the smooth terms (frequentist if freq=TRUE).

print.summary.gam tries to print various bits of summary information useful
for term selection in a pretty way.
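As a brief usage sketch (the simulated data and model formula here are assumptions, mirroring the Examples below), the returned list can be stored and queried, and the print method's arguments control the display:

```r
library(mgcv)  ## assumed available

set.seed(2)
dat <- gamSim(1, n = 200, scale = 2)  ## simulate example data
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat)

s <- summary(b)  ## a list of class "summary.gam"
s$s.pv           ## approximate p-values for the four smooth terms
s$p.t            ## parametric coefficients divided by their standard errors

## print method options: fewer digits, no significance stars
print(s, digits = 3, signif.stars = FALSE)
```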
Details

If freq=TRUE then the frequentist approximation for p-values of smooth terms
described in section 4.8.5 of Wood (2006) is used. The approximation is not
great. If ${\bf p}_i$ is the parameter vector for the ith smooth term, and
this term has estimated covariance matrix ${\bf V}_i$, then the statistic is
${\bf p}_i^\prime {\bf V}_i^{k-} {\bf p}_i$, where ${\bf V}_i^{k-}$ is the
rank $k$ pseudo-inverse of ${\bf V}_i$, and $k$ is the estimated rank of
${\bf V}_i$. p-values are obtained as follows. If the dispersion parameter
is known, the statistic is compared to the chi-squared distribution with $k$
degrees of freedom. If the dispersion parameter is unknown (in which case it
will have been estimated) the statistic is compared to an F distribution
with $k$ upper degrees of freedom and lower degrees of freedom given by the
residual degrees of freedom for the model. Typically the p-values will be
somewhat too low.
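The statistic above can be reproduced approximately by hand. The following is an illustrative sketch under stated assumptions, not mgcv's internal code: it uses vcov(b, freq = TRUE) for the frequentist covariance matrix and a crude eigenvalue tolerance for the rank estimate.

```r
library(mgcv)  ## assumed available

set.seed(1)
dat <- gamSim(1, n = 200, scale = 2)
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat)

sm  <- b$smooth[[1]]                   ## first smooth term, s(x0)
idx <- sm$first.para:sm$last.para      ## its coefficient indices
p   <- coef(b)[idx]
V   <- vcov(b, freq = TRUE)[idx, idx]  ## frequentist covariance matrix

## rank-k pseudo-inverse of V via its eigendecomposition
ev <- eigen(V, symmetric = TRUE)
k  <- sum(ev$values > max(ev$values) * 1e-7)  ## crude rank estimate
Vk <- ev$vectors[, 1:k] %*% (t(ev$vectors[, 1:k]) / ev$values[1:k])

chi.sq <- drop(t(p) %*% Vk %*% p)  ## p' V^{k-} p
## known-scale reference distribution; summary.gam uses the F version
## here, since the Gaussian scale parameter is estimated
pchisq(chi.sq, df = k, lower.tail = FALSE)
```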
If freq=FALSE
then 'Bayesian p-values' are returned for the smooth terms, based on a test statistic motivated
by Nychka's (1988) analysis of the frequentist properties of Bayesian confidence intervals for smooths.
These appear to have better frequentist performance (in terms of power and distribution under the null)
than the alternative strictly frequentist approximation. Let $\bf f$ denote the vector of
values of a smooth term evaluated at the original
covariate values and let ${\bf V}_f$ denote the corresponding Bayesian covariance matrix. Let
${\bf V}_f^{r-}$ denote the rank $r$ pseudoinverse of ${\bf V}_f$, where $r$ is the
EDF for the term. The statistic used is then
$$T = {\bf f}^T {\bf V}_f^{r-}{\bf f}$$
(this can be calculated efficiently without forming the pseudoinverse explicitly). $T$ is compared to a
chi-squared distribution with degrees of freedom given by the EDF for the
term plus alpha times the number of estimated smoothing parameters, or $T$
is used as a component in an F ratio statistic if the
scale parameter has been estimated. The non-integer rank truncated inverse is constructed to give an
approximation varying smoothly between the bounding integer rank approximations, while yielding test statistics with the
correct mean and variance.
The definition of Bayesian p-value used is: the probability of observing an $\bf f$ less probable than $\bf 0$, under the approximation for the posterior for $\bf f$ implied by the truncation used in the test statistic.
Note that the distributional approximations for the p-values start to break down below one effective degree of freedom, and p-values are not reported below 0.5 effective degrees of freedom.
In simulations the p-values behave best under ML smoothness selection, with REML coming second.
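As an illustration of the Bayesian statistic, the following sketch (under stated assumptions, not mgcv's algorithm: the non-integer rank is simply rounded here rather than interpolated between the bounding integer-rank approximations) computes $T$ for the first smooth term:

```r
library(mgcv)  ## assumed available

set.seed(3)
dat <- gamSim(1, n = 200, scale = 2)
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat)

sm  <- b$smooth[[1]]
idx <- sm$first.para:sm$last.para
Xp  <- predict(b, type = "lpmatrix")     ## maps coefficients to the linear predictor
f   <- drop(Xp[, idx] %*% coef(b)[idx])  ## smooth evaluated at the covariate values
Vf  <- Xp[, idx] %*% vcov(b)[idx, idx] %*% t(Xp[, idx])  ## Bayesian cov. of f

r  <- max(1, round(sum(b$edf[idx])))  ## EDF for the term, rounded to an integer
ev <- eigen(Vf, symmetric = TRUE)
Vr <- ev$vectors[, 1:r] %*% (t(ev$vectors[, 1:r]) / ev$values[1:r])

Tstat <- drop(t(f) %*% Vr %*% f)  ## T = f' V_f^{r-} f
## known-scale reference; summary.gam uses an F version when the
## scale parameter is estimated
pchisq(Tstat, df = r, lower.tail = FALSE)
```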
References

Nychka, D. (1988) Bayesian confidence intervals for smoothing splines. Journal of the American Statistical Association 83, 1134-1143.

Wood, S.N. (2006) Generalized Additive Models: An Introduction with R. Chapman and Hall/CRC Press.
See Also

gam, predict.gam, gam.check, anova.gam
Examples

library(mgcv)
set.seed(0)
dat <- gamSim(1,n=200,scale=2) ## simulate data
b <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),data=dat)
plot(b,pages=1)
summary(b)
## now check the p-values by using a pure regression spline.....
b.d <- round(summary(b)$edf)+1 ## get edf per smooth
b.d <- pmax(b.d,3) # can't have basis dimension less than 3!
bc <- gam(y~s(x0,k=b.d[1],fx=TRUE)+s(x1,k=b.d[2],fx=TRUE)+
          s(x2,k=b.d[3],fx=TRUE)+s(x3,k=b.d[4],fx=TRUE),data=dat)
plot(bc,pages=1)
summary(bc)
## p-value check - increase k to make this useful!
k <- 20; n <- 200; p <- rep(NA,k)
for (i in 1:k) {
  b <- gam(y~te(x,z),data=data.frame(y=rnorm(n),x=runif(n),z=runif(n)),
           method="ML")
  p[i] <- summary(b)$s.p[1] ## p-value for the (null) smooth term
}
plot(((1:k)-0.5)/k,sort(p))
abline(0,1,col=2)
ks.test(p,"punif") ## how close to uniform are the p-values?