mgcv (version 0.9-6)

mgcv: Multiple Smoothing Parameter Estimation by GCV or UBRE

Description

Function to efficiently estimate smoothing parameters in generalized ridge regression problems with multiple (quadratic) penalties, by GCV or UBRE. The function uses Newton's method in several dimensions, backed up by steepest descent, to iteratively adjust a set of relative smoothing parameters, one for each penalty. To ensure that the overall level of smoothing is optimal, and to guard against trapping by local minima, a highly efficient global minimisation with respect to a single overall smoothing parameter is also performed at each iteration.

The other functions in the mgcv package are listed under `See Also'.

Usage

mgcv(M)

Arguments

M
is the single argument to mgcv and should be a list with the elements listed below:
  • M$y: The response data vector.
  • M$w: A vector of weights for the data (often proportional to the reciprocal of the standard deviation of the response).

Value

The function returns a list of the same form as the input list, with the following elements added/modified:
  • M$p: The best fit parameters given the estimated smoothing parameters.
  • M$sp: The estimated smoothing parameters ($\lambda_i/\rho$).
  • M$sig2: The estimated (GCV) or supplied (UBRE) error variance.
  • M$edf: The estimated degrees of freedom associated with each penalized term. If ${\bf p}_i$ is the vector of parameters penalized by the ith penalty then we can write ${\bf p}_i = {\bf P}_i {\bf y}$, and the edf for the ith term is $tr({\bf X P}_i)$. While this is sensible in the non-overlapping penalty case (e.g. GAMs), it makes a lot less sense if penalties overlap (see the sketch after this list).
  • M$Vp: Estimated covariance matrix of the model parameters.
  • M$gcv.ubre: The minimized GCV/UBRE score.
  • M$conv: A list of convergence diagnostics, with the following elements:
      edf: Array of whole-model estimated degrees of freedom.
      hat: Array of elements from the leading diagonal of the `hat' (`influence') matrix; same length as M$y.
      score: Array of GCV/UBRE scores at the edfs for the final set of relative smoothing parameters.
      g: The gradient of the GCV/UBRE score w.r.t. the smoothing parameters at termination.
      h: The second derivatives corresponding to g above, i.e. the leading diagonal of the Hessian.
      e: The eigenvalues of the Hessian. These should all be non-negative!
      iter: The number of iterations taken.
      in.ok: TRUE if the second smoothing parameter guess improved the GCV/UBRE score. (Please report examples where this is FALSE.)
      step.fail: TRUE if the algorithm terminated by failing to improve the GCV/UBRE score rather than by "converging". Not necessarily a problem, but check the above derivative information quite carefully.
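As a concrete (and purely illustrative) reading of the edf definition above, the toy R sketch below builds two non-overlapping penalties, forms ${\bf P}_i$ directly by dense linear algebra (assuming unit weights and $\rho = 1$), and evaluates $tr({\bf X P}_i)$. All object names are hypothetical and nothing here calls the package's own routines.

# Two penalties acting on disjoint blocks of the parameter vector.
set.seed(1)
n <- 50; q <- 10
X <- matrix(rnorm(n * q), n, q)              # design matrix
y <- rnorm(n)                                # response
S1 <- diag(c(rep(1, 5), rep(0, 5)))          # penalty on parameters 1-5
S2 <- diag(c(rep(0, 5), rep(1, 5)))          # penalty on parameters 6-10
lambda <- c(2, 0.5)                          # fixed smoothing parameters
P <- solve(crossprod(X) + lambda[1] * S1 + lambda[2] * S2, t(X))  # p = P y
P1 <- P; P1[6:10, ] <- 0                     # P_1: rows for the first term's parameters only
P2 <- P; P2[1:5, ] <- 0                      # P_2: rows for the second term's parameters only
c(sum(diag(X %*% P1)), sum(diag(X %*% P2)))  # per-term edfs, tr(X P_i)
sum(diag(X %*% P))                           # whole-model edf (their sum)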

WARNING

The method may not behave well when ${\bf X}$ is close to column rank deficient, especially in contexts where the weights vary wildly.

Details

This is documentation for the code implementing the method described in section 4 of Wood (2000). The method is a computationally efficient means of applying GCV to the problem of smoothing parameter selection in generalized ridge regression problems of the form: $$\mathrm{minimise} ~ \| {\bf W} ({\bf Xp} - {\bf y}) \|^2 \rho + \sum_{i=1}^m \lambda_i {\bf p}^\prime {\bf S}_i {\bf p}$$ possibly subject to constraints ${\bf Cp}={\bf 0}$. ${\bf X}$ is a design matrix, $\bf p$ a parameter vector, $\bf y$ a data vector, $\bf W$ a diagonal weight matrix, ${\bf S}_i$ a positive semi-definite matrix of coefficients defining the ith penalty and $\bf C$ a matrix of coefficients defining any linear equality constraints on the problem. The smoothing parameters are the $\lambda_i$, but there is an overall smoothing parameter $\rho$ as well. Note that ${\bf X}$ must be of full column rank, at least when projected into the null space of any equality constraints.
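For fixed $\rho$ and $\lambda_i$ the problem above is just penalized weighted least squares, which can be solved directly. The following R sketch, purely illustrative and not the package's C implementation, solves that inner problem with ordinary dense linear algebra (ignoring any constraints ${\bf C}$) and evaluates the corresponding GCV score; the function name and objects are hypothetical.

# Illustrative only: solve the penalized weighted least squares problem for
# *fixed* rho and lambda, and evaluate the GCV score n*RSS/(n - edf)^2.
fit.ridge <- function(y, X, W, S.list, lambda, rho) {
  WX <- W %*% X
  A <- rho * crossprod(WX)                    # rho * X'W'WX
  for (i in seq_along(S.list)) A <- A + lambda[i] * S.list[[i]]
  p <- solve(A, rho * t(WX) %*% (W %*% y))    # penalized least squares estimate
  influence <- WX %*% solve(A, rho * t(WX))   # maps W y to fitted W X p
  edf <- sum(diag(influence))                 # whole-model effective degrees of freedom
  rss <- sum((W %*% (X %*% p - y))^2)         # weighted residual sum of squares
  n <- length(y)
  list(p = p, edf = edf, gcv = n * rss / (n - edf)^2)
}
# tiny hypothetical usage:
set.seed(1); X <- matrix(rnorm(60), 30, 2); y <- rnorm(30)
fit.ridge(y, X, diag(30), list(diag(2)), lambda = 1, rho = 1)$gcv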

The method operates by alternating very efficient direct searches for $\rho$ with Newton or steepest descent updates of the logs of the $\lambda_i$. Because the GCV/UBRE scores are flat w.r.t. very large or very small $\lambda_i$, it is important to get good starting parameters, and to be careful not to step into a flat region of the smoothing parameter space. For this reason the algorithm rescales any Newton step that would result in a $log(\lambda_i)$ change of more than 5. Newton steps are only used if the Hessian of the GCV/UBRE score is positive definite; otherwise steepest descent is used. Similarly, steepest descent is used if the Newton step has to be contracted too far (indicating that the quadratic model underlying Newton is poor). All initial steepest descent steps are scaled so that their largest component is 1. Regardless of how a step is calculated, it is never expanded if it is successful (to avoid stepping into flat portions of the objective), but steps are successively halved if they do not decrease the GCV/UBRE score, until they do, or the direction is deemed to have failed. M$conv provides some convergence diagnostics.
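A schematic R sketch of the step-control logic just described, applied to a generic score function, may help. The function and objects below are hypothetical and only mimic the rescaling-and-halving behaviour; they are not the package's C code.

# Cap any step that would change some log(lambda) by more than 5, never expand a
# successful step, and halve an unsuccessful one until the score improves or the
# direction is abandoned.
step.control <- function(score, log.lam, step, max.half = 15) {
  if (max(abs(step)) > 5) step <- 5 * step / max(abs(step))  # rescale large Newton steps
  old <- score(log.lam)
  for (i in 0:max.half) {
    trial <- log.lam + step / 2^i          # successively halved trial steps
    if (score(trial) < old) return(trial)  # accept the first improvement
  }
  log.lam                                  # direction deemed to have failed
}
# hypothetical quadratic score with minimum at log(lambda) = (1, 1):
step.control(function(l) sum((l - 1)^2), log.lam = c(3, -2), step = c(-12, 20))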

The method is coded in C and is intended to be portable. It should be noted that seriously ill-conditioned problems (i.e. those with a design matrix close to column rank deficiency) may cause difficulties, especially if the weights vary wildly between observations.

References

Gu, C. and Wahba, G. (1991) Minimizing GCV/GML scores with multiple smoothing parameters via the Newton method. SIAM J. Sci. Statist. Comput. 12:383-398

Wood, S.N. (2000) Modelling and Smoothing Parameter Estimation with Multiple Quadratic Penalties. J.R.Statist.Soc.B 62(2):413-428

Wood, S.N. (2003) Thin plate regression splines. J.R.Statist.Soc.B 65(1):95-114

http://www.stats.gla.ac.uk/~simon/

See Also

gam, s, plot.gam, print.gam, summary.gam, predict.gam, vis.gam, residuals.gam, gam.models, gam.selection, gam.control, gam.check, pcls, mono.con, GAMsetup, gam.parser, gam.fit, gam.setup, QT, gam.neg.bin, null.space.dimension, uniquecombs, exclude.too.far

Examples

library(mgcv)  # provides mgcv() and GAMsetup()
set.seed(0)
n<-100 # number of observations to simulate
x <- runif(4 * n, 0, 1) # simulate covariates
x <- array(x, dim = c(4, n)) # put into array for passing to GAMsetup
pi <- asin(1) * 2  # begin simulating some data
y <- 2 * sin(pi * x[1, ])
y <- y + exp(2 * x[2, ]) - 3.75887
y <- y + 0.2 * x[3, ]^11 * (10 * (1 - x[3, ]))^6 + 10 * (10 *
        x[3, ])^3 * (1 - x[3, ])^10 - 1.396
sig2<- -1    # set magnitude of variance
e <- rnorm(n, 0, sqrt(abs(sig2)))
y <- y + e          # simulated data
w <- matrix(1, n, 1) # weight matrix
par(mfrow = c(2, 2)) # scatter plots of simulated data   
plot(x[1, ], y);plot(x[2, ], y)
plot(x[3, ], y);plot(x[4, ], y)
# create list for passing to GAMsetup
G <- list(m = 4, n = n, nsdf = 0, df = c(15, 15, 15, 15),dim=c(1,1,1,1),
          s.type=c(0,0,0,0),by=0,by.exists=c(FALSE,FALSE,FALSE,FALSE),
          p.order=c(0,0,0,0), x = x,n.knots=rep(0,4),fit.method="mgcv") 
H <- GAMsetup(G)
H$y <- y    # add data to H
H$sig2 <- sig2  # add variance (signalling GCV use in this case) to H
H$w <- w       # add weights to H
H$sp <- array(-1, H$m)      # negative values signal that the smoothing parameters are to be estimated
H$fix <- array(FALSE, H$m)  # do not fix the degrees of freedom of any term
H$conv.tol <- 1e-6; H$max.half <- 15  # convergence tolerance and maximum number of step halvings
H$min.edf <- 5; H$fixed.sp <- 0       # lower limit on whole model edf; 0 => smoothing parameters not fixed
H <- mgcv(H)  # select smoothing parameters and fit model
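The elements documented under `Value' can then be inspected; this short continuation is a suggestion rather than part of the original example:

H$sp         # estimated smoothing parameters
H$edf        # effective degrees of freedom for each penalized term
H$gcv.ubre   # minimized GCV score
H$sig2       # GCV estimate of the error variance
H$conv$iter  # number of iterations taken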
