gamsel: Fit Regularization Path for Gaussian or Binomial Generalized Additive Model

Description

Using overlap grouped lasso penalties, gamsel selects whether a term in a GAM is zero, linear, or a non-linear spline (up to a specified max df per variable). It fits the entire regularization path on a grid of values for the overall penalty lambda, for both gaussian and binomial families.

Usage

gamsel(
  x,
  y,
  num_lambda = 50,
  lambda = NULL,
  family = c("gaussian", "binomial"),
  degrees = rep(10, p),
  gamma = 0.4,
  dfs = rep(5, p),
  bases = pseudo.bases(x, degrees, dfs, parallel = parallel, ...),
  tol = 1e-04,
  max_iter = 2000,
  traceit = FALSE,
  parallel = FALSE,
  ...
)

Value

An object with S3 class gamsel.

intercept: Intercept sequence of length num_lambda
alphas: nvars x num_lambda matrix of linear coefficient estimates
betas: sum(degrees) x num_lambda matrix of non-linear coefficient estimates
lambdas: The sequence of lambda values used
degrees: Number of basis functions used for each variable
parms: A set of parameters that capture the bases used. This allows for efficient generation of the basis elements for predict.gamsel, the predict method for this class.

family: "gaussian" or "binomial"
nulldev: Null deviance (deviance of the intercept model)
dev.ratio: Vector of length num_lambda giving fraction of (null) deviance explained by each model along the lambda sequence
call: The call that produced this object
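
As a rough illustration of the components listed above, the following sketch fits a Gaussian path on simulated data and inspects a few fields. The simulated data and the object name fit are assumptions made for illustration; only the component names come from the list above.

```r
library(gamsel)

# Simulated data (an assumption for illustration, not the package's example data)
set.seed(1)
n <- 200
p <- 4
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] + sin(2 * x[, 2]) + rnorm(n, sd = 0.5)

fit <- gamsel(x, y, num_lambda = 50)

class(fit)            # S3 class of the returned object
length(fit$lambdas)   # number of lambda values used
dim(fit$alphas)       # linear coefficients: one row per variable
dim(fit$betas)        # non-linear coefficients: one row per basis function
fit$degrees           # number of basis functions per variable
range(fit$dev.ratio)  # fraction of null deviance explained along the path
```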

Arguments

x: Input (predictor) matrix of dimension nobs x nvars. Each observation is a row.
y: Response variable. Quantitative for family="gaussian"; values in {0,1} for family="binomial".
num_lambda: Number of lambda values to use. (Length of lambda sequence.)
lambda: User-supplied lambda sequence. For best performance, leave as NULL and allow the routine to automatically select lambda. Otherwise, supply a (preferably gradually) decreasing sequence.
family: Response type. "gaussian" for linear model (default). "binomial" for logistic model.
degrees: An integer vector of length nvars specifying the maximum number of spline basis functions to use for each variable.
gamma: Penalty mixing parameter \(0 \le\gamma\le 1\). Values \( \gamma < 0.5\) penalize linear fit less than non-linear fit. The default is \(\gamma = 0.4\), which encourages a linear term over a nonlinear term.
dfs: Numeric vector of length nvars specifying the maximum (end-of-path) degrees of freedom for each variable.
bases: A list of orthonormal bases for the non-linear terms for each variable. The function pseudo.bases generates these, using the parameters dfs and degrees. See the documentation for pseudo.bases.
tol: Convergence threshold for coordinate descent. The coordinate descent loop continues until the total change in objective after a pass over all variables is less than tol. Default is 1e-4.
max_iter: Maximum number of coordinate descent iterations over all the variables for each lambda value. Default is 2000.
traceit: If TRUE, various information is printed during the fitting process.
parallel: Passed on to the pseudo.bases() function. Uses multiple processes if available.
...: additional arguments passed on to pseudo.bases()
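
A minimal sketch of how the arguments above might be combined: the bases are computed once with pseudo.bases and reused across two fits that differ mainly in gamma. The simulated data and the object names (fit_default, fit_high) are assumptions; the pseudo.bases call follows the form used in the Examples section.

```r
library(gamsel)

# Simulated data (an assumption for illustration)
set.seed(2)
n <- 200
p <- 4
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] + sin(2 * x[, 2]) + rnorm(n, sd = 0.5)

# Compute the bases once so that both fits share them
bases <- pseudo.bases(x, degree = 10, df = 5)

# Default mixing parameter, which favours a linear term over a non-linear one
fit_default <- gamsel(x, y, bases = bases, gamma = 0.4)

# gamma above 0.5: going by the description of gamma, the linear fit is then
# penalized more than the non-linear fit. A tighter tol and a larger max_iter
# are set here only to show where those arguments go.
fit_high <- gamsel(x, y, bases = bases, gamma = 0.9, tol = 1e-5, max_iter = 5000)
```

Precomputing bases is mainly a convenience when several fits share the same x, as in the Examples section where the same bases are passed to gamsel and cv.gamsel.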

Author

Alexandra Chouldechova and Trevor Hastie
Maintainer: Trevor Hastie <hastie@stanford.edu>

Details

The sequence of models along the lambda path is fit by (block) coordinate descent. In the case of logistic regression, the fitting routine may terminate before all num_lambda values of lambda have been used. This occurs when the fraction of null deviance explained by the model gets too close to 1, at which point the fit becomes numerically unstable. Each of the smooth terms is computed using an approximation to the Demmler-Reinsch smoothing spline basis for that variable, and the accompanying diagonal penalty matrix.
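
To make "fraction of null deviance explained" concrete, here is a small base-R sketch that computes the quantity for an ordinary logistic regression. It uses glm as a stand-in model on simulated data, both of which are assumptions; it does not reproduce gamsel's internal computation or its stopping threshold, neither of which is stated here.

```r
# Simulated binary response (an assumption for illustration)
set.seed(3)
n <- 200
x <- rnorm(n)
yb <- rbinom(n, size = 1, prob = plogis(2 * x))

# Ordinary logistic regression from base R, used as a stand-in model
fit <- glm(yb ~ x, family = binomial)

# Fraction of null deviance explained: 1 - deviance / null deviance.
# The null deviance is the deviance of the intercept-only model.
dev_ratio <- 1 - fit$deviance / fit$null.deviance
dev_ratio
```

A value near 1 means the fitted probabilities are close to 0 or 1 for almost every observation, which is the regime the paragraph above describes as numerically unstable.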

References

Chouldechova, A. and Hastie, T. (2015) Generalized Additive Model Selection, https://arxiv.org/abs/1506.03850

Examples


##data=gamsel:::gendata(n=500,p=12,k.lin=3,k.nonlin=3,deg=8,sigma=0.5)
data = readRDS(system.file("extdata/gamsel_example.RDS", package = "gamsel"))
attach(data)
bases=pseudo.bases(X,degree=10,df=6)
# Gaussian gam
gamsel.out=gamsel(X,y,bases=bases)
par(mfrow=c(1,2),mar=c(5,4,3,1))
summary(gamsel.out)
gamsel.cv=cv.gamsel(X,y,bases=bases)
par(mfrow=c(1,1))
plot(gamsel.cv)
par(mfrow=c(3,4))
plot(gamsel.out,newx=X,index=20)
# Binomial model
gamsel.out=gamsel(X,yb,family="binomial")
par(mfrow=c(1,2),mar=c(5,4,3,1))
summary(gamsel.out)
par(mfrow=c(3,4))
plot(gamsel.out,newx=X,index=30)
