cvSGL: Fit and Cross-Validate a GLM with a Combination of Lasso and Group Lasso Regularization

Description

Fits and cross-validates a regularized generalized linear model via penalized maximum likelihood. The model is fit for a path of values of the penalty parameter, and a parameter value is chosen by cross-validation. Fits linear, logistic and Cox models.

Usage

cvSGL(data, index = rep(1, ncol(data$x)), type = "linear", maxit = 1000, thresh = 0.001,
min.frac = 0.05, nlam = 20, gamma = 0.8, nfold = 10, standardize = TRUE,
verbose = FALSE, step = 1, reset = 10, alpha = 0.95, lambdas = NULL,
foldid = NULL)

Arguments

data

For type="linear" should be a list with $x$ an input matrix of dimension n-obs by p-vars, and $y$ a length $n$ response vector. For type="logit" should be a list with $x$, an input matrix, as before, and $y$ a length $n$ binary response vector. For type="cox" should be a list with x as before, time, an n-vector corresponding to failure/censor times, and status, an n-vector indicating failure (1) or censoring (0).

index

A p-vector indicating group membership of each covariate

type

model type: one of ("linear","logit", "cox")

maxit

Maximum number of iterations to convergence

thresh

Convergence threshold for change in beta

min.frac

The minimum value of the penalty parameter, as a fraction of the maximum value

nlam

Number of lambda to use in the regularization path

gamma

Fitting parameter used for tuning backtracking (between 0 and 1)

nfold

Number of folds of the cross-validation loop

standardize

Logical flag for variable standardization (scaling) prior to fitting the model.

verbose

Logical flag for whether or not step number will be output

step

Fitting parameter used for inital backtracking step size (between 0 and 1)

reset

Fitting parameter used for taking advantage of local strong convexity in nesterov momentum (number of iterations before momentum term is reset)

alpha

The mixing parameter. alpha = 1 is the lasso penalty.

lambdas

A user inputted sequence of lambda values for fitting. We recommend leaving this NULL and letting SGL self-select values

foldid

An optional user-pecified vector indicating the cross-validation fold in which each observation should be included. Values in this vector should range from 1 to nfold. If left unspecified, SGL will randomly assign observations to folds

Value

An object with S3 class "cv.SGL"

lldiff

An nlam vector of cross validated negative log likelihoods (squared error loss in the linear case, along the regularization path)

llSD

An nlame vector of approximate standard deviations of lldiff

lambdas

The actual list of lambda values used in the regularization path.

type

Response type (linear/logic/cox)

fit

A model fit object created by a call to SGL on the entire dataset

foldid

A vector indicating the cross-validation folds that each observation is assigned to

prevals

A matrix of prevalidated predictions for each observation, for each lambda-value

Details

The function runs SGL nfold+1 times; the initial run is to find the lambda sequence, subsequent runs are used to compute the cross-validated error rate and its standard deviation.

References

Simon, N., Friedman, J., Hastie, T., and Tibshirani, R. (2011) A Sparse-Group Lasso, http://faculty.washington.edu/nrsimon/SGLpaper.pdf

Examples

Run this code

# NOT RUN {
set.seed(1)
n = 50; p = 100; size.groups = 10
index <- ceiling(1:p / size.groups)
X = matrix(rnorm(n * p), ncol = p, nrow = n)
beta = (-2:2)
y = X[,1:5] %*% beta + 0.1*rnorm(n)
data = list(x = X, y = y)
cvFit = cvSGL(data, index, type = "linear")
# }

Run the code above in your browser using DataLab