Fits and cross-validates a regularized generalized linear model via penalized maximum likelihood. The model is fit for a path of values of the penalty parameter, and a parameter value is chosen by cross-validation. Fits linear, logistic and Cox models.
cvSGL(data, index = rep(1, ncol(data$x)), type = "linear", maxit = 1000, thresh = 0.001,
min.frac = 0.05, nlam = 20, gamma = 0.8, nfold = 10, standardize = TRUE,
verbose = FALSE, step = 1, reset = 10, alpha = 0.95, lambdas = NULL,
foldid = NULL)
data: For type = "linear", a list with x, an input matrix of dimension n-obs by p-vars, and y, a length n response vector. For type = "logit", a list with x, an input matrix as before, and y, a length n binary response vector. For type = "cox", a list with x as before, time, an n-vector of failure/censoring times, and status, an n-vector indicating failure (1) or censoring (0).
index: A p-vector indicating the group membership of each covariate.
type: Model type, one of "linear", "logit" or "cox".
maxit: Maximum number of iterations allowed to reach convergence.
thresh: Convergence threshold for the change in beta.
min.frac: The minimum value of the penalty parameter, as a fraction of its maximum value.
nlam: Number of lambda values to use in the regularization path.
gamma: Fitting parameter used for tuning backtracking (between 0 and 1).
nfold: Number of folds in the cross-validation loop.
standardize: Logical flag for standardizing (scaling) the variables before fitting the model.
verbose: Logical flag for whether the step number is printed.
step: Fitting parameter used for the initial backtracking step size (between 0 and 1).
reset: Fitting parameter used to take advantage of local strong convexity in Nesterov momentum (the number of iterations before the momentum term is reset).
alpha: The mixing parameter. alpha = 1 gives the lasso penalty and alpha = 0 the group lasso penalty.
lambdas: A user-supplied sequence of lambda values for fitting. We recommend leaving this NULL and letting SGL choose the values itself.
foldid: An optional user-specified vector indicating the cross-validation fold to which each observation is assigned. Values should range from 1 to nfold. If left unspecified, SGL randomly assigns observations to folds.
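The roles of index and alpha can be made concrete by evaluating the sparse-group lasso penalty for a given coefficient vector. The sketch below assumes the penalty form (1 - alpha) * lambda * sum_g sqrt(p_g) * ||beta_g||_2 + alpha * lambda * ||beta||_1, where p_g is the number of covariates in group g; the function sgl_penalty is hypothetical and is not part of SGL.

```r
# Hypothetical helper (not exported by SGL): sparse-group lasso penalty,
# assuming the form  (1 - alpha) * lambda * sum_g sqrt(p_g) * ||beta_g||_2
#                   +      alpha * lambda * ||beta||_1
sgl_penalty <- function(beta, index, lambda, alpha) {
  stopifnot(length(beta) == length(index), alpha >= 0, alpha <= 1)
  groups <- split(beta, index)                   # coefficients grouped by index
  group_term <- sum(vapply(groups,
                           function(b) sqrt(length(b)) * sqrt(sum(b^2)),
                           numeric(1)))
  l1_term <- sum(abs(beta))
  (1 - alpha) * lambda * group_term + alpha * lambda * l1_term
}

# Two groups of size 2; only group 1 has nonzero coefficients
beta  <- c(3, 4, 0, 0)
index <- c(1, 1, 2, 2)
sgl_penalty(beta, index, lambda = 1, alpha = 1)  # lasso: |3| + |4| = 7
sgl_penalty(beta, index, lambda = 1, alpha = 0)  # group lasso: sqrt(2) * 5
```

Intermediate values of alpha, such as the default 0.95, blend the two terms: the group term can zero out whole groups, while the l1 term adds sparsity within the groups that remain.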
An object with S3 class "cv.SGL"
An nlam-vector, lldiff, of cross-validated negative log-likelihoods along the regularization path (squared error loss in the linear case).
An nlam-vector of approximate standard deviations of lldiff.
The actual sequence of lambda values used in the regularization path.
Response type ("linear", "logit" or "cox").
A model fit object created by a call to SGL on the entire dataset.
A vector indicating the cross-validation fold to which each observation is assigned.
A matrix of prevalidated predictions for each observation, for each lambda value.
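cvSGL returns the whole cross-validation curve rather than a single chosen lambda. Two common ways to pick a value from the error vector and its standard deviations are the minimizer and the "one-standard-error rule" (the largest lambda whose error lies within one standard deviation of the minimum). The helper below is a hypothetical base R sketch, not an SGL function; the argument names cv_err and cv_sd are placeholders for lldiff and its standard-deviation vector.

```r
# Hypothetical helper (not part of SGL): choose lambda from a CV curve.
#   lambdas : the lambda path
#   cv_err  : cross-validated loss for each lambda (e.g. the lldiff vector)
#   cv_sd   : approximate standard deviation of each entry of cv_err
choose_lambda <- function(lambdas, cv_err, cv_sd) {
  stopifnot(length(lambdas) == length(cv_err), length(cv_err) == length(cv_sd))
  i_min <- which.min(cv_err)                       # lambda with the smallest CV loss
  threshold <- cv_err[i_min] + cv_sd[i_min]        # one standard deviation above the minimum
  eligible <- which(cv_err <= threshold)           # all lambdas within that band
  i_1se <- eligible[which.max(lambdas[eligible])]  # most heavily penalized among them
  list(lambda.min = lambdas[i_min], index.min = i_min,
       lambda.1se = lambdas[i_1se], index.1se = i_1se)
}

# Toy curve on a decreasing path
lambdas <- c(1.0, 0.5, 0.25, 0.125)
cv_err  <- c(4.0, 2.2, 2.0, 2.1)
cv_sd   <- c(0.3, 0.3, 0.3, 0.3)
choose_lambda(lambdas, cv_err, cv_sd)  # lambda.min = 0.25, lambda.1se = 0.5
```

The returned positions (index.min, index.1se) refer to the same lambda path, so they can be used to pick out the corresponding solution from a fit over that path.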
The function runs SGL nfold + 1 times: the initial run finds the lambda sequence, and the subsequent runs compute the cross-validated error rate and its standard deviation.
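The nfold + 1 scheme can be sketched in base R. To keep the sketch self-contained, it swaps in closed-form ridge regression for SGL's sparse-group lasso fit, and it assumes the lambda path is log-spaced from a maximum value down to min.frac times that value. Here the maximum is supplied directly, whereas cvSGL obtains its sequence from an initial run of SGL on the full data. The functions lambda_path, ridge_fit and cv_path are hypothetical. The pattern is: fix one lambda sequence, assign each observation to a fold (as foldid does), refit on all but one fold for every lambda, and store the held-out predictions, whose squared errors give the CV curve and its approximate standard deviation.

```r
# Hypothetical sketch of the cvSGL cross-validation scheme, with ridge
# regression standing in for the sparse-group lasso fit.

# Log-spaced path from lambda_max down to min.frac * lambda_max (assumed spacing)
lambda_path <- function(lambda_max, min.frac = 0.05, nlam = 20) {
  exp(seq(log(lambda_max), log(min.frac * lambda_max), length.out = nlam))
}

# Stand-in fit: ridge coefficients (X'X + n * lambda * I)^{-1} X'y, no intercept
ridge_fit <- function(x, y, lambda) {
  solve(crossprod(x) + nrow(x) * lambda * diag(ncol(x)), crossprod(x, y))
}

cv_path <- function(x, y, lambdas, nfold = 10, foldid = NULL) {
  n <- nrow(x)
  if (is.null(foldid)) {
    # Random, roughly balanced assignment of observations to folds 1..nfold
    foldid <- sample(rep(seq_len(nfold), length.out = n))
  }
  # prevals[i, k]: prediction for observation i at lambdas[k] from the model
  # fit without observation i's fold
  prevals <- matrix(NA_real_, nrow = n, ncol = length(lambdas))
  for (f in seq_len(nfold)) {
    test <- foldid == f
    for (k in seq_along(lambdas)) {
      b <- ridge_fit(x[!test, , drop = FALSE], y[!test], lambdas[k])
      prevals[test, k] <- x[test, , drop = FALSE] %*% b
    }
  }
  sq_err <- (prevals - as.vector(y))^2            # n x nlam matrix of held-out losses
  list(lambdas = lambdas,
       cv_err  = colMeans(sq_err),                # CV squared error per lambda
       cv_sd   = apply(sq_err, 2, sd) / sqrt(n),  # approximate SD of each mean
       foldid  = foldid,
       prevals = prevals)
}

# Usage on simulated data
set.seed(1)
x <- matrix(rnorm(50 * 5), nrow = 50, ncol = 5)
y <- x %*% c(2, -1, 0, 0, 1) + rnorm(50)
cv <- cv_path(x, y, lambda_path(lambda_max = 1, nlam = 5), nfold = 5)
```

Ridge is used only because it has a closed-form fit. SGL instead solves the sparse-group lasso problem iteratively, using backtracking and Nesterov momentum tuned by gamma, step and reset.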
Simon, N., Friedman, J., Hastie, T., and Tibshirani, R. (2011) A Sparse-Group Lasso, http://faculty.washington.edu/nrsimon/SGLpaper.pdf
SGL
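library(SGL)

# Simulate a linear model with 100 covariates in 10 groups of 10
set.seed(1)
n <- 50; p <- 100; size.groups <- 10
index <- ceiling(1:p / size.groups)  # group label of each covariate: 1, ..., 1, 2, ..., 10
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
beta <- -2:2                         # only the first 5 covariates (all in group 1) are active
y <- X[, 1:5] %*% beta + 0.1 * rnorm(n)
data <- list(x = X, y = y)

# 10-fold cross-validation along a path of 20 lambda values (the defaults)
cvFit <- cvSGL(data, index, type = "linear")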