
grplasso (version 0.4-7)

grplasso: Function to Fit a Solution of a Group Lasso Problem

Description

Fits the solution of a group lasso problem for a model of class grpl.model.

Usage

grplasso(x, ...)

# S3 method for formula
grplasso(formula, nonpen = ~ 1, data, weights, subset, na.action,
         lambda, coef.init, penscale = sqrt, model = LogReg(),
         center = TRUE, standardize = TRUE, control = grpl.control(),
         contrasts = NULL, ...)

# S3 method for default
grplasso(x, y, index, weights = rep(1, length(y)),
         offset = rep(0, length(y)), lambda, coef.init = rep(0, ncol(x)),
         penscale = sqrt, model = LogReg(), center = TRUE,
         standardize = TRUE, control = grpl.control(), ...)

Arguments

x

design matrix (including intercept)

y

response vector

formula

formula of the penalized variables. The response must be on the left-hand side of ~.

nonpen

formula of the non-penalized variables. It is added to the formula argument above and does not need a response on its left-hand side.

data

data.frame containing the variables in the model.

index

vector defining the grouping of the variables. Components sharing the same number form a group. Non-penalized coefficients are marked with NA.
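To make the grouping convention concrete, the short base-R sketch below takes the index vector used in the Examples section and recovers which design-matrix columns form each penalized group and which column is left unpenalized. It only uses split() and is.na() and does not call grplasso itself.

```r
## Index vector as in the Examples section: column 1 (intercept) is not
## penalized (NA); columns 2-3 form group 2, columns 4-5 form group 3
index <- c(NA, 2, 2, 3, 3)

## split() drops NA entries, so only penalized columns are assigned to groups
groups <- split(seq_along(index), index)
str(groups)

## Columns excluded from the penalty
unpen <- which(is.na(index))
unpen
```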

weights

vector of observation weights.

subset

an optional vector specifying a subset of observations to be used in the fitting process.

na.action

a function indicating what should happen when the data contain NA values.

offset

vector of offset values; needs to have the same length as the response vector.

lambda

vector of penalty parameters. Optimization starts with the first component. See details below.

coef.init

initial vector of parameter estimates corresponding to the first component in the vector lambda.

penscale

rescaling function to adjust the value of the penalty parameter to the degrees of freedom of the parameter group. See the reference below.
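As a sketch of how penscale enters the objective, the function below evaluates a group lasso penalty of the form \(\lambda \sum_g s(\mathrm{df}_g) \|\beta_g\|_2\), where \(s\) is penscale. It assumes, as in the reference, that \(\mathrm{df}_g\) is the number of coefficients in group \(g\) and that NA entries of index are skipped. The helper name group_penalty is hypothetical and not part of grplasso.

```r
## Hypothetical helper: group lasso penalty
##   lambda * sum_g penscale(df_g) * ||beta_g||_2
## where df_g is the size of group g; NA entries of index are unpenalized.
group_penalty <- function(beta, index, lambda, penscale = sqrt) {
  pen <- !is.na(index)
  groups <- split(beta[pen], index[pen])        # coefficient blocks
  df <- lengths(groups)                         # group sizes
  norms <- vapply(groups, function(b) sqrt(sum(b^2)), numeric(1))
  lambda * sum(penscale(df) * norms)
}

beta  <- c(0.5, 3, 4, 0, 0)   # intercept, group 2 = (3, 4), group 3 = (0, 0)
index <- c(NA, 2, 2, 3, 3)

group_penalty(beta, index, lambda = 2)                           # sqrt scaling
group_penalty(beta, index, lambda = 2, penscale = function(d) 1) # no rescaling
```

Group 2 has Euclidean norm 5 and size 2, so the first call gives 2 * sqrt(2) * 5 = 10 * sqrt(2), while the unscaled version gives 10: with penscale = sqrt, larger groups are penalized more heavily for the same norm.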

model

an object of class grpl.model implementing the negative log-likelihood, gradient, Hessian, etc. See the documentation of grpl.model for more details.
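To illustrate the quantities a model object has to supply, the following base-R sketch writes out the negative log-likelihood, its gradient and its Hessian for logistic regression (the case covered by LogReg()) and checks the gradient against central finite differences. The function names are hypothetical and their signatures are not the grpl.model interface; see the grpl.model documentation for that.

```r
## Hypothetical stand-alone functions for logistic regression; these are not
## the argument signatures expected by grpl.model().
invlink <- function(eta) 1 / (1 + exp(-eta))     # mean as a function of eta

## Negative log-likelihood of a 0/1 response y given fitted probabilities mu
nloglik <- function(y, mu, weights = rep(1, length(y))) {
  -sum(weights * (y * log(mu) + (1 - y) * log(1 - mu)))
}

## Gradient with respect to beta: -X^T W (y - mu), with W = diag(weights)
ngradient <- function(x, y, mu, weights = rep(1, length(y))) {
  -drop(crossprod(x, weights * (y - mu)))
}

## Hessian with respect to beta: X^T diag(weights * mu * (1 - mu)) X
nhessian <- function(x, mu, weights = rep(1, length(mu))) {
  crossprod(x, weights * mu * (1 - mu) * x)    # row i of x scaled by its weight
}

## Check the analytic gradient numerically on random data
set.seed(1)
x <- cbind(1, matrix(rnorm(20), nrow = 10))
y <- rbinom(10, size = 1, prob = 0.5)
beta <- c(0.1, -0.2, 0.3)

f <- function(b) nloglik(y, invlink(drop(x %*% b)))
g <- ngradient(x, y, invlink(drop(x %*% beta)))
h <- 1e-6
g.num <- sapply(seq_along(beta), function(j) {
  e <- replace(numeric(length(beta)), j, h)
  (f(beta + e) - f(beta - e)) / (2 * h)
})
H <- nhessian(x, invlink(drop(x %*% beta)))
max(abs(g - g.num))
```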

center

logical. If true, the columns of the design matrix will be centered (except a possible intercept column).

standardize

logical. If true, the design matrix will be blockwise orthonormalized such that each block \(X_g\) satisfies \(X_g^T X_g = n I\), where \(n\) is the number of observations and \(I\) the identity matrix (after possible centering).
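The centering and standardization steps can be sketched in base R as follows. For simplicity the sketch assumes that the only unpenalized column is the intercept (so exactly the penalized columns are centered) and uses a QR decomposition per block to reach \(X_g^T X_g = n I\); the helper name blockwise_orthonormalize is hypothetical and grplasso's internal implementation may differ. Because such a transformation changes the parametrization within each block, fitted coefficients have to be mapped back to the original variables afterwards.

```r
## Hypothetical helper: center penalized columns, then orthonormalize each
## penalized block so that t(Xg) %*% Xg = n * I. Assumes the only unpenalized
## column (NA in index) is the intercept, which is left untouched.
blockwise_orthonormalize <- function(x, index, center = TRUE) {
  n <- nrow(x)
  pen <- !is.na(index)
  if (center) {
    x[, pen] <- scale(x[, pen, drop = FALSE], center = TRUE, scale = FALSE)
  }
  for (g in unique(index[pen])) {
    cols <- which(index == g)                   # which() ignores NA entries
    ## qr.Q() returns an orthonormal basis of the block's column space
    x[, cols] <- sqrt(n) * qr.Q(qr(x[, cols, drop = FALSE]))
  }
  x
}

set.seed(2)
n <- 30
x <- cbind(1, matrix(rnorm(4 * n), nrow = n))
index <- c(NA, 2, 2, 3, 3)
xs <- blockwise_orthonormalize(x, index)

crossprod(xs[, 2:3])             # n * I for group 2
round(colMeans(xs[, 2:5]), 12)   # penalized columns remain centered
```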

control

options for the fitting algorithm, see grpl.control.

contrasts

an optional list. See the contrasts.arg argument of model.matrix.default.

...

additional arguments to be passed to the functions defined in model.

Value

An object of class grplasso is returned, for which coef, print, plot and predict methods exist. Among other components, it contains:

coefficients

coefficients with respect to the original input variables (even if standardize = TRUE is used for fitting).

lambda

vector of lambda values where coefficients were calculated.

index

grouping index vector.

Details

When using grplasso.formula, the grouping of the variables is derived from their type: the dummy variables of a factor are automatically treated as one group.
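Base R's model.matrix() records which term each column comes from in its assign attribute, and the dummy columns of a factor share one entry there. The sketch below shows this for a small hypothetical data frame, using sum contrasts as in the splice example. That grplasso.formula derives its grouping in exactly this way is an assumption; the sketch only shows that the factor's dummies belong to a single term.

```r
## Hypothetical data: a 3-level factor and a numeric covariate
df <- data.frame(f = factor(c("a", "b", "c", "a", "b", "c")),
                 z = c(0.2, 1.5, -0.3, 0.8, 2.1, -1.0))

## Sum contrasts give 2 dummy columns for the 3-level factor
mm <- model.matrix(~ f + z, data = df, contrasts.arg = list(f = "contr.sum"))

colnames(mm)           # intercept, two dummies for f, then z
attr(mm, "assign")     # term index per column; both dummies map to term 1 (f)
```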

The optimization starts with the first component of lambda as the penalty parameter \(\lambda\) and with the starting values given in coef.init for the parameter vector. Once this fit is obtained, the next component of lambda is used as the penalty parameter, with the fitted coefficient vector from the previous component of lambda as starting values.
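The warm-start scheme can be illustrated with a minimal base-R sketch. To keep it short, it swaps in squared-error loss \(\tfrac12\|y - X\beta\|_2^2\) and plain block coordinate descent with group soft-thresholding, and it assumes that every column is penalized and every block already satisfies \(X_g^T X_g = n I\) (as with standardize = TRUE). It is not necessarily the algorithm grplasso uses; the function group_lasso_path and its arguments are hypothetical, and only the way each fit starts from the previous one mirrors the description above.

```r
## Hypothetical sketch (not grplasso's algorithm): squared-error group lasso
##   (1/2) * ||y - X beta||^2 + lambda * sum_g sqrt(df_g) * ||beta_g||_2
## Assumes all columns are penalized and each block has t(Xg) %*% Xg = n * I.
group_lasso_path <- function(x, y, index, lambda, coef.init = rep(0, ncol(x)),
                             tol = 1e-8, max.iter = 1000) {
  n <- nrow(x)
  groups <- split(seq_len(ncol(x)), index)
  beta <- coef.init                             # start for the first lambda
  path <- matrix(NA_real_, nrow = ncol(x), ncol = length(lambda))
  for (k in seq_along(lambda)) {
    for (it in seq_len(max.iter)) {
      beta.old <- beta
      for (cols in groups) {
        ## Partial residual with the current block removed
        r <- y - x[, -cols, drop = FALSE] %*% beta[-cols]
        z <- drop(crossprod(x[, cols, drop = FALSE], r))
        thr <- lambda[k] * sqrt(length(cols))
        nz <- sqrt(sum(z^2))
        ## Group soft-thresholding: exact block minimizer when t(Xg) %*% Xg = n * I
        beta[cols] <- if (nz <= thr) 0 else (1 - thr / nz) * z / n
      }
      if (max(abs(beta - beta.old)) < tol) break
    }
    path[, k] <- beta                           # beta is also the next start
  }
  path
}

## Usage on simulated, centered data with two orthonormalized blocks
set.seed(3)
n <- 40
index <- c(1, 1, 2, 2)
x <- scale(matrix(rnorm(4 * n), nrow = n), scale = FALSE)
for (cols in split(1:4, index)) x[, cols] <- sqrt(n) * qr.Q(qr(x[, cols]))
y <- drop(x %*% c(1, -1, 0, 0)) + rnorm(n)
y <- y - mean(y)

## Smallest lambda for which the all-zero solution is optimal in this sketch
lmax <- max(sapply(split(1:4, index), function(cols)
  sqrt(sum(crossprod(x[, cols], y)^2)) / sqrt(length(cols))))
lambda <- lmax * 0.5^(0:4)
path <- group_lasso_path(x, y, index, lambda)
round(path, 3)
```

As in the Examples section, the grid starts at the largest useful penalty and decreases multiplicatively, so each fit begins close to its own solution.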

References

Lukas Meier, Sara van de Geer and Peter Bühlmann (2008), The Group Lasso for Logistic Regression, Journal of the Royal Statistical Society, Series B, 70(1), 53-71.

Examples

# NOT RUN {
## Use the Logistic Group Lasso on the splice data set
data(splice)

## Define a list with the contrasts of the factors
contr <- rep(list("contr.sum"), ncol(splice) - 1)
names(contr) <- names(splice)[-1]

## Fit a logistic model 
fit.splice <- grplasso(y ~ ., data = splice, model = LogReg(), lambda = 20,
                       contrasts = contr, center = TRUE, standardize = TRUE)

## Perform the Logistic Group Lasso on a random dataset
set.seed(79)

n <- 50  ## observations
p <- 4   ## variables

## First variable (intercept) not penalized, two groups having 2 degrees
## of freedom each

index <- c(NA, 2, 2, 3, 3)

## Create a random design matrix, including the intercept (first column)
x <- cbind(1, matrix(rnorm(p * n), nrow = n))
colnames(x) <- c("Intercept", paste("X", 1:4, sep = ""))

par <- c(0, 2.1, -1.8, 0, 0)
prob <- 1 / (1 + exp(-x %*% par))
mean(pmin(prob, 1 - prob)) ## Bayes risk
y <- rbinom(n, size = 1, prob = prob) ## binary response vector

## Use a multiplicative grid for the penalty parameter lambda, starting
## at the maximal lambda value
lambda <- lambdamax(x, y = y, index = index, penscale = sqrt,
                    model = LogReg()) * 0.5^(0:5)

## Fit the solution path on the lambda grid
fit <- grplasso(x, y = y, index = index, lambda = lambda, model = LogReg(),
                penscale = sqrt,
                control = grpl.control(update.hess = "lambda", trace = 0))

## Plot coefficient paths
plot(fit)
# }
