bolasso: Bolasso: Bootstrapped Lasso

Description

Performs a bootstrapped Lasso on random subsamples of the input data.

Usage

bolasso(data,Y,mu,m,probaseuil,penalty.factor,random)

Arguments

data

Input matrix of dimension n * p; each of the n rows is an observation vector of p variables. The intercept should be included as the first column, (1,...,1); if it is not, it is added.

Y

Response variable of length n.

mu

Positive regularization sequence to be used for the Lasso.

m

Number of bootstrap iterations of the Lasso. Default is m=100.

probaseuil

A frequency threshold for selecting the most stable variables over the m bootstrap iterations of the Lasso. Default is 1.

penalty.factor

Separate penalty factors can be applied to each coefficient. Each factor is a number that multiplies the regularization parameter (lambda in glmnet) to allow differential shrinkage. It can be 0 for some variables, which implies no shrinkage: such a variable is always included in the model. Default is 1 for all variables except the intercept.

random

Optional parameter: a matrix of size n*m. If random is provided, the m bootstrap samples are constructed from its m columns.
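The source does not specify what the columns of random must contain. A plausible reading, assumed here rather than confirmed, is that column b holds the n row indices of the b-th bootstrap sample. The hypothetical helper make_bootstrap_indices below builds such a matrix in base R.

```r
# Hypothetical helper (not part of the package): an n x m matrix whose b-th
# column contains the row indices of the b-th bootstrap sample, i.e. n draws
# with replacement from 1..n. The layout bolasso() expects is an assumption.
make_bootstrap_indices <- function(n, m, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  # optional, for reproducible samples
  replicate(m, sample.int(n, n, replace = TRUE))
}

random <- make_bootstrap_indices(n = 100, m = 100, seed = 42)
dim(random)  # 100 rows (observations) by 100 columns (bootstrap samples)

# Under this assumption, bootstrap sample b of (data, Y) would be
# data[random[, b], ] and Y[random[, b]].
```

If this reading is right, passing the same random matrix to several calls would reuse identical bootstrap samples, for example to compare different values of mu on the same resamples.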

Value

data

A list containing:

Y - the input response vector
means.X - Vector of means of the input data matrix.
sigma.X - Vector of variances of the input data matrix.

ind

Set of selected variables for the regularization mu and the threshold probaseuil.

frequency

Appearance frequency of each variable: the number of times each variable is selected over the m bootstrap iterations.
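To make the relation between frequency, probaseuil and ind concrete, here is a small sketch on made-up selection results. It assumes probaseuil is a proportion of the m iterations, so that the default of 1 keeps only the variables selected in every bootstrap iteration (the support intersection used in Bach's Bolasso). select_stable is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: sel is an m x p logical matrix; row k marks the
# variables with a non-zero Lasso coefficient in bootstrap iteration k.
# probaseuil is assumed to be a proportion of the m iterations.
select_stable <- function(sel, probaseuil = 1) {
  frequency <- colSums(sel)                          # selections per variable
  ind <- which(frequency / nrow(sel) >= probaseuil)  # stable variables
  list(ind = ind, frequency = frequency)
}

# Made-up results for m = 4 iterations and p = 4 variables
sel <- rbind(c(TRUE, TRUE,  FALSE, TRUE),
             c(TRUE, TRUE,  FALSE, FALSE),
             c(TRUE, FALSE, TRUE,  FALSE),
             c(TRUE, TRUE,  FALSE, TRUE))

select_stable(sel)$frequency    # 4 3 1 2
select_stable(sel, 1)$ind       # 1: only variable 1 is selected every time
select_stable(sel, 0.5)$ind     # 1 2 4: selected in at least half the runs
```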

Details

The Lasso from the glmnet package is performed with the regularization parameter mu on each of the m bootstrap samples. This yields an appearance frequency for each variable, which indicates its predictive power: the frequency is the number of times the variable has been selected by the Lasso over the m bootstrap iterations.
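The whole procedure can be sketched in base R. This sketch swaps glmnet for a plain coordinate-descent Lasso, lasso_cd, which uses the same (1/(2n)) scaling of the squared error as glmnet for Gaussian responses but, unlike glmnet's default, does not standardize the variables, and which handles the intercept by centering instead of through a column of ones. lasso_cd and bolasso_sketch are hypothetical names; a single value of mu is used, penalty.factor multiplies mu coefficient by coefficient, and probaseuil is treated as a proportion of the m iterations.

```r
# Coordinate-descent Lasso for a single mu (base-R stand-in for glmnet).
# Minimizes (1/(2n)) * ||y - b0 - X b||^2 + mu * sum(penalty.factor * |b|);
# the intercept b0 is unpenalized and handled by centering y and X.
lasso_cd <- function(X, y, mu, penalty.factor = rep(1, ncol(X)),
                     n_iter = 200, tol = 1e-8) {
  n <- nrow(X)
  Xc <- scale(X, center = TRUE, scale = FALSE)
  r <- y - mean(y)                   # residual for b = 0
  col_ss <- colSums(Xc^2) / n        # (1/n) * ||x_j||^2
  b <- numeric(ncol(X))
  for (it in seq_len(n_iter)) {
    max_change <- 0
    for (j in seq_along(b)) {
      if (col_ss[j] == 0) next       # constant column: coefficient stays 0
      z <- sum(Xc[, j] * r) / n + col_ss[j] * b[j]
      # soft-thresholding with threshold mu * penalty.factor[j]
      b_new <- sign(z) * max(abs(z) - mu * penalty.factor[j], 0) / col_ss[j]
      if (b_new != b[j]) {
        r <- r - Xc[, j] * (b_new - b[j])  # keep the residual up to date
        max_change <- max(max_change, abs(b_new - b[j]))
        b[j] <- b_new
      }
    }
    if (max_change < tol) break      # converged
  }
  b
}

# Fit the Lasso on m bootstrap samples and count how often each variable
# receives a non-zero coefficient.
bolasso_sketch <- function(X, y, mu, m = 100, probaseuil = 1,
                           penalty.factor = rep(1, ncol(X))) {
  n <- nrow(X)
  sel <- matrix(FALSE, m, ncol(X))
  for (k in seq_len(m)) {
    idx <- sample.int(n, n, replace = TRUE)  # rows of the k-th bootstrap sample
    coef <- lasso_cd(X[idx, , drop = FALSE], y[idx], mu, penalty.factor)
    sel[k, ] <- coef != 0
  }
  frequency <- colSums(sel)
  list(ind = which(frequency / m >= probaseuil), frequency = frequency)
}

set.seed(1)
x <- matrix(rnorm(100 * 20), 100, 20)
y <- drop(x %*% c(rep(1, 5), rep(0, 15)) + rnorm(100))
res <- bolasso_sketch(x, y, mu = 0.3, m = 20)
res$ind        # variables selected in all 20 bootstrap fits
res$frequency  # number of fits selecting each variable
```

With a strong signal on the first five variables, these should appear in every bootstrap fit; lowering probaseuil admits variables that are selected less consistently.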

References

Model-consistent sparse estimation through the bootstrap; F. Bach 2009

Examples


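## Not run: 
# Assumes the package providing bolasso() is attached.
# Simulated data: 100 observations of 20 variables, only the first 5 relevant
set.seed(1)  # make the simulation reproducible
x <- matrix(rnorm(100 * 20), 100, 20)
beta <- c(rep(1, 5), rep(0, 15))
y <- x %*% beta + rnorm(100)

# x has no intercept column, so bolasso() adds one
mod <- bolasso(x, y, mu = seq(1.5, 0.1, -0.1))
mod  # list with components data, ind and frequency
## End(Not run)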
