Solve a randomly perturbed LASSO problem.
randomizedLasso(X,
y,
lam,
family=c("gaussian", "binomial"),
noise_scale=NULL,
ridge_term=NULL,
max_iter=100,
kkt_tol=1.e-4,
parameter_tol=1.e-8,
objective_tol=1.e-8,
objective_stop=FALSE,
kkt_stop=TRUE,
parameter_stop=TRUE)
Matrix of predictors (n by p).
Vector of outcomes (length n).
Value of lambda used to compute beta. See the warning below.
Be careful! This function uses the "standard" lasso objective
$$
1/2 \|y - X \beta\|_2^2 + \lambda \|\beta\|_1.
$$
In contrast, glmnet multiplies the first term by a factor of 1/n,
so after running glmnet, to extract the beta corresponding to a value lambda,
you need to use beta = coef(obj, s=lambda/n)[-1], where obj is the object
returned by glmnet (and [-1] removes the intercept, which glmnet always puts
in the first component).
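For example, a minimal sketch of that extraction step, assuming the glmnet package is available and that X, y, and lam are already defined on the "standard" (unscaled) objective:

library(glmnet)                  # assumed to be installed
n = nrow(X)
obj = glmnet(X, y)               # glmnet scales the squared-error loss by 1/n
beta = coef(obj, s=lam/n)[-1]    # rescale lambda; [-1] drops the intercept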
Response type: "gaussian" (default), "binomial".
Scale of the Gaussian noise added to the objective. Default is 0.5 * sd(y) times the square root of the mean of the trace of X^TX.
A small "elastic net" or ridge penalty is added to ensure the randomized problem has a solution. Default is 0.5 * sd(y) times the square root of the mean of the trace of X^TX, divided by sqrt(n). (A sketch of both defaults appears after these argument descriptions.)
How many rounds of coordinate descent updates are used in solving the randomized LASSO.
Tolerance for checking convergence based on KKT conditions.
Tolerance for checking convergence based on convergence of parameters.
Tolerance for checking convergence based on convergence of objective value.
Should we use KKT check to determine when to stop?
Should we use convergence of parameters to determine when to stop?
Should we use convergence of objective value to determine when to stop?
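The two defaults just described can be written out explicitly. The snippet below is an illustrative reading of them (interpreting the "mean of the trace of X^TX" as the average diagonal entry of X^TX); the exact expressions used internally may differ, so consult the package source if the precise values matter:

n = nrow(X)
mean_diag = mean(colSums(X^2))                        # trace(X^T X) / p
noise_scale_default = 0.5 * sd(y) * sqrt(mean_diag)
ridge_term_default  = 0.5 * sd(y) * sqrt(mean_diag) / sqrt(n)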
Design matrix.
Response vector.
Vector of penalty parameters.
Family: "gaussian" or "binomial".
Set of non-zero coefficients in randomized solution that were penalized. Integers from 1:p.
Set of zero coefficients in randomized solution. Integers from 1:p.
Set of non-zero coefficients in randomized solution that were not penalized. Integers from 1:p.
The sign pattern of the randomized solution.
List describing sampling parameters for conditional law of all optimization variables given the data in the LASSO problem.
List describing sampling parameters for conditional law of only the scaling variables given the data and the observed subgradient in the LASSO problem.
Affine transformation describing the relationship between the internal representation of the data and the data component of the score of the likelihood at the unregularized MLE based on the sign_vector (a.k.a. relaxed LASSO).
Data component of the score at the unregularized MLE.
SD of Gaussian noise used to draw the perturbed objective.
The randomized solution. Inference is made conditional on its sign vector (so no more snooping of this value is formally permitted). If condition_subgrad == TRUE when sampling, then we may snoop on the observed subgradient.
The random vector in the linear term added to the objective.
For family="gaussian"
this function uses the "standard" lasso objective
$$
1/2 \|y - x \beta\|_2^2 + \lambda \|\beta\|_1
$$
and adds a term
$$
- \omega^T\beta + \frac{\epsilon}{2} \|\beta\|^2_2
$$
where omega is drawn from IID normals with standard deviation
noise_scale
and epsilon given by ridge_term
.
See below for default values of noise_scale
and ridge_term
.
For family="binomial"
, the squared error loss is replaced by the
negative of the logistic log-likelihood.
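As a concrete illustration, the randomized Gaussian objective above can be written down directly. The helper below is a hypothetical sketch for exposition only (the function name and the way omega and epsilon are passed are assumptions, not part of the package):

# Randomized lasso objective for family="gaussian":
# squared-error loss + l1 penalty - random linear term + small ridge term
randomized_objective = function(beta, X, y, lam, omega, eps) {
  0.5 * sum((y - X %*% beta)^2) +
    lam * sum(abs(beta)) -
    sum(omega * beta) +
    0.5 * eps * sum(beta^2)
}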
Xiaoying Tian and Jonathan Taylor (2015). Selective inference with a randomized response. arXiv:1507.06739.
Xiaoying Tian, Snigdha Panigrahi, Jelena Markovic, Nan Bi and Jonathan Taylor (2016). Selective inference after solving a convex problem. arXiv:1609.05609.
set.seed(43)
n = 50
p = 10
sigma = 0.2
lam = 0.5

# Simulate a standardized design and a sparse two-coefficient signal
X = matrix(rnorm(n*p), n, p)
X = scale(X, TRUE, TRUE) / sqrt(n-1)
beta = c(3, 2, rep(0, p-2))
y = X %*% beta + sigma * rnorm(n)

# Solve the randomized LASSO at penalty level lam
result = randomizedLasso(X, y, lam)
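The returned object is a list; its components (described above under Value) can be inspected without assuming particular names, for example:

str(result, max.level = 1)   # top-level components of the fit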