Learn R Programming

selectiveInference (version 1.2.5)

ROSI: Relevant One-step Selective Inference for the LASSO

Description

Compute p-values and confidence intervals for the lasso estimate, at a fixed value of the tuning parameter lambda using the "relevant" conditioning event of arxiv.org/1801.09037.

Usage

ROSI(X, 
     y, 
     soln, 
     lambda, 
     penalty_factor=NULL, 
     dispersion=1,
     family=c('gaussian', 'binomial'), 
     solver=c('QP', 'glmnet'),
     construct_ci=TRUE, 
     debiasing_method=c("JM", "BN"),
     verbose=FALSE,
     level=0.9,
     use_debiased=TRUE)

Arguments

X

Matrix of predictors (n by p);

y

Vector of outcomes (length n)

soln

Estimated lasso coefficients (e.g., from glmnet). This is of length p (so the intercept is not included as the first component).

Be careful! This function uses the "standard" lasso objective $$ 1/2 \|y - X \beta\|_2^2 + \lambda \|\beta\|_1. $$ In contrast, glmnet multiplies the first term by a factor of 1/n. So after running glmnet, to extract the beta corresponding to a value lambda, you need to use beta = coef(obj, s=lambda/n)[-1], where obj is the object returned by glmnet (and [-1] removes the intercept, which glmnet always puts in the first component)

lambda

Value of lambda used to compute beta. See the above warning

penalty_factor

Penalty factor as used by glmnet. Actual penalty used in solving the problem is $$ \lambda \cdot \sum_{i=1}^p f_i |\beta_i| $$ with f being the penalty_factor. Defaults to vector of 1s.

dispersion

Estimate of dispersion in the GLM. Can be taken to be 1 for logisitic and should be an estimate of the error variance for the Gaussian.

family

Family used for likelihood.

solver

Solver used to solve restricted problems needed to find truncation set. Each active variable requires solving a new LASSO problem obtained by zeroing out one coordinate of original problem. The "QP" choice uses coordinate descent for a specific value of lambda, rather than glmnet which would solve for a new path each time.

construct_ci

Report confidence intervals or just p-values?

debiasing_method

Which method should be used for debiasing? Choices are "JM" (Javanmard, Montanari) or "BN" (method described in arxiv.org/1703.03282).

verbose

Print out progress along the way? Default is FALSE.

level

Confidence level for intervals.

use_debiased

Use the debiased estimate of the parameter or not. When FALSE, this is the method desribed in arxiv.org/1801.09037. The default TRUE often produces noticably shorter intervals and more powerful tests when p is comparable to n. Ignored when n<p and set to TRUE. Also note that with "BN" as debiasing method and n > p, this agrees with method in arxiv.org/1801.09037.

Value

active_set

Active set of LASSO.

pvalues

Two-sided P-values for active variables.

intervals

Confidence intervals

estimate

Relaxed (i.e. unshrunk) selected estimates.

std_err

Standard error of relaxed estimates (pre-selection).

dispersion

Dispersion parameter.

lower_trunc

Lower truncation point. The estimates should be outside the interval formed by the lower and upper truncation poitns.

upper_trunc

Lower truncation point. The estimates should be outside the interval formed by the lower and upper truncation poitns.

lambda

Value of tuning parameter lambda used.

penalty_factor

Penalty factor used for solving problem.

level

Confidence level.

call

The call to fixedLassoInf.

Details

???

References

Keli Liu, Jelena Markovic, Robert Tibshirani. More powerful post-selection inference, with application to the Lasso. arXiv:1801.09037

Tom Boot, Didier Nibbering. Inference in high-dimensional linear regression models. arXiv:1703.03282

Examples

Run this code
# NOT RUN {
# }
# NOT RUN {
library(selectiveInference)
library(glmnet)
set.seed(43)

n = 100
p = 200
s = 2
sigma = 1

x = matrix(rnorm(n*p),n,p)
x = scale(x,TRUE,TRUE)

beta = c(rep(10, s), rep(0,p-s)) / sqrt(n)
y = x %*% beta + sigma*rnorm(n)

# first run glmnet
gfit = glmnet(x,y,standardize=FALSE)

# extract coef for a given lambda; note the 1/n factor!
# (and we don't save the intercept term)
lambda = 4 * sqrt(n)
lambda_glmnet = 4 / sqrt(n)
beta = selectiveInference:::solve_problem_glmnet(x, 
                                                 y, 
                                                 lambda_glmnet, 
                                                 penalty_factor=rep(1, p),
                                                 family="gaussian")
# compute fixed lambda p-values and selection intervals
out = ROSI(x,
           y,
           beta,
           lambda,
           dispersion=sigma^2)
out

# an alternate approximate inverse from Boot and Nibbering

out = ROSI(x,
           y,
           beta,
           lambda,
           dispersion=sigma^2,
           debiasing_method="BN")
out
# }

Run the code above in your browser using DataLab