feature.test: Inference for features identified by the Lasso

Description

Performs randomization tests of features identified by the Lasso.

Usage

feature.test(
  x,
  y,
  B = 100,
  type.measure = "deviance",
  s = "lambda.min",
  keeplambda = FALSE,
  olsestimates = TRUE,
  penalty.factor = rep(1, nvars),
  alpha = 1,
  control = list(trace = FALSE, maxcores = 24),
  ...
)

Arguments

x

Input matrix of dimension nobs x nvars; each row is an observation vector.

y

Quantitative response variable of length nobs.

B

The number of randomizations used in the computations.

type.measure

Loss to use for cross-validation. See cv.glmnet for more information.

s

Value of the penalty parameter 'lambda' at which predictions are required. Defaults to "lambda.min", as shown in the usage above. See coef.glmnet for more information.

keeplambda

If set to TRUE, the lambda estimated by cross-validation on the original dataset is kept and reused when evaluating the randomized datasets. This reduces computation time substantially, because cross-validation does not have to be repeated for each randomization. If set to a numeric value, that value is used as lambda. Defaults to FALSE.

olsestimates

Logical. Should the test statistic be based on OLS estimates from the model containing the variables selected by the lasso? Defaults to TRUE. If set to FALSE, the coefficients from the lasso are used as test statistics.
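
The OLS refit described for olsestimates can be pictured with a short base R sketch. It is illustrative only and is not the package's internal code: the vector selected is a hypothetical stand-in for the indices the lasso would return, and using the t statistic (ranked by absolute value) as the "OLS test statistic" is an assumption.

```r
# Illustrative sketch only; 'selected' is a hypothetical set of column
# indices standing in for the variables chosen by the lasso.
set.seed(1)
x <- matrix(rnorm(30 * 10), nrow = 30)
y <- rnorm(30, mean = x[, 1])
selected <- c(1, 4)

# Refit by ordinary least squares using only the selected columns
refit <- lm(y ~ x[, selected, drop = FALSE])

# t statistics of the selected variables (intercept row dropped);
# treating the t value as the test statistic is an assumption
ols_stats <- summary(refit)$coefficients[-1, "t value"]

# Order the selected indices by absolute t statistic, largest first
ols_selected <- selected[order(abs(ols_stats), decreasing = TRUE)]
ols_selected
```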

penalty.factor

A vector of weights used for the adaptive lasso. See glmnet for more information.

alpha

The elasticnet mixing parameter. See glmnet for more information.

control

A list of options that control the algorithm. Currently, trace is a logical; if set to TRUE, the function produces more output. maxcores sets the maximum number of cores to use with the parallel package.

…

Other arguments passed to glmnet

Value

Returns a list of 7 variables:

p.full

The p-value for the test of the full set of variables selected by the lasso (based on the OLS estimates)

ols.selected

A vector of the indices of the non-zero variables selected by glmnet, sorted from (numerically) highest to lowest based on their OLS test statistic.

p.maxols

The p-value for the maximum of the OLS test statistics

lasso.selected

A vector of the indices of the non-zero variables selected by glmnet, sorted from (numerically) highest to lowest based on their absolute lasso coefficients.

p.maxlasso

The p-value for the maximum of the lasso test statistics

lambda.orig

The value of lambda used in the computations

B

The number of permutations used
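
To make the p-values above more concrete, here is a generic sketch of how a randomization p-value for a maximum statistic can be computed. It does not reproduce the package's implementation: the statistic (largest absolute marginal correlation) is a simple stand-in for the lasso and OLS statistics, and the (1 + count) / (B + 1) form of the p-value is an assumption.

```r
# Generic permutation-test sketch; not the code used by feature.test()
set.seed(42)
x <- matrix(rnorm(30 * 5), nrow = 30)
y <- rnorm(30, mean = x[, 1])
B <- 100

# Stand-in statistic: largest absolute marginal correlation with y
max_stat <- function(x, y) max(abs(cor(x, y)))

observed <- max_stat(x, y)

# Recompute the statistic on B datasets in which y has been permuted
permuted <- replicate(B, max_stat(x, sample(y)))

# Share of permuted statistics at least as large as the observed one;
# the +1 terms count the observed data as one of the permutations
p_value <- (1 + sum(permuted >= observed)) / (B + 1)
p_value
```

Because the observed data are counted among the permutations, the smallest attainable p-value in this sketch is 1 / (B + 1), so a larger B gives finer resolution at the cost of more computation.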

References

Brink-Jensen, K and Ekstrom, CT 2014. Inference for feature selection using the Lasso with high-dimensional data. http://arxiv.org/abs/1403.4296

Examples

# Simulate some data
x <- matrix(rnorm(30*100), nrow=30)
y <- rnorm(30, mean=1*x[,1])

# Make inference for features
feature.test(x, y)
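
A variation on the example that may be useful when run time matters: it uses fewer randomizations and reuses the cross-validated lambda through keeplambda = TRUE, as described under Arguments. The values B = 20 and maxcores = 2 are arbitrary illustrative choices, and the call is guarded on the assumption that the function is provided by an installed package named MESS.

```r
# Simulate data as in the example above, with a seed for reproducibility
set.seed(1)
x <- matrix(rnorm(30 * 100), nrow = 30)
y <- rnorm(30, mean = x[, 1])

# Assumption: feature.test() is exported by the MESS package
if (requireNamespace("MESS", quietly = TRUE)) {
  res <- MESS::feature.test(
    x, y,
    B = 20,              # fewer randomizations (illustrative value)
    keeplambda = TRUE,   # reuse lambda from the original dataset
    control = list(trace = FALSE, maxcores = 2)
  )
  print(res$p.full)          # p-value for the full selected set
  print(res$lasso.selected)  # indices ordered by absolute lasso coefficient
}
```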