Performs randomization tests of features identified by the Lasso
feature.test(
x,
y,
B = 100,
type.measure = "deviance",
s = "lambda.min",
keeplambda = FALSE,
olsestimates = TRUE,
penalty.factor = rep(1, nvars),
alpha = 1,
control = list(trace = FALSE, maxcores = 24),
...
)
x: Input matrix of dimension nobs x nvars; each row is an observation vector.
y: Quantitative response variable of length nobs.
B: The number of randomizations used in the computations.
type.measure: Loss to use for cross-validation. See cv.glmnet for more information.
s: Value of the penalty parameter 'lambda' at which predictions are required. Defaults to "lambda.min", as shown in the usage above. See coef.glmnet for more information.
keeplambda: If set to TRUE, the lambda estimated by cross-validation on the original dataset is kept and reused for the subsequent randomization datasets. This reduces computation time substantially because cross-validation does not have to be repeated for each randomization. If set to a numeric value, that value is used as lambda. Defaults to FALSE.
olsestimates: Logical. Should the test statistic be based on OLS estimates from the model containing the variables selected by the lasso? Defaults to TRUE. If set to FALSE, the lasso coefficients are used as test statistics.
penalty.factor: A vector of weights used for the adaptive lasso. See glmnet for more information.
alpha: The elastic net mixing parameter. See glmnet for more information.
control: A list of options that control the algorithm. Currently, trace is a logical; if set to TRUE, the function produces more output. maxcores sets the maximum number of cores to use with the parallel package.
...: Other arguments passed to glmnet.
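To make the choice between OLS-based and lasso-based test statistics concrete, here is a minimal sketch of how the two could be obtained with glmnet directly. It assumes the glmnet package is installed and that absolute coefficient values serve as the statistics. The names cvfit, beta, selected, ols_stats and lasso_stats are illustrative only, and the internal computation of feature.test may differ.

```r
library(glmnet)

set.seed(1)
x <- matrix(rnorm(30 * 100), nrow = 30)
y <- rnorm(30, mean = 1 * x[, 1])

# Cross-validated lasso (alpha = 1) using the deviance loss
cvfit <- cv.glmnet(x, y, type.measure = "deviance", alpha = 1)

# Coefficients at lambda.min, with the intercept row dropped
beta <- as.matrix(coef(cvfit, s = "lambda.min"))[-1, 1]
selected <- which(beta != 0)

if (length(selected) > 0) {
  # Lasso-based statistics: absolute lasso coefficients
  lasso_stats <- abs(beta[selected])
  # OLS-based statistics: refit by least squares on the selected columns only
  xs <- x[, selected, drop = FALSE]
  ols <- lm(y ~ xs)
  ols_stats <- abs(coef(ols)[-1])
}
```

If the lasso selects nearly as many variables as there are observations, the OLS refit can be rank deficient and some coefficients will be NA, so a real implementation would need to handle that case.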
Returns a list of 7 variables:
The p-value for the test of the full set of variables selected by the lasso (based on the OLS estimates).
A vector of the indices of the non-zero variables selected by glmnet, sorted from (numerically) highest to lowest based on their OLS test statistic.
The p-value for the maximum of the OLS test statistics.
A vector of the indices of the non-zero variables selected by glmnet, sorted from (numerically) highest to lowest based on their absolute lasso coefficients.
The p-value for the maximum of the lasso test statistics.
The value of lambda used in the computations.
The number of permutations used.
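The p-values above come from randomization. The following base R sketch shows the general idea and the role of B. It is not the function's own code: it swaps in a simpler statistic (the largest absolute marginal correlation between y and the columns of x) for the lasso-based ones, perm_pvalue is a hypothetical helper, and the add-one correction is an assumption about how the p-value is formed.

```r
# Hypothetical helper: permutation p-value for the largest absolute
# marginal correlation between the response and the columns of x.
perm_pvalue <- function(x, y, B = 100) {
  stat <- function(resp) max(abs(cor(x, resp)))
  observed <- stat(y)
  # Recompute the statistic on B random permutations of the response
  permuted <- replicate(B, stat(sample(y)))
  # Add-one correction keeps the estimate strictly positive
  (1 + sum(permuted >= observed)) / (B + 1)
}

set.seed(1)
x <- matrix(rnorm(30 * 100), nrow = 30)
y <- rnorm(30, mean = 1 * x[, 1])
p <- perm_pvalue(x, y, B = 100)
```

With this construction the smallest attainable p-value is 1 / (B + 1), so a larger B gives finer resolution at the cost of more computation.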
Brink-Jensen, K and Ekstrom, CT 2014. Inference for feature selection using the Lasso with high-dimensional data. http://arxiv.org/abs/1403.4296
glmnet
# Simulate some data
x <- matrix(rnorm(30*100), nrow=30)
y <- rnorm(30, mean=1*x[,1])

# Make inference for features
feature.test(x, y)
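The penalty.factor argument takes per-variable weights for the adaptive lasso. One common way to construct such weights is to take the inverse absolute value of an initial coefficient estimate. The sketch below uses a ridge fit computed in base R as that initial estimate. The ridge penalty lambda_ridge = 1 is an arbitrary choice for illustration, and the names xs, yc, beta_ridge and w_adaptive are not part of the package.

```r
set.seed(1)
x <- matrix(rnorm(30 * 100), nrow = 30)
y <- rnorm(30, mean = 1 * x[, 1])

# Standardize the predictors and center the response
xs <- scale(x)
yc <- y - mean(y)

# Ridge estimate: (X'X + lambda * I)^{-1} X'y
lambda_ridge <- 1  # arbitrary illustrative value
beta_ridge <- solve(crossprod(xs) + lambda_ridge * diag(ncol(xs)),
                    crossprod(xs, yc))

# Adaptive weights: a small initial coefficient gives a large penalty
w_adaptive <- as.vector(1 / abs(beta_ridge))

# The weights could then be supplied as, for example:
# feature.test(x, y, penalty.factor = w_adaptive)
```

The ridge step is used here because ordinary least squares is not available when nvars exceeds nobs, as in this simulated dataset.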