Provide the natural lasso estimate of the error standard deviation, using cross-validation to select the tuning parameter value. The output also includes the cross-validated naive estimate and the degree-of-freedom adjusted estimate of the error standard deviation.
nlasso_cv(x, y, lambda = NULL, intercept = TRUE, nlam = 100,
flmin = 0.01, nfold = 5, foldid = NULL, thresh = 1e-08,
glmnet_output = NULL)
x: An n by p design matrix. Each row is an observation of p features.
y: A response vector of size n.
lambda: A user-specified list of tuning parameter values. Defaults to NULL, in which case the program computes its own lambda path based on nlam and flmin.
intercept: Indicator of whether an intercept should be fitted. Defaults to TRUE.
nlam: The number of lambda values. Defaults to 100.
flmin: The ratio of the smallest to the largest value in lambda. The largest value in lambda is usually the smallest value for which all coefficients are set to zero. Defaults to 1e-2.
nfold: Number of folds in cross-validation. Defaults to 5. If each fold would contain too few observations, a warning is thrown and the minimum nfold = 3 is used.
foldid: A vector of length n indicating which fold each observation belongs to. Defaults to NULL, in which case the program generates its own fold assignment at random.
thresh: Threshold value at which the underlying optimization algorithm claims convergence. Defaults to 1e-8.
glmnet_output: Whether the estimate should be computed from a user-specified output of cv.glmnet. If not NULL, it should be the output of a cv.glmnet call with standardize = TRUE and keep = TRUE, and the arguments lambda, intercept, nlam, flmin, nfold, foldid, and thresh will then be ignored. Defaults to NULL, in which case the function calls cv.glmnet internally.
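The defaults for lambda and foldid described above can be pictured with a small base R sketch. It assumes a log-spaced path and the lasso objective scaled as 1/(2n) times the residual sum of squares; the package's internal construction may differ, and the names xs, yc, and lam_max are illustrative only.

```r
set.seed(1)
n <- 50
p <- 200
x <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)

# Assumption: columns are standardized and y is centered before computing the path
xs <- scale(x)        # scale() uses the n - 1 denominator for the standard deviation
yc <- y - mean(y)

# Smallest lambda at which every lasso coefficient is zero, under the
# assumed objective (1 / (2n)) * ||yc - xs b||^2 + lambda * ||b||_1
lam_max <- max(abs(crossprod(xs, yc))) / n

# nlam log-spaced values running from lam_max down to flmin * lam_max
nlam <- 100
flmin <- 0.01
lambda <- exp(seq(log(lam_max), log(flmin * lam_max), length.out = nlam))

# Random, balanced fold assignment: one label in 1..nfold per observation
nfold <- 5
foldid <- sample(rep(seq_len(nfold), length.out = n))
```

A vector built this way could be passed as lambda or foldid to keep the path and the folds fixed across repeated calls.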
A list object containing:
n and p: The dimensions of the problem.
lambda: The path of tuning parameter values used.
beta: Estimate of the regression coefficients, on the original scale, corresponding to the tuning parameter selected by cross-validation.
a0: Estimate of the intercept.
mat_mse: The estimated prediction error on the test sets in cross-validation. A matrix of size nlam by nfold. If glmnet_output is not NULL, then mat_mse will be NULL.
cvm: The estimated prediction error on the test sets, averaged over the K folds.
cvse: The standard error of the estimated prediction error on the test sets over the K folds.
ibest: The index in lambda that attains the minimal mean cross-validated error.
foldid: Fold assignment. A vector of length n.
nfold: The number of folds used in cross-validation.
sig_obj: Natural lasso estimate of the error standard deviation, with the optimal tuning parameter selected by cross-validation.
sig_obj_path: Natural lasso estimates of the error standard deviation. A vector of length nlam.
sig_naive: Naive estimate of the error standard deviation based on lasso regression, i.e., \(||y - X \hat{\beta}||_2 / \sqrt{n}\), selected by cross-validation.
sig_naive_path: Naive estimates of the error standard deviation based on lasso regression. A vector of length nlam.
sig_df: Degree-of-freedom adjusted estimate of the error standard deviation, selected by cross-validation. See Reid et al. (2016).
sig_df_path: Degree-of-freedom adjusted estimates of the error standard deviation. A vector of length nlam.
type: Whether the output is from a natural or an organic lasso.
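To make the naive and degree-of-freedom adjusted estimates concrete, here is a minimal base R sketch. It uses a stand-in coefficient vector rather than a fitted lasso, and it assumes the adjusted estimate divides the residual sum of squares by n minus the number of nonzero coefficients, as in Reid et al. (2016). The cvm, cvse, and ibest lines are likewise one plausible reading of the descriptions above, not the package's code.

```r
set.seed(42)
n <- 100
p <- 10
x <- matrix(rnorm(n * p), n, p)

# Stand-in for a lasso fit: a sparse coefficient vector and an intercept
beta_hat <- c(1.5, -2, rep(0, p - 2))
a0 <- 0.5
y <- a0 + drop(x %*% beta_hat) + rnorm(n)

# Residual sum of squares of the (stand-in) fit
res <- y - a0 - drop(x %*% beta_hat)
rss <- sum(res^2)

# Naive estimate: ||y - X beta||_2 / sqrt(n)
sig_naive <- sqrt(rss / n)

# Degree-of-freedom adjusted estimate: divide by n minus the number of nonzeros
df <- sum(beta_hat != 0)
sig_df <- sqrt(rss / (n - df))

# Toy nlam-by-nfold error matrix, summarized per lambda value
nlam <- 20
nfold <- 5
mat_mse <- matrix(rexp(nlam * nfold), nlam, nfold)
cvm <- rowMeans(mat_mse)                       # mean error over folds
cvse <- apply(mat_mse, 1, sd) / sqrt(nfold)    # assumed standard-error formula
ibest <- which.min(cvm)                        # index of the minimal mean error
```

Because n - df is smaller than n, the adjusted estimate is never smaller than the naive one for the same fit.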
# NOT RUN {
set.seed(123)
sim <- make_sparse_model(n = 50, p = 200, alpha = 0.6, rho = 0.6, snr = 2, nsim = 1)
nl_cv <- nlasso_cv(x = sim$x, y = sim$y[, 1])
# }
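The glmnet_output argument can be exercised along the following lines. This is a sketch that assumes the natural and glmnet packages are installed; the cv.glmnet call uses standardize = TRUE and keep = TRUE as the argument description requires, and the object names cv_fit and nl_cv2 are illustrative.

```r
library(natural)
library(glmnet)

set.seed(123)
sim <- make_sparse_model(n = 50, p = 200, alpha = 0.6, rho = 0.6, snr = 2, nsim = 1)
y <- sim$y[, 1]

# Cross-validated lasso fit, with the settings nlasso_cv expects
cv_fit <- cv.glmnet(sim$x, y, nfolds = 5, standardize = TRUE, keep = TRUE)

# Reuse the existing fit; lambda, intercept, nlam, flmin, nfold, foldid,
# and thresh are ignored when glmnet_output is supplied
nl_cv2 <- nlasso_cv(x = sim$x, y = y, glmnet_output = cv_fit)

# The three estimates of the error standard deviation
nl_cv2$sig_obj
nl_cv2$sig_naive
nl_cv2$sig_df
```

Reusing an existing cv.glmnet fit avoids running cross-validation a second time when the lasso path has already been computed.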