natural (version 0.9.0)

nlasso_cv: Cross-validation for natural lasso

Description

Provides the natural lasso estimate of the error standard deviation, using cross-validation to select the tuning parameter value. The output also includes the cross-validated naive estimate and the degrees-of-freedom adjusted estimate of the error standard deviation.

Usage

nlasso_cv(x, y, lambda = NULL, intercept = TRUE, nlam = 100,
  flmin = 0.01, nfold = 5, foldid = NULL, thresh = 1e-08,
  glmnet_output = NULL)

Arguments

x

An n by p design matrix. Each row is an observation of p features.

y

A response vector of size n.

lambda

A user-specified list of tuning parameter values. Defaults to NULL, in which case the program computes its own lambda path based on nlam and flmin.

intercept

Indicator of whether an intercept should be fitted. Defaults to TRUE.

nlam

The number of lambda values. Default value is 100.

flmin

The ratio of the smallest to the largest value in lambda. The largest value in lambda is usually the smallest value for which all coefficients are set to zero. Defaults to 1e-2.

nfold

Number of folds in cross-validation. Default value is 5. If each fold gets too few observations, a warning is thrown and the minimal nfold = 3 is used.

foldid

A vector of length n indicating which fold each observation belongs to. Defaults to NULL, in which case the program generates its own fold assignment at random.

thresh

Threshold value for the underlying optimization algorithm to claim convergence. Defaults to 1e-8.

glmnet_output

Optional user-specified output from cv.glmnet to use for computing the estimate. If not NULL, it should be the output of a cv.glmnet call with standardize = TRUE and keep = TRUE, in which case the arguments lambda, intercept, nlam, flmin, nfold, foldid, and thresh are ignored. Defaults to NULL, in which case the function calls cv.glmnet internally.
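
The defaults described for lambda (built from nlam and flmin) and foldid (generated at random) can be pictured with a short base R sketch. The helpers make_lambda_path and make_foldid are hypothetical names introduced here for illustration; they are not part of the package, and the package's internal scaling of the lambda path may differ. The largest value is computed under the glmnet-style convention, an assumption, in which max |x_j' y| / n on standardized columns is the smallest penalty that sets all coefficients to zero.

```r
set.seed(1)
n <- 50
p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)

# Hypothetical helper: log-spaced path from lam_max down to flmin * lam_max.
make_lambda_path <- function(x, y, nlam = 100, flmin = 0.01) {
  xs <- scale(x)                    # center and scale each column
  yc <- y - mean(y)                 # center the response
  # Assumed convention: smallest penalty that zeroes every coefficient
  lam_max <- max(abs(crossprod(xs, yc))) / nrow(x)
  exp(seq(log(lam_max), log(flmin * lam_max), length.out = nlam))
}

# Hypothetical helper: balanced fold labels in random order.
make_foldid <- function(n, nfold = 5) {
  sample(rep(seq_len(nfold), length.out = n))
}

lam <- make_lambda_path(x, y)
foldid <- make_foldid(n)
```

Vectors built this way have the shapes the lambda and foldid arguments expect: a decreasing sequence of nlam values, and one fold label per observation.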

Value

A list object containing:

n and p:

The dimensions of the problem.

lambda:

The path of tuning parameter values used.

beta:

Estimate of the regression coefficients, in the original scale, corresponding to the tuning parameter selected by cross-validation.

a0:

Estimate of the intercept.

mat_mse:

The estimated prediction error on the test sets in cross-validation. A matrix of size nlam by nfold. If glmnet_output is not NULL, then mat_mse will be NULL.

cvm:

The averaged estimated prediction error on the test sets over K folds.

cvse:

The standard error of the estimated prediction error on the test sets over K folds.

ibest:

The index in lambda that attains the minimal mean cross-validated error.

foldid:

Fold assignment. A vector of length n.

nfold:

The number of folds used in cross-validation.

sig_obj:

Natural lasso estimate of standard deviation of the error, with the optimal tuning parameter selected by cross-validation.

sig_obj_path:

Natural lasso estimates of standard deviation of the error. A vector of length nlam.

sig_naive:

Naive estimate of the error standard deviation based on lasso regression, i.e., \(\|y - X \hat{\beta}\|_2 / \sqrt{n}\), with the tuning parameter selected by cross-validation.

sig_naive_path:

Naive estimates of the error standard deviation based on lasso regression. A vector of length nlam.

sig_df:

Degrees-of-freedom adjusted estimate of the error standard deviation, with the tuning parameter selected by cross-validation. See Reid et al. (2016).

sig_df_path:

Degrees-of-freedom adjusted estimates of the error standard deviation. A vector of length nlam.

type:

Whether the output is from a natural or an organic lasso.
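
To make the sig_naive and sig_df entries concrete, the sketch below computes both quantities in base R for a given coefficient vector. The vector beta_hat is a hand-picked stand-in for a lasso fit, not output from nlasso_cv, and the degrees-of-freedom form sqrt(RSS / (n - s)), with s the number of nonzero coefficients, is an assumption about how the adjustment attributed to Reid et al. (2016) is applied; the package's implementation may differ in details such as intercept handling.

```r
set.seed(1)
n <- 100
p <- 5
x <- matrix(rnorm(n * p), n, p)
beta_true <- c(2, -1, 0, 0, 0)
y <- drop(x %*% beta_true) + rnorm(n)

# Stand-in for a lasso solution (assumed for illustration only)
beta_hat <- c(1.9, -0.9, 0, 0, 0)

res <- y - drop(x %*% beta_hat)     # residuals y - X beta_hat
rss <- sum(res^2)                   # squared l2 norm of the residuals

sig_naive <- sqrt(rss / n)          # ||y - X beta_hat||_2 / sqrt(n)
s_hat <- sum(beta_hat != 0)         # number of nonzero coefficients
sig_df <- sqrt(rss / (n - s_hat))   # degrees-of-freedom adjusted version
```

Because the adjusted version divides by n - s rather than n, it is never smaller than the naive estimate for the same fit.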

See Also

nlasso_path

Examples

# NOT RUN {
set.seed(123)
sim <- make_sparse_model(n = 50, p = 200, alpha = 0.6, rho = 0.6, snr = 2, nsim = 1)
nl_cv <- nlasso_cv(x = sim$x, y = sim$y[, 1])
# }
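
A second usage pattern, sketched here under the assumption that both the glmnet and natural packages are installed, is to run cv.glmnet yourself and pass the result through glmnet_output. As the argument description requires, cv.glmnet is called with standardize = TRUE and keep = TRUE. The object names cv_fit and nl_cv2 are illustrative.

```r
library(glmnet)
library(natural)

set.seed(123)
sim <- make_sparse_model(n = 50, p = 200, alpha = 0.6, rho = 0.6, snr = 2, nsim = 1)

# Cross-validated lasso fit with the settings nlasso_cv expects
cv_fit <- cv.glmnet(sim$x, sim$y[, 1], standardize = TRUE, keep = TRUE, nfolds = 5)

# lambda, intercept, nlam, flmin, nfold, foldid and thresh are ignored
# when glmnet_output is supplied
nl_cv2 <- nlasso_cv(x = sim$x, y = sim$y[, 1], glmnet_output = cv_fit)

# Natural lasso estimate of the error standard deviation
nl_cv2$sig_obj
```

This route is useful when a cv.glmnet fit already exists for other purposes, since the cross-validation does not have to be repeated.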