
Note: There's a newer version (3.15.0) of this package.

Regularization Paths for SCAD and MCP Penalized Regression Models

ncvreg fits regularization paths for linear regression, GLM, and Cox regression models using lasso or nonconvex penalties, in particular the minimax concave penalty (MCP) and smoothly clipped absolute deviation (SCAD) penalty, with options for additional L2 penalties (the "elastic net" idea). Utilities for carrying out cross-validation as well as post-fitting visualization, summarization, inference, and prediction are also provided.
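For reference, here is a minimal sketch of the three penalties as functions of a single coefficient theta. The helper names `lasso_penalty`, `mcp_penalty`, and `scad_penalty` are hypothetical and are not ncvreg functions. The default gamma values (3 for MCP, 3.7 for SCAD) mirror ncvreg's defaults for its `gamma` argument.

```r
# Hypothetical helpers (not ncvreg functions): the lasso, MCP, and SCAD
# penalties evaluated elementwise at coefficient value(s) theta.
lasso_penalty <- function(theta, lambda) lambda * abs(theta)

mcp_penalty <- function(theta, lambda, gamma = 3) {
  stopifnot(lambda > 0, gamma > 1)
  a <- abs(theta)
  ifelse(a <= gamma * lambda,
         lambda * a - a^2 / (2 * gamma),  # penalty rate decreases as |theta| grows
         gamma * lambda^2 / 2)            # constant: large coefficients not penalized further
}

scad_penalty <- function(theta, lambda, gamma = 3.7) {
  stopifnot(lambda > 0, gamma > 2)
  a <- abs(theta)
  ifelse(a <= lambda,
         lambda * a,                      # identical to the lasso near zero
         ifelse(a <= gamma * lambda,
                (2 * gamma * lambda * a - a^2 - lambda^2) / (2 * (gamma - 1)),
                lambda^2 * (gamma + 1) / 2))  # constant beyond gamma * lambda
}

theta <- seq(0, 5, by = 0.5)
round(cbind(theta,
            lasso = lasso_penalty(theta, 1),
            MCP   = mcp_penalty(theta, 1),
            SCAD  = scad_penalty(theta, 1)), 3)
```

Unlike the lasso, MCP and SCAD level off at a constant once |theta| exceeds gamma*lambda. As a result, large coefficients are not shrunk any further.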

Basic Usage

The basic usage of ncvreg is as follows:

fit <- ncvreg(X, y)
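In the call above, X is a numeric design matrix and y is the response vector. The following minimal sketch assumes the Prostate data that ship with ncvreg (the n=97, p=8 example used on this page). It shows how to choose the penalty through the `penalty` argument and how to add an L2 component through `alpha`. The object names are illustrative only.

```r
library(ncvreg)

# Prostate: a list with a 97 x 8 design matrix X and a response vector y.
data(Prostate)
X <- Prostate$X
y <- Prostate$y

fit_mcp   <- ncvreg(X, y)                     # default penalty = "MCP"
fit_scad  <- ncvreg(X, y, penalty = "SCAD")   # SCAD instead of MCP
fit_lasso <- ncvreg(X, y, penalty = "lasso")  # ordinary lasso
fit_mnet  <- ncvreg(X, y, alpha = 0.5)        # MCP mixed with an L2 (ridge) penalty

# fit$beta has one row per coefficient (intercept first) and one column per lambda;
# count the nonzero non-intercept coefficients at each point on the MCP path.
nonzero <- colSums(fit_mcp$beta[-1, , drop = FALSE] != 0)
head(nonzero, 10)
```

At the first (largest) lambda, all coefficients are zero. The count in `nonzero` typically grows as lambda decreases.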

The default penalty here is the minimax concave penalty (MCP), but SCAD and lasso penalties are also available. This produces a path of coefficients, which we can plot with

plot(fit)
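The exact zeros in the path come from the shape of the penalties. Consider a single standardized predictor under squared-error loss. Each penalty then has a closed-form solution as a function of the unpenalized estimate z, and all three return exactly zero when |z| <= lambda. For MCP, this rule is often called firm thresholding. The minimal sketch below uses hypothetical helper names and is not ncvreg's internal code.

```r
# Hypothetical helpers: closed-form minimizers of (1/2)(z - b)^2 + P(b)
# for one standardized predictor, where z is the unpenalized estimate.
soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

mcp_threshold <- function(z, lambda, gamma = 3) {
  # Firm thresholding: soft-threshold and rescale near zero, keep large z as is.
  ifelse(abs(z) <= gamma * lambda,
         soft_threshold(z, lambda) / (1 - 1 / gamma),
         z)
}

scad_threshold <- function(z, lambda, gamma = 3.7) {
  ifelse(abs(z) <= 2 * lambda,
         soft_threshold(z, lambda),                # lasso-like region
         ifelse(abs(z) <= gamma * lambda,
                soft_threshold(z, gamma * lambda / (gamma - 1)) / (1 - 1 / (gamma - 1)),
                z))                                # no shrinkage beyond gamma * lambda
}

z <- c(-4, -1.5, -0.5, 0, 0.8, 2, 5)
cbind(z,
      lasso = soft_threshold(z, 1),
      MCP   = mcp_threshold(z, 1),
      SCAD  = scad_threshold(z, 1))
```

Unlike the soft threshold, the MCP and SCAD rules return z unchanged once |z| > gamma*lambda. This is why large coefficients are not shrunk toward zero.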

Notice that variables enter the model one at a time, and that at any given value of lambda, several coefficients are zero. The summary method can be used for post-selection summarization and inference:

summary(fit, lambda=0.05)

# MCP-penalized linear regression with n=97, p=8
# At lambda=0.0500:
# -------------------------------------------------
#   Nonzero coefficients         : 6
#   Expected nonzero coefficients: 2.51
#   Average mfdr (6 features)    : 0.418

The summary also reports the following table of coefficient estimates, z statistics, and mfdr values for each selected feature:

#           Estimate         z      mfdr
# lcavol   0.5317899  8.880429 0.0000000
# svi      0.6725610  3.945052 0.0018967
# lweight  0.6038969  3.665874 0.0050683
# lbph     0.0887456  1.928241 0.4998035
# age     -0.0153092 -1.788334 1.0000000
# pgg45    0.0016804  1.159772 1.0000000

In this case, it would appear that lcavol, svi, and lweight are clearly associated with the response, even after adjusting for the other variables in the model, while lbph, age, and pgg45 may be false positives included simply by chance.
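The two summary lines "Expected nonzero coefficients" and "Average mfdr" appear consistent with the mfdr column of the table. The six values sum to about 2.51 and average about 0.418. The short arithmetic check below uses only the printed values. It verifies the numbers but does not reproduce how summary() computes them.

```r
# Per-feature mfdr values copied from the table (lambda = 0.05).
mfdr <- c(lcavol = 0.0000000, svi = 0.0018967, lweight = 0.0050683,
          lbph = 0.4998035, age = 1.0000000, pgg45 = 1.0000000)

expected_false <- sum(mfdr)   # compare with "Expected nonzero coefficients"
average_mfdr   <- mean(mfdr)  # compare with "Average mfdr (6 features)"
round(c(expected = expected_false, average = average_mfdr), 3)
```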

Typically, one would also carry out cross-validation to assess the predictive accuracy of the model at various values of lambda:

cvfit <- cv.ncvreg(X, y)
plot(cvfit)
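Because folds are assigned at random, the cross-validation curve can change from run to run. The minimal sketch below again assumes the bundled Prostate data. It fixes the fold assignment with the `seed` argument and inspects the selected lambda.

```r
library(ncvreg)
data(Prostate)
X <- Prostate$X
y <- Prostate$y

# Fixing `seed` makes the random fold assignment, and hence the CV curve, reproducible.
cvfit <- cv.ncvreg(X, y, nfolds = 10, seed = 1)

cvfit$lambda.min   # value of lambda with the smallest cross-validation error
min(cvfit$cve)     # that smallest cross-validation error
summary(cvfit)     # cross-validation summary at lambda.min
```

Here `cvfit$cve` holds the cross-validation error at each value in `cvfit$lambda`.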

At this point, coef(cvfit) will return the coefficients at the value of lambda minimizing the cross-validation error. Likewise,

predict(cvfit, X=head(X))

will return predictions for that model, while

predict(cvfit, type="nvars")

will return the number of nonzero coefficients. Note that the original fit (to the full data set) is returned as cvfit$fit; it is not necessary to call both ncvreg and cv.ncvreg to analyze a data set. For example, plot(cvfit$fit) will produce the same coefficient path plot as plot(fit) above.
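To check this relationship directly, the sketch below again assumes the Prostate data. It compares what the cv.ncvreg object returns with the same quantities computed from `cvfit$fit` at `cvfit$lambda.min`.

```r
library(ncvreg)
data(Prostate)
X <- Prostate$X
y <- Prostate$y
cvfit <- cv.ncvreg(X, y, seed = 1)

# Coefficients, predictions, and model size at lambda.min from the CV object...
b_cv    <- coef(cvfit)
yhat_cv <- predict(cvfit, X = head(X))
nv_cv   <- predict(cvfit, type = "nvars")

# ...and the same quantities from the full-data fit stored in cvfit$fit.
b_fit    <- coef(cvfit$fit, lambda = cvfit$lambda.min)
yhat_fit <- predict(cvfit$fit, X = head(X), lambda = cvfit$lambda.min)

all.equal(b_cv, b_fit)
all.equal(yhat_cv, yhat_fit)
unname(nv_cv) == sum(b_cv[-1] != 0)  # nonzero coefficients, excluding the intercept
```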

Documentation and Citation

For more on the usage and syntax of ncvreg, see the ncvreg homepage.

For more on the algorithms used by ncvreg, see the original article:

For more about the marginal false discovery rate idea used for post-selection inference, see

Installation

  • To install the latest release version from CRAN: install.packages("ncvreg")
  • To install the latest development version from GitHub: devtools::install_github("pbreheny/ncvreg")


Monthly Downloads

3,750

Version

3.11-1

License

GPL-3

Maintainer

Patrick Breheny

Last Published

February 26th, 2019

Functions in ncvreg (3.11-1)

  • predict.ncvreg: Model predictions based on a fitted ncvreg object.
  • perm.ncvreg: Permutation fitting for ncvreg
  • ncvsurv: Fit an MCP- or SCAD-penalized survival model
  • predict.ncvsurv: Model predictions based on a fitted "ncvsurv" object.
  • plot.ncvsurv.func: Plot survival curve for ncvsurv model
  • summary.cv.ncvreg: Summarizing cross-validation-based inference
  • permres: Permute residuals for a fitted ncvreg model
  • plot.cv.ncvreg: Plots the cross-validation curve from a cv.ncvreg object
  • summary.ncvreg: Summary method for ncvreg objects
  • ncvreg: Fit an MCP- or SCAD-penalized regression path
  • std: Standardizes a design matrix
  • plot.ncvreg: Plot coefficients from a ncvreg object
  • plot.mfdr: Plot marginal false discovery rate curves
  • Prostate: Factors associated with prostate specific antigen
  • AUC.cv.ncvsurv: Calculates AUC for cv.ncvsurv objects
  • mfdr: Marginal false discovery rates
  • ncvreg-internal: Internal ncvreg functions
  • cv.ncvreg: Cross-validation for ncvreg/ncvsurv
  • fir: Marginal false discovery rates
  • Heart: Risk factors associated with heart disease
  • Lung: VA lung cancer data set
  • ncvreg-package: Regularization paths for SCAD- and MCP-penalized regression models