bigstatsr (version 0.6.2)

COPY_biglasso_main: Sparse regression path

Description

Fit solution paths for linear or logistic regression models penalized by lasso (alpha = 1) or elastic-net (1e-4 < alpha < 1) over a grid of values for the regularization parameter lambda.

Usage

COPY_biglasso_main(X, y.train, ind.train, ind.col, covar.train,
  family = c("gaussian", "binomial"), alphas = 1, K = 10,
  ind.sets = sample(rep_len(1:K, n)), nlambda = 200,
  lambda.min = if (n > p) 1e-04 else 0.001, nlam.min = 50, n.abort = 10,
  base.train = NULL, eps = 1e-05, max.iter = 1000, dfmax = 50000,
  warn = FALSE, return.all = FALSE, ncores = 1)

Arguments

family

Either "gaussian" (linear) or "binomial" (logistic).

alphas

The elastic-net mixing parameter that controls the relative contribution of the lasso (l1) and ridge (l2) penalties. The penalty is defined as $$\alpha \|\beta\|_1 + \frac{1-\alpha}{2} \|\beta\|_2^2.$$ alpha = 1 gives the lasso penalty, and any alpha between 1e-4 and 1 gives the elastic-net penalty. Default is 1. You can pass multiple values; only one of them is retained, chosen by grid search.
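To make the mixing concrete, here is a minimal sketch that evaluates the penalty formula for a small coefficient vector. The helper `enet_penalty()` and the example values of `beta` are hypothetical and are not part of bigstatsr.

```r
# Hypothetical helper (not a bigstatsr function): evaluates the elastic-net
# penalty term exactly as written in the formula for `alphas`.
enet_penalty <- function(beta, alpha) {
  l1 <- sum(abs(beta))   # ||beta||_1
  l2_sq <- sum(beta^2)   # ||beta||_2^2
  alpha * l1 + (1 - alpha) / 2 * l2_sq
}

beta <- c(0.5, -1, 0, 2)         # made-up coefficients for illustration
enet_penalty(beta, alpha = 1)    # lasso: only the l1 term contributes
enet_penalty(beta, alpha = 0.5)  # elastic-net: both terms contribute
```

With alpha = 1 the squared l2 term drops out entirely, which is why that setting corresponds to the plain lasso.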

K

Number of sets used in the Cross-Model Selection and Averaging (CMSA) procedure. Default is 10.

ind.sets

Integer vector of values between 1 and K specifying which set each index of the training set belongs to. By default, these values are assigned at random.
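The default expression from the usage, `sample(rep_len(1:K, n))`, can be tried on its own to see what it produces. In this sketch, `n` stands for the number of training indices, and the values of `n` and the seed are arbitrary choices for illustration.

```r
set.seed(1)   # arbitrary seed, only so the example is reproducible
n <- 23       # assumed number of training indices
K <- 10

# Recycle 1..K to length n, then shuffle: set sizes differ by at most one.
ind.sets <- sample(rep_len(1:K, n))

length(ind.sets)   # one set label per training index
table(ind.sets)    # number of indices assigned to each of the K sets
```

Passing your own vector instead of relying on the default is one way to keep the set assignment fixed across runs.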

nlambda

The number of lambda values. Default is 200.

lambda.min

The smallest value for lambda, as a fraction of lambda.max. Default is .0001 if the number of observations is larger than the number of variables and .001 otherwise.
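As a rough sketch of how `nlambda` and `lambda.min` could combine into a grid: the code below assumes a log-spaced sequence from lambda.max down to lambda.min times lambda.max, which is a common convention in lasso solvers but is not stated on this page. The value of `lambda.max` is made up; in practice it would be derived from the data.

```r
n <- 500      # assumed number of observations
p <- 10000    # assumed number of variables
nlambda <- 200

# Default rule from the usage: a smaller fraction when n > p.
lambda.min <- if (n > p) 1e-04 else 0.001

lambda.max <- 2.5   # hypothetical value, normally computed from the data

# Assumption: log-spaced grid, largest lambda first.
lambdas <- exp(seq(log(lambda.max), log(lambda.min * lambda.max),
                   length.out = nlambda))
range(lambdas)
```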

nlam.min

Minimum number of lambda values to investigate. Default is 50.

n.abort

Number of lambda values for which predictive performance on the validation set must worsen before the path is stopped early. Default is 10.
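The interplay between `nlam.min` and `n.abort` can be pictured with a small simulation. This is only a sketch under stated assumptions: the function `stop_index()` is hypothetical, it assumes the counter of non-improving lambda values resets whenever a new best validation loss is found, and the loss curve is synthetic. The package's actual stopping logic may differ in detail.

```r
# Hypothetical illustration (not bigstatsr code): return the index of the
# lambda value at which the path would stop, given validation losses.
stop_index <- function(val_loss, nlam.min = 50, n.abort = 10) {
  best <- Inf
  n_worse <- 0
  for (i in seq_along(val_loss)) {
    if (val_loss[i] < best) {
      best <- val_loss[i]   # new best: reset the counter (assumption)
      n_worse <- 0
    } else {
      n_worse <- n_worse + 1
    }
    # Stop only once at least nlam.min values have been investigated.
    if (i >= nlam.min && n_worse >= n.abort) return(i)
  }
  length(val_loss)   # never stopped early: the whole path was explored
}

# Synthetic validation loss: improves for 60 lambdas, then degrades.
val_loss <- c(seq(10, 1, length.out = 60), seq(1.1, 5, length.out = 140))
stop_index(val_loss)
stop_index(val_loss, nlam.min = 100)
```

In this synthetic curve the best loss occurs at the 60th lambda, so a larger `nlam.min` simply forces more of the path to be explored before stopping.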

eps

Convergence threshold for inner coordinate descent. The algorithm iterates until the maximum change in the objective after any coefficient update is less than eps times the null deviance. Default value is 1e-5.

max.iter

Maximum number of iterations. Default is 1000.

dfmax

Upper bound for the number of nonzero coefficients. Default is 50e3 because, for large data sets, computational burden may be heavy for models with a large number of nonzero coefficients.

warn

Return warning messages for failures to converge and model saturation? Default is FALSE.

return.all

Whether to return coefficients for all alpha and lambda values. Default is FALSE, which returns only the coefficients that maximize predictive performance on the validation sets.

Details

The objective function for linear regression (family = "gaussian") is $$\frac{1}{2n}\textrm{RSS} + \textrm{penalty}.$$ For logistic regression (family = "binomial"), it is $$-\frac{1}{n}\textrm{loglike} + \textrm{penalty}.$$
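The two objectives can be written out directly to check the scaling by n. The functions below are hypothetical helpers, not bigstatsr functions; `pred` stands for fitted values, `prob` for fitted probabilities, and `penalty` is passed in as an already-computed number.

```r
# Hypothetical helpers (not part of bigstatsr) mirroring the two formulas.
obj_gaussian <- function(y, pred, penalty = 0) {
  rss <- sum((y - pred)^2)            # residual sum of squares
  rss / (2 * length(y)) + penalty
}

obj_binomial <- function(y, prob, penalty = 0) {
  # Bernoulli log-likelihood for 0/1 outcomes
  loglike <- sum(y * log(prob) + (1 - y) * log(1 - prob))
  -loglike / length(y) + penalty
}

obj_gaussian(y = c(1, 2, 3), pred = c(1, 2, 5))   # RSS = 4, n = 3
obj_binomial(y = c(1, 0), prob = c(0.5, 0.5))     # uninformative predictions
```

Both losses are averaged over the n observations, so the same value of the penalty has a comparable effect regardless of sample size.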