flamCV: Fit the Fused Lasso Additive Model and Do Tuning Parameter Selection using K-Fold Cross-Validation

Description

Fit an additive model where each component is estimated to be piecewise constant with a small number of adaptively-chosen knots. Tuning parameter selection is done using K-fold cross-validation. In particular, this function implements the "fused lasso additive model", as proposed in Petersen, A., Witten, D., and Simon, N. (2014). Fused Lasso Additive Model. arXiv preprint arXiv:1409.5391.
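For orientation, the optimization problem behind the model can be sketched as follows. The notation is an assumption based on the cited paper rather than on this help page: theta_0 is an intercept, theta_j is the n-vector of fitted values for the j-th covariate, P_j is the permutation matrix that sorts the observations by the j-th covariate, and D is the (n-1) x n first-difference matrix. Check the paper for the exact statement.

```latex
\underset{\theta_0 \in \mathbb{R},\; \theta_1,\dots,\theta_p \in \mathbb{R}^n}{\text{minimize}}
\quad \frac{1}{2}\Bigl\| y - \theta_0 \mathbf{1} - \sum_{j=1}^{p} \theta_j \Bigr\|_2^2
\;+\; \alpha\lambda \sum_{j=1}^{p} \bigl\| D P_j \theta_j \bigr\|_1
\;+\; (1-\alpha)\lambda \sum_{j=1}^{p} \bigl\| \theta_j \bigr\|_2
```

The first penalty limits the number of jumps (knots) in each fitted function, and the second encourages entire functions to be zero. This matches the roles of alpha and lambda described under Arguments.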

Usage

flamCV(x, y, lambda.min.ratio = 0.01, n.lambda = 50, lambda.seq = NULL,
alpha = 1, family = "gaussian", method = "BCD", fold = NULL,
n.fold = NULL, seed = NULL, within1SE = T, tolerance = 10e-6)
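A minimal call might look like the sketch below. The simulated data, the object name cv.fit, and the particular argument values are illustrative assumptions. The call itself uses only the arguments listed in the usage above, and it is skipped if the flam package is not installed.

```r
# Simulated data; the sizes and the step-function signal are illustrative only.
set.seed(1)
n <- 100
p <- 5
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
# The outcome depends on the first covariate through a step function, plus noise.
y <- ifelse(x[, 1] > 0, 1, -1) + rnorm(n, sd = 0.5)

# Run the cross-validated fit only when the package is available.
if (requireNamespace("flam", quietly = TRUE)) {
  cv.fit <- flam::flamCV(x, y, n.lambda = 20, n.fold = 5, seed = 1)
  # lambda.cv is the value selected by cross-validation (see Value).
  print(cv.fit$lambda.cv)
}
```

Here n.lambda is set below its default only to keep the sketch quick to run.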

Arguments

x

n x p covariate matrix. May have p > n.

y

n-vector containing the outcomes for the n observations in x.

lambda.min.ratio

smallest value for lambda.seq, expressed as a fraction of the maximum lambda value. The maximum is derived from the data as the smallest lambda value for which all estimated functions are zero. The default is 0.01.

n.lambda

the number of lambda values to consider. The default is 50.

lambda.seq

a user-supplied sequence of positive lambda values to consider. The typical usage is to calculate lambda.seq using lambda.min.ratio and n.lambda, but providing lambda.seq overrides this. If provided, lambda.seq should be a decreasing sequence of values, since flamCV relies on warm starts for speed. Because of warm starts, fitting the model for a whole sequence of lambda values is often faster than fitting it for a single lambda value.
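As an illustration of what a suitable lambda.seq looks like, the sketch below builds a decreasing sequence from a maximum value, a minimum ratio, and a length. The helper make_lambda_seq is hypothetical, the log-scale spacing is an assumption, and lambda.max stands in for the data-derived maximum that flamCV would compute itself.

```r
# Hypothetical helper: build a decreasing sequence of positive lambda values.
# Log-scale spacing is an assumption, not taken from this help page.
make_lambda_seq <- function(lambda.max, lambda.min.ratio = 0.01, n.lambda = 50) {
  exp(seq(log(lambda.max),
          log(lambda.max * lambda.min.ratio),
          length.out = n.lambda))
}

# lambda.max = 10 is an arbitrary stand-in for the data-derived maximum.
lambda.seq <- make_lambda_seq(lambda.max = 10)
range(lambda.seq)
```

A sequence built this way starts at the largest value and ends at lambda.max * lambda.min.ratio, so it can be passed in the decreasing order that warm starts rely on.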

alpha

the value of the tuning parameter alpha to consider. The default is 1. The value must be in [0,1]: values near 0 prioritize sparsity of functions, and values near 1 prioritize limiting the number of knots. Empirical evidence suggests using alpha = 1 when p < n and alpha = 0.75 when p > n.

family

specifies the loss function to use. Currently supports squared error loss (default; family="gaussian") and logistic loss (family="binomial").

method

specifies the optimization algorithm to use. Options are block-coordinate descent (default; method="BCD"), generalized gradient descent (method="GGD"), or generalized gradient descent with backtracking (method="GGD.backtrack"). This argument is ignored if family="binomial".

fold

user-supplied fold numbers for cross-validation. If supplied, fold should be an n-vector with entries in 1,...,K when doing K-fold cross-validation. The default is to choose fold using n.fold.
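One way to build such a fold vector by hand is sketched below in base R. The values of n and K are arbitrary, and this is not necessarily how flamCV assigns folds internally.

```r
# Illustrative sizes; n would normally be nrow(x).
set.seed(1)
n <- 23
K <- 5

# Repeat the labels 1..K until there are n of them, then shuffle,
# so that fold sizes differ by at most one.
fold <- sample(rep(seq_len(K), length.out = n))
table(fold)
```

Supplying fold explicitly is useful when the same split must be reused across several fits, for example when comparing different values of alpha.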

n.fold

the number of folds, K, to use for the K-fold cross-validation selection of tuning parameters. The default is 10. Specifying fold overrides n.fold.

seed

an optional number used with set.seed() at the beginning of the function. This is only relevant if fold is not specified by the user.

within1SE

logical (TRUE or FALSE) for how cross-validated tuning parameters should be chosen. If within1SE=TRUE, lambda is chosen to be the value corresponding to the most sparse model with cross-validation error within one standard error of the minimum cross-validation error. If within1SE=FALSE, lambda is chosen to be the value corresponding to the minimum cross-validation error.
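The two selection rules can be sketched on toy numbers as follows. The helper select_lambda and the error values are hypothetical. The sketch assumes that a larger lambda gives a sparser model and that the standard error is taken at the minimizing lambda, which may differ from the package's internal definition of sparsity.

```r
# Hypothetical helper illustrating the two selection rules.
# Assumes a larger lambda corresponds to a sparser model.
select_lambda <- function(lambda.seq, mean.cv.error, se.cv.error,
                          within1SE = TRUE) {
  i.min <- which.min(mean.cv.error)
  if (!within1SE) {
    # Minimum-error rule.
    return(lambda.seq[i.min])
  }
  # One-standard-error rule: the largest lambda whose error is within
  # one standard error of the minimum.
  threshold <- mean.cv.error[i.min] + se.cv.error[i.min]
  max(lambda.seq[mean.cv.error <= threshold])
}

# Toy values, not output from flamCV.
lambda.seq    <- c(8, 4, 2, 1, 0.5)
mean.cv.error <- c(3.0, 2.1, 1.9, 2.0, 2.4)
se.cv.error   <- rep(0.3, 5)

lambda.1se <- select_lambda(lambda.seq, mean.cv.error, se.cv.error)
lambda.min <- select_lambda(lambda.seq, mean.cv.error, se.cv.error,
                            within1SE = FALSE)
```

With these toy values the minimum error occurs at lambda = 2, while the one-standard-error rule moves to the larger value lambda = 4, trading a slightly higher error for a simpler fit.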

tolerance

specifies the convergence criterion for the objective (default is 10e-6, which equals 1e-5).

Value

An object with S3 class "flamCV".

mean.cv.error

m-vector containing cross-validation error where m is the length of lambda.seq. Note that mean.cv.error[i] contains the cross-validation error for tuning parameters alpha and flam.out$all.lambda[i].

se.cv.error

m-vector containing the standard error of the cross-validation error, where m is the length of lambda.seq. Note that se.cv.error[i] contains the standard error of the cross-validation error for tuning parameters alpha and flam.out$all.lambda[i].

lambda.cv

optimal lambda value chosen by cross-validation.

alpha