flam (version 3.2)

flamCV: Fit the Fused Lasso Additive Model and Do Tuning Parameter Selection using K-Fold Cross-Validation

Description

Fit an additive model where each component is estimated to be piecewise constant with a small number of adaptively-chosen knots. Tuning parameter selection is done using K-fold cross-validation. In particular, this function implements the "fused lasso additive model", as proposed in Petersen, A., Witten, D., and Simon, N. (2014). Fused Lasso Additive Model. arXiv preprint arXiv:1409.5391.

Usage

flamCV(x, y, lambda.min.ratio = 0.01, n.lambda = 50, lambda.seq = NULL,
alpha = 1, family = "gaussian", method = "BCD", fold = NULL,
n.fold = NULL, seed = NULL, within1SE = T, tolerance = 10e-6)

Arguments

x

n x p covariate matrix. May have p > n.

y

n-vector containing the outcomes for the n observations in x.

lambda.min.ratio

smallest value for lambda.seq, as a fraction of the maximum lambda value, which is the data-derived smallest value for which all estimated functions are zero. The default is 0.01.

n.lambda

the number of lambda values to consider. The default is 50.

lambda.seq

a user-supplied sequence of positive lambda values to consider. The typical usage is to calculate lambda.seq using lambda.min.ratio and n.lambda, but providing lambda.seq overrides this. If provided, lambda.seq should be a decreasing sequence of values, since flamCV relies on warm starts for speed. Thus fitting the model for a whole sequence of lambda values is often faster than fitting for a single lambda value.

alpha

the value of the tuning parameter alpha to consider - default is 1. Value must be in [0,1] with values near 0 prioritizing sparsity of functions and values near 1 prioritizing limiting the number of knots. Empirical evidence suggests using alpha of 1 when p < n and alpha of 0.75 when p > n.

family

specifies the loss function to use. Currently supports squared error loss (default; family="gaussian") and logistic loss (family="binomial").

method

specifies the optimization algorithm to use. Options are block-coordinate descent (default; method="BCD"), generalized gradient descent (method="GGD"), or generalized gradient descent with backtracking (method="GGD.backtrack"). This argument is ignored if family="binomial".

fold

user-supplied fold numbers for cross-validation. If supplied, fold should be an n-vector with entries in 1,...,K when doing K-fold cross-validation. The default is to choose fold using n.fold.

n.fold

the number of folds, K, to use for the K-fold cross-validation selection of tuning parameters. The default is 10. If fold is specified, it overrides n.fold.

seed

an optional number used with set.seed() at the beginning of the function. This is only relevant if fold is not specified by the user.

within1SE

logical (TRUE or FALSE) for how cross-validated tuning parameters should be chosen. If within1SE=TRUE, lambda is chosen to be the value corresponding to the most sparse model with cross-validation error within one standard error of the minimum cross-validation error. If within1SE=FALSE, lambda is chosen to be the value corresponding to the minimum cross-validation error.

tolerance

specifies the convergence criterion for the objective (default is 10e-6, which equals 1e-5).
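
The relationship between lambda.min.ratio, n.lambda and lambda.seq described above can be sketched in base R. This is an illustration only: the value of lambda.max is hypothetical (flamCV derives the maximum lambda from the data), and the log-spaced grid is an assumption about how such a sequence is typically built, not a statement of the package's internals.

```r
# Hypothetical value; in flamCV the maximum lambda is derived from the data.
lambda.max <- 2.5
lambda.min.ratio <- 0.01
n.lambda <- 50

# Assumed log-spaced grid, ordered from the largest to the smallest lambda.
lambda.seq <- exp(seq(log(lambda.max),
                      log(lambda.max * lambda.min.ratio),
                      length.out = n.lambda))

# A user-supplied lambda.seq should have the same two properties:
# strictly positive values in decreasing order (so warm starts can be used).
stopifnot(all(lambda.seq > 0), all(diff(lambda.seq) < 0))
range(lambda.seq)
```

A sequence built this way could be passed as the lambda.seq argument in place of the lambda.min.ratio and n.lambda defaults.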

Value

An object with S3 class "flamCV".

mean.cv.error

m-vector containing cross-validation error where m is the length of lambda.seq. Note that mean.cv.error[i] contains the cross-validation error for tuning parameters alpha and flam.out$all.lambda[i].

se.cv.error

m-vector containing cross-validation standard error where m is the length of lambda.seq. Note that se.cv.error[i] contains the standard error of the cross-validation error for tuning parameters alpha and flam.out$all.lambda[i].

lambda.cv

optimal lambda value chosen by cross-validation.

alpha

as specified by user (or default).

index.cv

index of the model corresponding to 'lambda.cv'.

flam.out

object of class 'flam' returned by flam.

fold

as specified by user (or default).

n.folds

as specified by user (or default).

within1SE

as specified by user (or default).

tolerance

as specified by user (or default).

call

matched call.
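
The way mean.cv.error, se.cv.error and index.cv relate under the two within1SE settings can be sketched as follows. The numbers are made up for illustration and are not output from flamCV. The sketch assumes that lambda.seq is decreasing, that a larger lambda gives a sparser model, and that the standard error at the minimum is the one used for the threshold; the package may differ in these details.

```r
# Illustrative values only, not output from flamCV.
lambda.seq    <- c(1.0, 0.5, 0.25, 0.125, 0.0625)  # decreasing sequence
mean.cv.error <- c(2.10, 1.45, 1.20, 1.15, 1.30)
se.cv.error   <- c(0.20, 0.15, 0.12, 0.10, 0.14)

# Index of the minimum cross-validation error.
i.min <- which.min(mean.cv.error)

# within1SE = FALSE: use the lambda at the minimum CV error.
index.min <- i.min

# within1SE = TRUE (assumed rule): among models whose CV error is within one
# standard error of the minimum, take the one with the largest lambda, which
# is the first such index because lambda.seq is decreasing.
threshold <- mean.cv.error[i.min] + se.cv.error[i.min]
index.1se <- min(which(mean.cv.error <= threshold))

c(lambda.min = lambda.seq[index.min], lambda.1se = lambda.seq[index.1se])
```

With these values the one-standard-error rule picks a larger lambda than the minimum-error rule, trading a small increase in estimated error for a sparser fit.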

Details

Note that flamCV does not cross-validate over alpha; only a single value should be provided. To cross-validate over alpha, call flamCV multiple times with different values of alpha and the same seed. This ensures that the cross-validation folds (fold) remain the same for the different values of alpha. See the example below for details.
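
Why a shared seed keeps the folds fixed can be seen in a small base R sketch. The helper make.folds is hypothetical and only illustrates one common way to draw balanced folds; it is not the function flamCV uses internally.

```r
n <- 50  # number of observations
K <- 10  # number of folds

# Hypothetical helper: set the seed, then randomly permute a balanced
# assignment of the labels 1..K to the n observations.
make.folds <- function(n, K, seed) {
  set.seed(seed)
  sample(rep(seq_len(K), length.out = n))
}

fold.a <- make.folds(n, K, seed = 100)
fold.b <- make.folds(n, K, seed = 100)

# The same seed reproduces the same fold assignment.
identical(fold.a, fold.b)

# Each fold label appears equally often when K divides n.
table(fold.a)
```

A vector of this form, an n-vector with entries in 1,...,K, matches what the fold argument expects, so it could also be supplied directly to keep the folds identical across calls.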

References

Petersen, A., Witten, D., and Simon, N. (2014). Fused Lasso Additive Model. arXiv preprint arXiv:1409.5391.

See Also

flam, plot.flamCV, summary.flamCV

Examples

# NOT RUN {
#See ?'flam-package' for a full example of how to use this package

#generate data
set.seed(1)
data <- sim.data(n = 50, scenario = 1, zerof = 10, noise = 1)

#fit model for a range of lambda chosen by default
#pick lambda using 2-fold cross-validation
#note: use larger 'n.fold' (e.g., 10) in practice
flamCV.out <- flamCV(x = data$x, y = data$y, alpha = 0.75, n.fold = 2)

# }
# NOT RUN {
#note that cross-validation is only done to choose lambda for specified alpha
#to cross-validate over alpha also, call 'flamCV' for several alpha and set seed
#note: use larger 'n.fold' (e.g., 10) in practice
flamCV.out1 <- flamCV(x = data$x, y = data$y, alpha = 0.65, seed = 100, 
	within1SE = FALSE, n.fold = 2)
flamCV.out2 <- flamCV(x = data$x, y = data$y, alpha = 0.75, seed = 100, 
	within1SE = FALSE, n.fold = 2)
flamCV.out3 <- flamCV(x = data$x, y = data$y, alpha = 0.85, seed = 100, 
	within1SE = FALSE, n.fold = 2)
#this ensures that the folds used are the same
flamCV.out1$fold; flamCV.out2$fold; flamCV.out3$fold
#compare the CV error at the selected lambda for each alpha to choose alpha
CVerrors <- c(flamCV.out1$mean.cv.error[flamCV.out1$index.cv],
	flamCV.out2$mean.cv.error[flamCV.out2$index.cv],
	flamCV.out3$mean.cv.error[flamCV.out3$index.cv])
#which.min returns the first minimum, so ties resolve to the smallest alpha
best.alpha <- c(flamCV.out1$alpha, flamCV.out2$alpha,
	flamCV.out3$alpha)[which.min(CVerrors)]

#also can generate data for logistic FLAM model
data2 <- sim.data(n = 50, scenario = 1, zerof = 10, family = "binomial")
#fit the FLAM model with cross-validation using logistic loss
#note: use larger 'n.fold' (e.g., 10) in practice
flamCV.logistic.out <- flamCV(x = data2$x, y = data2$y, family = "binomial",
	n.fold = 2)
# }
# NOT RUN {
#'flamCV' returns an object of the class 'flamCV' that includes an object
#of class 'flam' (flam.out); see ?'flam-package' for an example using S3
#methods for the classes of 'flam' and 'flamCV'
# }
