l2boost (version 1.0.3)

cv.l2boost: K-fold cross-validation using l2boost.

Description

Calculate the K-fold cross-validation prediction error for l2boost models. The prediction error is calculated using mean squared error (MSE). The optimal boosting step (m=opt.step) is obtained by selecting the step m resulting in the minimal MSE.

Usage

cv.l2boost(
  x,
  y,
  K = 10,
  M = NULL,
  nu = 1e-04,
  lambda = NULL,
  trace = FALSE,
  type = c("discrete", "hybrid", "friedman", "lars"),
  cores = NULL,
  ...
)

Arguments

x

the design matrix

y

the response vector

K

number of cross-validation folds (default: 10)

M

the total number of iterations passed to l2boost.

nu

l1 shrinkage parameter (default: 1e-4)

lambda

l2 shrinkage parameter for elasticBoost (default: NULL = no l2-regularization)

trace

Show computation/debugging output? (default: FALSE)

type

Type of l2boost fit (default: "discrete"); see l2boost for a description of each algorithm.

cores

number of cores used to parallelize the cross-validation analysis. If not specified, the number of available cores is detected; when more than one core is found, n-1 cores are used for cross-validation. Parallelization is implemented with mclapply from the multicore/parallel packages, or with clusterApply on Windows machines.

...

Additional arguments passed to l2boost

Value

A list of cross-validation results:

call

the matched call.

type

Choice of l2boost algorithm: one of "discrete", "hybrid", "friedman", or "lars"; see l2boost

names

design matrix column names used in the model

nu

The L1 boosting shrinkage parameter value

lambda

The L2 elasticBoost shrinkage parameter value

K

number of folds used for cross-validation

mse

Optimal cross-validation mean square error estimate

mse.list

list of K vectors of mean square errors at each step m

coef

beta coefficient estimates from the full model at opt.step

coef.stand

standardized beta coefficient estimates from full model at opt.step

opt.step

optimal step m calculated by minimizing cross-validation error among all K training sets

opt.norm

L1 norm of beta coefficients at opt.step

fit

l2boost fit of full model

yhat

estimate of response from full model at opt.step
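
For orientation, these pieces can be inspected directly (a sketch, assuming a fitted object cv.obj as created in the Examples section below):

cv.obj$opt.step          # CV-optimal stopping step m
cv.obj$mse               # minimal cross-validated mean square error
head(cv.obj$coef)        # coefficient estimates at opt.step
length(cv.obj$mse.list)  # one MSE curve per fold, hence K entries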

Details

The cross-validation method splits the data set into K mutually exclusive subsets. An l2boost model is built on each of K different training data sets, each created from the full data set by sequentially leaving out one of the K subsets. The prediction error estimate is calculated by averaging, over the K held-out subsets, the mean square error of the model trained without that subset. The optimal step m is obtained at the step with the minimal averaged mean square error.
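
This averaging can be reproduced by hand from the returned mse.list (a sketch; it assumes a fitted cv.obj as in the Examples below, with all K per-fold vectors of equal length):

# Average the K per-fold MSE curves, then locate the minimizer
cv.mse <- Reduce(`+`, cv.obj$mse.list) / cv.obj$K
which.min(cv.mse)  # should correspond to cv.obj$opt.step (modulo m = 0 indexing)
min(cv.mse)        # should correspond to cv.obj$mse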

After the cross-validation models, l2boost is run once more on the full data set, for all M iteration steps, and this fit is returned in the cv.l2boost$fit object.
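
For instance, the full-model response estimate at the optimal step is available directly (a sketch, assuming cv.obj and the simulated data dta from the Examples below):

# Compare observed responses to the full-model estimate at opt.step
plot(dta$y, cv.obj$yhat, xlab = "observed y", ylab = "fitted yhat")
abline(0, 1, lty = 2)  # reference line: perfect agreement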

cv.l2boost only optimizes along the iteration count m for a given value of nu. This is equivalent to an L1-regularization optimization. To optimize an elasticBoost model over the L2-regularization parameter lambda as well, a manual two-way cross-validation can be performed by sequentially optimizing over a range of lambda values and selecting the lambda/opt.step pair that results in the minimal cross-validated mean square error; a sketch follows, and see the examples below.
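
A minimal sketch of such a two-way search, assuming the simulated data dta from the Examples below; the lambda grid here is illustrative, not a recommendation:

# Two-way CV sketch: optimize the stopping step within each candidate
# lambda, then select the lambda/opt.step pair with minimal CV error.
lambda.grid <- c(0.01, 0.1, 1)   # illustrative grid of L2 penalties
cv.list <- lapply(lambda.grid, function(lam)
  cv.l2boost(dta$x, dta$y, M = 1000, nu = 1e-2, lambda = lam, cores = 2))
mse.by.lambda <- sapply(cv.list, function(obj) obj$mse)
best <- which.min(mse.by.lambda)
lambda.grid[best]          # selected L2 penalty
cv.list[[best]]$opt.step   # stopping step at the selected penalty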

cv.l2boost uses the parallel package internally to speed up the cross-validation process on multicore machines. parallel is packaged with base R >= 2.14; for earlier releases, the multicore package provides the same functionality. By default, cv.l2boost will use all available cores except one. Each fold is run on its own core, and results are combined automatically. The number of cores can be overridden using the cores function argument.
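
For example, to override the automatic detection (a sketch; detectCores is in the parallel package, and dta is the simulated data from the Examples below):

# Use all but two cores, keeping at least one
n.cores <- max(1, parallel::detectCores() - 2)
cv.par <- cv.l2boost(dta$x, dta$y, M = 1000, nu = 1e-2, cores = n.cores)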

See Also

l2boost, plot.l2boost, predict.l2boost and coef.l2boost

Examples

# NOT RUN {
#--------------------------------------------------------------------------
# Example: ElasticBoost simulation
# Compare l2boost and elasticNetBoosting using 10-fold CV
# 
# Elastic net simulation, see Zou H. and Hastie T. Regularization and 
# variable selection via the elastic net. J. Royal Statist. Soc. B, 
# 67(2):301-320, 2005
set.seed(1025)
dta <- elasticNetSim(n=100)

# The default values set up the signal on 3 groups of 5 variables,
# Color the signal variables red, others are grey.
sig <- c(rep("red", 15), rep("grey", 40-15))

# Set the boosting parameters
Mtarget = 1000
nuTarget = 1.e-2

# For CRAN, only use 2 cores in the CV method
cvCores=2

# 10 fold l2boost CV  
cv.obj <- cv.l2boost(dta$x,dta$y,M=Mtarget, nu=nuTarget, cores=cvCores)

# Plot the results
par(mfrow=c(2,3))
plot(cv.obj)
abline(v=cv.obj$opt.step, lty=2, col="grey")
plot(cv.obj$fit, type="coef", ylab=expression(beta[i]))
abline(v=cv.obj$opt.step, lty=2, col="grey")
plot(coef(cv.obj$fit, m=cv.obj$opt.step), cex=.5, 
  ylab=expression(beta[i]), xlab="Column Index", ylim=c(0,140), col=sig)

# elasticBoost l2-regularization parameter lambda=0.1
# 5-fold elasticNet CV
cv.eBoost <- cv.l2boost(dta$x,dta$y,M=Mtarget, K=5, nu=nuTarget, lambda=.1, cores=cvCores) 

# plot the results
plot(cv.eBoost)
abline(v=cv.eBoost$opt.step, lty=2, col="grey")
plot(cv.eBoost$fit, type="coef", ylab=expression(beta[i]))
abline(v=cv.eBoost$opt.step, lty=2, col="grey")
plot(coef(cv.eBoost$fit, m=cv.eBoost$opt.step), cex=.5, 
  ylab=expression(beta[i]), xlab="Column Index", ylim=c(0,140), col=sig)
# }