gcdnet (version 1.0.6)

cv.gcdnet: Cross-validation for gcdnet

Description

Performs k-fold cross-validation for gcdnet, produces a plot, and returns a value for lambda. This function is adapted from the cv function in the glmnet package.

Usage

cv.gcdnet(
  x,
  y,
  lambda = NULL,
  pred.loss = c("misclass", "loss"),
  nfolds = 5,
  foldid,
  delta = 2,
  omega = 0.5,
  ...
)

Value

An object of class cv.gcdnet is returned, which is a list with the components of the cross-validation fit.

lambda

the values of lambda used in the fits.

cvm

the mean cross-validated error - a vector of length length(lambda).

cvsd

estimate of standard error of cvm.

cvupper

upper curve = cvm+cvsd.

cvlower

lower curve = cvm-cvsd.

nzero

number of non-zero coefficients at each lambda.

name

a text string indicating type of measure (for plotting purposes).

gcdnet.fit

a fitted gcdnet object for the full data.

lambda.min

The optimal value of lambda that gives minimum cross validation error cvm.

lambda.1se

The largest value of lambda such that error is within 1 standard error of the minimum.
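
As a rough illustration of how lambda.min and lambda.1se relate to cvm and cvsd, the base R sketch below applies the two rules to made-up numbers. The vectors are hypothetical and this is not the package's internal code; in practice the values are read directly from the returned object (for example cv$lambda.min).

```r
# Hypothetical cross-validation results (made-up numbers for illustration)
lambda <- c(1.0, 0.5, 0.25, 0.125, 0.0625)
cvm    <- c(0.40, 0.30, 0.22, 0.20, 0.21)
cvsd   <- c(0.03, 0.03, 0.03, 0.03, 0.03)

# Upper and lower curves as described above
cvupper <- cvm + cvsd
cvlower <- cvm - cvsd

# lambda.min: the lambda with the smallest mean cross-validated error
idx_min    <- which.min(cvm)
lambda_min <- lambda[idx_min]

# lambda.1se: the largest lambda whose error is within one
# standard error of the minimum
lambda_1se <- max(lambda[cvm <= cvm[idx_min] + cvsd[idx_min]])
```

Choosing lambda.1se trades a slightly higher cross-validated error for a more heavily penalized, and typically sparser, model.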

Arguments

x

x matrix as in gcdnet.

y

response variable or class label y as in gcdnet.

lambda

optional user-supplied lambda sequence; default is NULL, and gcdnet chooses its own sequence.

pred.loss

loss function to use for cross-validation error. Valid options are:

  • "loss" Margin-based loss function. With the least squares loss "ls", it gives the mean squared error (MSE). With the expectile regression loss "er", it gives the asymmetric mean squared error (AMSE).

  • "misclass" only available for classification: it gives misclassification error.

Default is "loss".
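
To make the three error measures concrete, the base R sketch below computes them on small made-up vectors. The helper asym_sq_loss is a hypothetical name, and the weighting |omega - I(r < 0)| * r^2 is one common form of the asymmetric squared error, assumed here rather than taken from the package source.

```r
# Regression example with made-up values
y    <- c(1.0, 2.0, 3.0, 4.0)
yhat <- c(1.5, 1.5, 3.0, 5.0)
r    <- y - yhat                      # residuals

# Mean squared error (least squares loss)
mse <- mean(r^2)

# Asymmetric squared error (assumed form): negative residuals are
# weighted by 1 - omega, non-negative residuals by omega
asym_sq_loss <- function(r, omega) abs(omega - (r < 0)) * r^2
amse <- mean(asym_sq_loss(r, omega = 0.8))

# Classification example: labels in {-1, 1} and made-up fitted scores
labels   <- c(-1, 1, 1, -1)
score    <- c(-0.3, 0.7, -0.2, -1.1)
pred     <- ifelse(score > 0, 1, -1)  # classify by the sign of the score
misclass <- mean(pred != labels)      # misclassification error
```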

nfolds

number of folds - default is 5. Although nfolds can be as large as the sample size (leave-one-out CV), it is not recommended for large datasets. Smallest value allowable is nfolds=3.

foldid

an optional vector of values between 1 and nfolds identifying which fold each observation is in. If supplied, nfolds can be missing.
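
A fold assignment of this kind can be built in base R as sketched below. The sample size n is a made-up value, and the balanced random assignment shown is one reasonable construction, not necessarily what the package does when foldid is omitted.

```r
set.seed(2011)
n      <- 23   # hypothetical number of observations
nfolds <- 5

# Repeat the labels 1..nfolds up to length n, then shuffle so that
# fold sizes differ by at most one observation
foldid <- sample(rep(seq_len(nfolds), length.out = n))

# The vector would then be supplied as:
# cv.gcdnet(x, y, foldid = foldid)
```

Supplying the same foldid to several calls keeps the folds identical across fits, which makes their cross-validated errors directly comparable.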

delta

parameter \(\delta\) of the HHSVM loss, used only when computing the margin-based loss for HHSVM. Only available for pred.loss = "loss".
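
The role of \(\delta\) can be seen in the Huberized hinge loss described in Yang and Zou (see References). The base R sketch below is an illustration only; huber_hinge is a hypothetical helper name, not a function exported by gcdnet.

```r
# Huberized hinge loss as a function of the margin t = y * f(x):
#   0                       if t > 1
#   (1 - t)^2 / (2 * delta) if 1 - delta < t <= 1
#   1 - t - delta / 2       if t <= 1 - delta
huber_hinge <- function(t, delta = 2) {
  ifelse(t > 1, 0,
         ifelse(t > 1 - delta,
                (1 - t)^2 / (2 * delta),
                1 - t - delta / 2))
}

margins <- c(2, 0, -1, -2)
losses  <- huber_hinge(margins, delta = 2)
```

Smaller values of \(\delta\) shrink the quadratic region, so the loss approaches the ordinary hinge loss.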

omega

parameter \(\omega\) only used in expectile regression. Only available for pred.loss = "loss".

...

other arguments that can be passed to gcdnet.

Author

Yi Yang, Yuwen Gu and Hui Zou
Maintainer: Yi Yang <yi.yang6@mcgill.ca>

Details

The function runs gcdnet nfolds+1 times; the first to get the lambda sequence, and then the remainder to compute the fit with each of the folds omitted. The average error and standard deviation over the folds are computed.
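
The procedure can be sketched in base R as follows. This is a simplified illustration under stated assumptions: fit_ridge is a hypothetical stand-in fitter (closed-form ridge regression without an intercept), the data are simulated, and the standard error is taken over per-fold means, which may differ from the package's exact computation.

```r
set.seed(1)
n <- 60; p <- 3
x <- matrix(rnorm(n * p), n, p)            # simulated predictors
y <- drop(x %*% c(2, -1, 0)) + rnorm(n)    # simulated response

lambda <- c(10, 1, 0.1)   # one shared lambda sequence, fixed before the fold loop
nfolds <- 5
foldid <- sample(rep(seq_len(nfolds), length.out = n))

# Stand-in fitter (NOT gcdnet): ridge coefficients for a single lambda
fit_ridge <- function(x, y, lam) {
  solve(crossprod(x) + lam * diag(ncol(x)), crossprod(x, y))
}

# One row per fold, one column per lambda
fold_err <- matrix(NA_real_, nfolds, length(lambda))
for (k in seq_len(nfolds)) {
  held_out <- foldid == k
  for (j in seq_along(lambda)) {
    # Fit with fold k omitted, then score on the held-out fold
    beta <- fit_ridge(x[!held_out, , drop = FALSE], y[!held_out], lambda[j])
    pred <- x[held_out, , drop = FALSE] %*% beta
    fold_err[k, j] <- mean((y[held_out] - pred)^2)
  }
}

cvm  <- colMeans(fold_err)                      # mean error per lambda
cvsd <- apply(fold_err, 2, sd) / sqrt(nfolds)   # standard error per lambda
```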

References

Yang, Y. and Zou, H. (2012). "An Efficient Algorithm for Computing The HHSVM and Its Generalizations." Journal of Computational and Graphical Statistics, 22, 396-415.
BugReport: https://github.com/emeryyi/gcdnet

Gu, Y., and Zou, H. (2016). "High-dimensional generalizations of asymmetric least squares regression and their applications." The Annals of Statistics, 44(6), 2661–2694.

Friedman, J., Hastie, T., and Tibshirani, R. (2010). "Regularization paths for generalized linear models via coordinate descent." Journal of Statistical Software, 33, 1.
https://www.jstatsoft.org/v33/i01/

See Also

gcdnet, plot.cv.gcdnet, predict.cv.gcdnet, and coef.cv.gcdnet methods.

Examples

# fit an elastic net penalized HHSVM with lambda2 = 0.1 for the L2 penalty.
# Use the misclassification rate as the cross validation prediction loss.
# Use five-fold CV to choose the optimal lambda for the L1 penalty.

data(FHT)
set.seed(2011)
cv <- cv.gcdnet(FHT$x, FHT$y, method = "hhsvm",
                lambda2 = 0.1, pred.loss = "misclass",
                nfolds = 5, delta = 1.5)
plot(cv)

# fit an elastic net penalized least squares model
# with lambda2 = 0.1 for the L2 penalty. Use the
# least squares loss as the cross validation
# prediction loss. Use five-fold CV to choose
# the optimal lambda for the L1 penalty.

set.seed(2011)
cv1 <- cv.gcdnet(FHT$x, FHT$y_reg, method = "ls",
                 lambda2 = 0.1, pred.loss = "loss",
                 nfolds = 5)
plot(cv1)

# To fit a LASSO penalized logistic regression,
# set lambda2 = 0 to disable the L2 penalty. Use the
# logistic loss as the cross validation
# prediction loss. Use five-fold CV to choose
# the optimal lambda for the L1 penalty.

set.seed(2011)
cv2 <- cv.gcdnet(FHT$x, FHT$y, method = "logit",
                 lambda2 = 0, pred.loss = "loss",
                 nfolds = 5)
plot(cv2)