
sdwd (version 1.0.5)

cv.sdwd: cross-validation for the sparse DWD

Description

Conducts k-fold cross-validation for sdwd and returns suggested values of the L1 tuning parameter lambda.

Usage

cv.sdwd(x, y, lambda = NULL, pred.loss = c("misclass", "loss"), nfolds = 5, foldid, ...)

Arguments

x

A matrix of predictors, i.e., the x matrix used in sdwd.

y

A vector of binary class labels, i.e., the y used in sdwd.

lambda

Default is NULL, in which case the lambda sequence generated by sdwd is used. Users can also supply their own lambda sequence for cross-validation.

pred.loss

The error measure used for cross-validation: "misclass" for the misclassification error, or "loss" for the DWD loss.

nfolds

The number of folds. The default is 5, and the allowable range is from 3 to the sample size. A larger nfolds increases computation time.

foldid

An optional vector with values between 1 and nfolds, giving the fold index of each observation. If foldid is supplied, nfolds can be omitted.

...

Other arguments passed to sdwd.
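The foldid argument expects one fold index per observation. The base-R sketch below builds such a vector with a balanced random assignment, in the way glmnet does; make_foldid is a hypothetical helper that is not part of sdwd, and whether cv.sdwd assigns its own folds exactly this way when foldid is missing is an assumption.

```r
# Hypothetical helper (not part of sdwd): balanced random fold assignment,
# glmnet-style, giving each observation a fold index in 1..nfolds.
make_foldid <- function(n, nfolds = 5) {
  # mirror the documented range: 3 <= nfolds <= sample size
  if (nfolds < 3 || nfolds > n)
    stop("nfolds must be between 3 and the sample size")
  # repeat 1..nfolds to length n, then shuffle
  sample(rep(seq_len(nfolds), length.out = n))
}

set.seed(1)
foldid <- make_foldid(n = 42, nfolds = 5)
table(foldid)  # fold sizes differ by at most one
```

Passing the same foldid to several cv.sdwd calls, for example one with pred.loss = "misclass" and one with pred.loss = "loss", evaluates them on identical folds.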

Value

An object of class cv.sdwd is returned. It is a list with the following components.

lambda

The lambda sequence used in sdwd.

cvm

A vector of length length(lambda) for the mean cross-validated error.

cvsd

A vector of length length(lambda) for estimates of standard error of cvm.

cvupper

The upper curve: cvm + cvsd.

cvlower

The lower curve: cvm - cvsd.

nzero

Number of non-zero coefficients at each lambda.

name

The name of the error measure, e.g. "Mis-classification error", used for plotting.

sdwd.fit

A fitted sdwd object using the full data.

lambda.min

The value of lambda that gives the minimum cross-validation error cvm.

lambda.1se

The largest value of lambda such that error is within one standard error of the minimum.

cv.min

The minimum cross-validation error.

cv.1se

The cross-validation error associated with lambda.1se.
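The sketch below shows how lambda.min, lambda.1se, cv.min and cv.1se can be derived from lambda, cvm and cvsd, following the one-standard-error rule used in glmnet's cross-validation code. select_lambda is a hypothetical helper, not the package's own function, and the input values in the usage lines are made up for illustration.

```r
# Hypothetical helper: glmnet-style selection of lambda.min and lambda.1se.
select_lambda <- function(lambda, cvm, cvsd) {
  cv.min <- min(cvm)
  # if several lambdas tie at the minimum, take the largest one
  lambda.min <- max(lambda[cvm <= cv.min])
  idmin <- match(lambda.min, lambda)
  # errors within one standard error of the minimum are acceptable
  threshold <- cvm[idmin] + cvsd[idmin]
  # the largest (most regularized) lambda meeting the threshold
  lambda.1se <- max(lambda[cvm <= threshold])
  list(lambda.min = lambda.min, lambda.1se = lambda.1se,
       cv.min = cv.min, cv.1se = cvm[match(lambda.1se, lambda)])
}

# Illustrative inputs only
lambda <- c(0.5, 0.4, 0.3, 0.2, 0.1)
cvm    <- c(0.40, 0.25, 0.20, 0.18, 0.22)
cvsd   <- c(0.05, 0.04, 0.03, 0.03, 0.04)
sel <- select_lambda(lambda, cvm, cvsd)
sel  # lambda.min = 0.2, lambda.1se = 0.3
```

Here the minimum error 0.18 occurs at lambda = 0.2, and lambda = 0.3 is the largest value whose error (0.20) stays below 0.18 + 0.03, so it is chosen as lambda.1se.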

Details

This function fits the sparse DWD with sdwd once per fold, each time leaving that fold out of the training data and evaluating the fit on it, and then computes the mean cross-validation error and its standard error. It is adapted from the cross-validation functions in the gcdnet and glmnet packages.
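A minimal base-R sketch of the aggregation step, assuming the held-out decision values for every lambda are already available as a matrix. It computes per-observation errors for both pred.loss choices, using the DWD loss V(u) = 1 - u for u <= 1/2 and V(u) = 1/(4u) otherwise from Wang and Zou (2016), and then forms cvm, cvsd, cvupper and cvlower in the glmnet style. The function names are hypothetical, labels are assumed to be coded as -1 and 1, and this is not the package's own code.

```r
# Hypothetical helpers; labels y are assumed to be coded as -1 / 1.

# DWD loss (Wang and Zou, 2016): V(u) = 1 - u if u <= 1/2, else 1 / (4 u)
dwd_loss <- function(u) ifelse(u <= 0.5, 1 - u, 1 / (4 * u))

# Per-observation held-out error for one lambda.
# y: true labels; f: held-out decision values f(x) for the same observations
cv_error <- function(y, f, pred.loss = c("misclass", "loss")) {
  pred.loss <- match.arg(pred.loss)
  if (pred.loss == "misclass") {
    # classify by the sign of f (f > 0 -> 1, otherwise -1)
    as.numeric(y != ifelse(f > 0, 1, -1))
  } else {
    # DWD loss evaluated at the margin y * f
    dwd_loss(y * f)
  }
}

# Aggregate an n x nlambda matrix of per-observation errors
cv_summary <- function(cvraw) {
  n <- nrow(cvraw)
  cvm <- colMeans(cvraw)
  # standard error of the mean error at each lambda
  cvsd <- sqrt(colMeans(sweep(cvraw, 2, cvm)^2) / (n - 1))
  list(cvm = cvm, cvsd = cvsd, cvupper = cvm + cvsd, cvlower = cvm - cvsd)
}

# Toy usage: 4 observations, 2 lambda values (made-up decision values)
y <- c(1, -1, 1, -1)
fhat <- cbind(c(0.8, -0.3, -0.2, 0.4), c(1.5, -1.2, 0.9, -0.7))
cvraw <- apply(fhat, 2, function(f) cv_error(y, f, "misclass"))
s <- cv_summary(cvraw)
s
```

Swapping "misclass" for "loss" in the apply call gives the DWD-loss version of the same summary.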

References

Wang, B. and Zou, H. (2016) "Sparse Distance Weighted Discrimination", Journal of Computational and Graphical Statistics, 25(3), 826-838. https://www.tandfonline.com/doi/full/10.1080/10618600.2015.1049700

Yang, Y. and Zou, H. (2013) "An Efficient Algorithm for Computing the HHSVM and Its Generalizations", Journal of Computational and Graphical Statistics, 22(2), 396-415. https://www.tandfonline.com/doi/full/10.1080/10618600.2012.680324

Friedman, J., Hastie, T., and Tibshirani, R. (2010) "Regularization Paths for Generalized Linear Models via Coordinate Descent", Journal of Statistical Software, 33(1), 1-22. https://www.jstatsoft.org/v33/i01/paper

See Also

sdwd, plot.cv.sdwd, predict.cv.sdwd, and coef.cv.sdwd methods.

Examples

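library(sdwd)
data(colon)
# this example only uses the first 100 columns of x
colon$x <- colon$x[, 1:100]
n <- nrow(colon$x)
set.seed(1)
# hold out about one third of the observations as a test set
id <- sample(n, trunc(n / 3))
# 5-fold cross-validation on the training set; lambda2 is passed on to sdwd
cvfit <- cv.sdwd(colon$x[-id, ], colon$y[-id], lambda2 = 1, nfolds = 5)
plot(cvfit)
# predictions for the held-out observations at lambda.min
predict(cvfit, newx = colon$x[id, ], s = "lambda.min")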
