
gencve (version 0.3)

gencve-package: General Cross Validation Engine

Description

Engines for cross-validation of many types of regression and class prediction models are provided. These engines include built-in support for 'glmnet', 'lars', 'plus', 'MASS', 'rpart', 'C50' and 'randomForest'. It is easy for the user to add other regression or classification algorithms. The 'parallel' package is used to improve speed. Several data generation algorithms for problems in regression and classification are provided.


Details


Engines for cross-validation of many types of regression and class prediction models are provided. These engines include built-in support for the CRAN packages glmnet, lars, plus, MASS, rpart, C50 and randomForest. The cross-validation engines are the functions gcv() and cgcv(). It is easy for the user to add other regression or classification algorithms for use with these engines. The default cost function for regression is squared error, but mean absolute error and mean absolute percentage error are also supported. For classification the default cost function is 0/1 loss, with the associated misclassification rate, but log-loss is also provided. The user may also specify their own cost function. Both gcv() and cgcv() make use of R's parallel package. Several illustrative datasets are included, as well as data generation algorithms for problems in regression and classification.
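A user-supplied cost function is just an ordinary R function of the observed and predicted values. As a minimal sketch (the argument name under which gcv() or cgcv() accepts such a function is not shown here; consult their help pages), the built-in alternatives could be written as:

# Plain R cost functions: each takes observed y and predictions yh
# and returns a single number.
mae_cost  <- function(y, yh) mean(abs(y - yh))        # mean absolute error
mape_cost <- function(y, yh) mean(abs((y - yh) / y))  # mean absolute percentage error

# Log-loss for classification: y coded 0/1, p = predicted probabilities.
logloss_cost <- function(y, p) {
  eps <- 1e-15                      # clamp to avoid log(0)
  p <- pmin(pmax(p, eps), 1 - eps)
  -mean(y * log(p) + (1 - y) * log(1 - p))
}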

The delete-d cross-validation method of Shao (1993) is used. Shao recommends at least 1000 iterations, so this method requires significantly more computation than the k-fold cross-validation recommended by Hastie, Tibshirani and Friedman (2009), in conjunction with regularization using the one-standard-deviation rule, for the purpose of selecting a tuning parameter in penalized regression. However, many researchers have noticed that even regularized k-fold cross-validation is quite variable (Kim, 2009). A future version of this package will include k-fold cross-validation and iterated k-fold cross-validation. Usually iterated k-fold cross-validation produces very similar results to the delete-d method (Kim, 2009).
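For intuition, the delete-d method itself can be sketched in a few lines of base R. This is an illustrative re-implementation for a linear model, not the package's own gcv() code; it assumes a data frame dat with the response in a column named y:

# Delete-d cross-validation (Shao, 1993): on each of B iterations,
# hold out d randomly chosen rows, fit on the rest, and average the
# squared-error loss on the held-out rows.
delete_d_cv <- function(dat, d = 5, B = 1000) {
  n <- nrow(dat)
  losses <- numeric(B)
  for (b in seq_len(B)) {
    out <- sample(n, d)                       # rows held out this iteration
    fit <- lm(y ~ ., data = dat[-out, ])      # fit on the remaining n - d rows
    pred <- predict(fit, newdata = dat[out, ])
    losses[b] <- mean((dat$y[out] - pred)^2)  # squared-error cost
  }
  mean(losses)                                # delete-d CV estimate
}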

Other CRAN packages that provide general frameworks with resampling strategies include boot, mlr and caret.

References

Trevor Hastie, Robert Tibshirani, Jerome H. Friedman (2009), The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Ed. Springer.

Jun Shao (1993), Linear Model Selection by Cross-Validation. Journal of the American Statistical Association, Vol. 88, No. 422, 486-494.

J. H. Kim (2009), Estimating Classification Error Rate: Repeated Cross-Validation, Repeated Hold-Out and Bootstrap. Computational Statistics and Data Analysis, 53, 3735-3745.

See Also

cv.glm

Examples

library(gencve)  # provides ShaoReg, gcv, rmix, cgcv, yh_svm, yh_CART and the Singh data
#
# Regression with a simulated model
Xy <- ShaoReg()
gcv(Xy[, 1:8], Xy[, 9], MaxIter = 25, d = 5)
#
# SVM with simulated mixture data
Xy <- rmix(100)
cgcv(X = Xy[, 1:2], y = Xy[, 3], yh = yh_svm, MaxIter = 25)
#
# The data have already been divided into training and test sets,
# so a single hold-out evaluation suffices
yh_CART(SinghTrain, SinghTest)
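New algorithms are plugged in by writing a wrapper in the style of yh_svm() or yh_CART(). Judging from the calls above, such a wrapper takes a training set and a test set; the sketch below assumes the class label sits in the last column and that the wrapper returns predicted labels for the test rows. Both assumptions should be checked against the definition of yh_svm() before use.

# Hypothetical wrapper for linear discriminant analysis, following the
# assumed yh_*(train, test) contract described above.
yh_lda <- function(dfTrain, dfTest) {
  p <- ncol(dfTrain)                    # response assumed in last column
  fit <- MASS::lda(dfTrain[, -p], grouping = dfTrain[, p])
  predict(fit, dfTest[, -p])$class      # predicted labels for the test rows
}
# It could then be supplied to the classification engine:
# cgcv(X = Xy[, 1:2], y = Xy[, 3], yh = yh_lda, MaxIter = 25)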
