gencve (version 0.3)

gcv: Estimate EPE Using Delete-d Cross-Validation

Description

This is a general-purpose function for estimating the expected prediction error (EPE) of a specified cost function in regression and classification problems. For regression, the default cost function is mean-square error; for classification, it is the misclassification rate. For regression, the package includes direct support for elastic penalty regression, LASSO, PCR, PLSR, nearest neighbour and Random Forest regression. For classification, built-in support functions are provided for LDA, QDA, Naive Bayes, kNN, CART, C5.0, Random Forest and SVM. The vignettes include examples for SCAD, MCP and best subset regression. Illustrative example datasets and data generation models are also provided.
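The delete-d idea described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the names `delete_d_cv`, `fit_predict`, `cost_fn`, `lm_predict` and `mse_cost` are hypothetical, and the standard error shown treats the iterations as independent, which is only an approximation because the random splits overlap.

```r
# Minimal base-R sketch of delete-d cross-validation (NOT gencve's code).
# All function and argument names here are hypothetical.
delete_d_cv <- function(X, y, max_iter = 100, d = ceiling(length(y) / 10),
                        fit_predict, cost_fn) {
  n <- length(y)
  df <- data.frame(X, y = y)
  costs <- numeric(max_iter)
  for (i in seq_len(max_iter)) {
    test_idx <- sample.int(n, d)            # hold out d observations at random
    train <- df[-test_idx, , drop = FALSE]  # fit on the remaining n - d
    test  <- df[test_idx, , drop = FALSE]
    pred  <- fit_predict(train, test)
    costs[i] <- cost_fn(test$y, pred)       # average cost on the hold-out sample
  }
  # Mean cost over iterations, and an approximate standard error of that mean
  c(epe = mean(costs), sd_epe = sd(costs) / sqrt(max_iter))
}

# Illustrative predictor and cost: linear regression with mean-square error
lm_predict <- function(train, test) {
  fit <- lm(y ~ ., data = train)
  unname(predict(fit, newdata = test))
}
mse_cost <- function(y, yhat) mean((y - yhat)^2)

# Simulated data with unit noise variance
set.seed(1)
X <- matrix(rnorm(200 * 3), ncol = 3)
y <- drop(X %*% c(1, -2, 0.5)) + rnorm(200)
res <- delete_d_cv(X, y, max_iter = 50, d = 20,
                   fit_predict = lm_predict, cost_fn = mse_cost)
print(res)
```

Because the simulated noise has variance 1, the estimated EPE under mean-square error should land near 1 for a correctly specified linear model.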

Usage

gcv(X, y, MaxIter = 1000, d = ceiling(length(y)/10), NCores = 1, cost = mse, yhat = yhat_lm, libs = character(0), seed = "default", ...)

Arguments

X
inputs, matrix or dataframe
y
output vector
MaxIter
Number of iterations of the CV procedure
d
Number of observations for the hold-out sample
NCores
Default is 1 which does not use the parallel package. Otherwise, you can set to the number of cores available. If unsure, just experiment!
cost
Function that computes the average cost. See mse, mae and mape for examples.
yhat
In general it must be a function with arguments dfTrain and dfTest. See examples below.
libs
Libraries required by the predictor.
seed
Default is to use R's default which is based on the current time. Otherwise set to an integer value. See Details.
...
Additional arguments that are passed to yhat.
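As a rough illustration of the cost and yhat arguments, the sketch below defines a user-written cost function and predictor. It rests on two assumptions that this page does not confirm: that a cost function takes the observed values followed by the predictions, and that the response is the last column of both dfTrain and dfTest. The names `medae` and `yhat_intercept` are hypothetical and are not part of the package.

```r
# Hypothetical cost function: median absolute error.
# ASSUMPTION: arguments are (observed, predicted), in that order.
medae <- function(y, yh) median(abs(y - yh))

# Hypothetical predictor with the dfTrain/dfTest signature.
# ASSUMPTION: the response is the LAST column of both data frames.
# It ignores the inputs and predicts the training mean for every test row.
yhat_intercept <- function(dfTrain, dfTest) {
  p <- ncol(dfTrain)
  rep(mean(dfTrain[[p]]), nrow(dfTest))
}

# Toy check on hand-made data, independent of the package
dfTrain <- data.frame(x = 1:4, y = c(2, 4, 6, 8))
dfTest  <- data.frame(x = 5:6, y = c(10, 12))
pred <- yhat_intercept(dfTrain, dfTest)
cost_value <- medae(dfTest$y, pred)
print(pred)
print(cost_value)
```

If those conventions hold, such functions could be supplied as `gcv(X, y, cost = medae, yhat = yhat_intercept)`; compare against the built-in mse and yhat_lm before relying on this.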

Value

Four quantities are returned. They are, respectively, the estimated EPE, the standard deviation of this estimate, an out-of-sample estimate of the signal-to-noise ratio (snr), and an out-of-sample estimate of the correlation between the prediction and the true value.

Details

If only serial evaluation were implemented, I would have used set.seed to control the random number generation. I have included seed as an argument because it can also be used to set the seed of the parallel random number generator, which is sometimes useful for replicating simulations. If the seed argument is supplied, it also sets the seed when only serial computation is done.
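The reason a plain set.seed call is not enough in the parallel case can be illustrated with base R's parallel package. The sketch below uses `clusterSetRNGStream`, which gives each worker its own reproducible random number stream. Whether gcv uses this exact mechanism internally is an assumption and is not stated on this page; `draw_once` is a hypothetical helper.

```r
library(parallel)

# Hypothetical helper: start a small cluster, seed its RNG streams,
# and draw one uniform random number per task on the workers.
draw_once <- function(seed) {
  cl <- makeCluster(2)
  on.exit(stopCluster(cl))               # always shut the workers down
  clusterSetRNGStream(cl, iseed = seed)  # reproducible per-worker streams
  parSapply(cl, 1:4, function(i) runif(1))
}

a <- draw_once(123)
b <- draw_once(123)
same_draws <- identical(a, b)  # same seed, same cluster size
print(same_draws)
```

Note that reproducibility of this kind depends on using the same number of workers in each run, since tasks are split across workers in a fixed way.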

References

Hastie, T., Tibshirani, R. and Friedman, J. The Elements of Statistical Learning (ESL). Springer.

See Also

mse, mae, mape, misclassificationrate, logloss, yhat_lm, yhat_nn, yhat_lars, yhat_plus, yhat_gel, yhat_step, yh_lda, yh_qda, yh_svm, yh_NB, yh_RF, yh_CART, yh_C50, yh_kNN, featureSelect, cv.glm

Examples

# Simple example; in general, MaxIter >= 1000 is recommended.
library(gencve)
Xy <- ShaoReg()
gcv(Xy[,1:8], Xy[,9], MaxIter=25, d=5)