bestglm (version 0.37.3)

CVHTF: K-fold Cross-Validation

Description

K-fold cross-validation.

Usage

CVHTF(X, y, K = 10, REP = 1, family = gaussian, ...)

Arguments

X

training inputs

y

training output

K

number of folds; the observations are partitioned into K validation subsets

REP

number of replications

family

glm family

...

optional arguments passed to glm or lm

Value

Vector of two components: the cross-validation MSE and its standard deviation, computed from the MSE in each validation sample.

Details

HTF (2009) describe K-fold cross-validation. The observations are partitioned into K non-overlapping subsets of approximately equal size. Each subset in turn is used as the validation sample while the remaining K-1 subsets are used as training data. When \(K=n\), where n is the number of observations, the algorithm is equivalent to leave-one-out CV. Normally \(K=10\) or \(K=5\) is used. When \(K<n-1\), there may be many possible partitions, so the results of K-fold CV may vary somewhat depending on the partition used. In our implementation, random partitions are used and many replications are allowed. Note that in Shao's delete-d method, random samples are used to select the validation data, whereas in this method the whole partition is selected at random. This is accomplished using fold <- sample(rep(1:K, length = n)); fold then indicates the validation sample to which each observation belongs.
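As an illustration of this partitioning step, the following minimal sketch on simulated data shows how the random fold labels can be generated and used to compute a cross-validation MSE and its sd with lm; it is not the exact internal code of CVHTF.

# Minimal sketch of K-fold partitioning (illustrative, not CVHTF's internal code)
set.seed(1)
n <- 50
K <- 10
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(n)
# assign each observation to one of K folds at random
fold <- sample(rep(1:K, length.out = n))
mse <- numeric(K)
for (k in 1:K) {
  test <- fold == k
  fit <- lm(y ~ ., data = dat[!test, ])          # fit on the K-1 training folds
  pred <- predict(fit, newdata = dat[test, ])    # predict on the validation fold
  mse[k] <- mean((dat$y[test] - pred)^2)
}
c(CV = mean(mse), sdCV = sd(mse))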

References

Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning. 2nd Ed. Springer-Verlag.

See Also

bestglm, CVd, CVDH, LOOCV

Examples

# Example 1. 10-fold CV
library(bestglm)
data(zprostate)
# keep the training rows (column 10 is the train indicator) and drop that column
train <- (zprostate[zprostate[, 10], ])[, -10]
X <- train[, 1:2]
y <- train[, 9]
CVHTF(X, y, K = 10, REP = 1)[1]
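The second component of the returned vector is the standard deviation of the fold MSEs, and REP can be increased to repeat the procedure over several random partitions. A small extension of Example 1:

# Example 2. 10-fold CV repeated over 10 random partitions;
# returns the CV MSE and its sd
CVHTF(X, y, K = 10, REP = 10)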
