CVHTF(X, y, K = 10, REP = 1, family = gaussian, ...)
Arguments
X
training inputs
y
training output
K
number of folds; each validation sample contains approximately n/K observations
REP
number of replications
family
glm family
…
optional arguments passed to glm or lm
Value
Vector with two components: the cross-validation MSE and its standard deviation,
computed from the MSEs obtained on the individual validation samples.
Details
HTF (2009) describe K-fold cross-validation.
The observations are partitioned into K non-overlapping subsets of approximately
equal size. Each subset is used as the validation sample while the remaining
K-1 subsets are used as training data. When \(K=n\),
where n is the number of observations,
the algorithm is equivalent to leave-one-out CV.
Normally \(K=10\) or \(K=5\) is used.
When \(K<n-1\), there may be many possible partitions, so the results
of K-fold CV may vary somewhat depending on the partition used.
In our implementation, random partitions are used and we allow for many
replications. Note that in Shao's delete-d method, random samples are
used to select the validation data, whereas in this method the whole partition
is selected at random. This is accomplished using
fold <- sample(rep(1:K, length.out = n))
The vector fold then indicates, for each observation, the validation sample
to which it belongs; the sketch below shows how this partition drives the
cross-validation loop.
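For illustration, a minimal sketch of one replication of this scheme for a
linear model (kfoldMSE is a hypothetical helper name; this sketches the
algorithm as described above, not the actual internals of CVHTF):

kfoldMSE <- function(X, y, K = 10) {
  n <- length(y)
  dat <- data.frame(y = y, X)
  # random partition into K folds of approximately equal size
  fold <- sample(rep(1:K, length.out = n))
  mse <- numeric(K)
  for (k in 1:K) {
    # fit on the K-1 training folds
    fit <- lm(y ~ ., data = dat[fold != k, ])
    # predict the held-out validation fold
    pred <- predict(fit, newdata = dat[fold == k, ])
    mse[k] <- mean((dat$y[fold == k] - pred)^2)
  }
  # CV estimate and its sd across the K validation samples
  c(mean(mse), sd(mse))
}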
References
Hastie, T., Tibshirani, R. and Friedman, J. (2009).
The Elements of Statistical Learning. 2nd Ed. Springer-Verlag.
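Examples
The following sketch, assuming a package providing CVHTF (such as bestglm)
is attached, illustrates the returned two-component vector on simulated data:

set.seed(321)
n <- 50
X <- matrix(rnorm(2 * n), n, 2)
y <- as.vector(1 + X %*% c(2, -1) + rnorm(n))
# returns c(cross-validation MSE, sd of the fold MSEs)
CVHTF(X, y, K = 10, REP = 5, family = gaussian)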