
bootstrap (version 2019.6)

crossval: K-fold Cross-Validation

Description

See Efron and Tibshirani (1993) for details on this function.

Usage

crossval(x, y, theta.fit, theta.predict, ..., ngroup=n)

Arguments

x

a matrix containing the predictor (regressor) values. Each row corresponds to an observation.

y

a vector containing the response values

theta.fit

function to be cross-validated. Takes x and y as arguments. See example below.

theta.predict

function producing predicted values for theta.fit. Arguments are the fit object produced by theta.fit and a matrix x of predictors, in that order (as in the example below).

...

any additional arguments to be passed to theta.fit

ngroup

optional argument specifying the number of groups formed. Default is ngroup = sample size, corresponding to leave-one-out cross-validation.
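As a minimal sketch of the theta.fit / theta.predict contract and of how an extra argument could travel through `...`, the pair below fits a ridge-style regression in base R. The names ridge.fit, ridge.predict and the argument lambda are hypothetical and chosen only for illustration; they are not part of the bootstrap package. The functions are called directly on a hand-made split here, which is roughly what happens for each group during cross-validation.

```r
# Hypothetical helpers, not part of the bootstrap package.
# lambda is an illustrative extra argument that would be supplied via `...`.
ridge.fit <- function(x, y, lambda = 0) {
  X <- cbind(1, as.matrix(x))                        # add intercept column
  # penalty matrix: the intercept (first coefficient) is left unpenalized
  penalty <- diag(c(0, rep(lambda, ncol(X) - 1)), nrow = ncol(X))
  drop(solve(crossprod(X) + penalty, crossprod(X, y)))
}

ridge.predict <- function(fit, x) {
  # x may arrive as a plain vector, so restore the matrix shape first
  X <- cbind(1, matrix(x, ncol = length(fit) - 1))
  drop(X %*% fit)
}

set.seed(1)                                          # arbitrary seed
x <- matrix(rnorm(40 * 2), ncol = 2)
y <- x[, 1] - x[, 2] + 0.1 * rnorm(40)

held.out <- 1:8                                      # one hand-picked group
fit  <- ridge.fit(x[-held.out, ], y[-held.out], lambda = 0.1)
pred <- ridge.predict(fit, x[held.out, ])

# With the bootstrap package loaded, the same pair would be passed as
# crossval(x, y, ridge.fit, ridge.predict, lambda = 0.1, ngroup = 5)
```

Because ngroup comes after `...` in the usage line, it has to be given by name; any other named argument, such as lambda above, is forwarded to theta.fit only.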

Value

list with the following components

cv.fit

The cross-validated fit for each observation. The numbers 1 to n (the sample size) are partitioned into ngroup mutually disjoint groups of size leave.out, where leave.out is the integer part of n/ngroup. The groups are chosen at random if ngroup < n. (If n/ngroup is not an integer, the last group contains more than leave.out observations.) Then theta.fit is applied with the kth group of observations deleted, for k = 1, 2, ..., ngroup. Finally, the fitted values for the kth group are computed using theta.predict.
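The partition described above can be sketched in base R as follows. This is an illustration written from the description, not the package source; make.groups is a hypothetical helper name.

```r
# Hypothetical helper: split 1..n into ngroup disjoint groups as described.
make.groups <- function(n, ngroup) {
  leave.out <- trunc(n / ngroup)          # integer part of n/ngroup
  o <- sample(n)                          # random permutation of 1..n
  groups <- vector("list", ngroup)
  for (k in seq_len(ngroup)) {
    first <- (k - 1) * leave.out + 1
    # the last group absorbs whatever remains when n/ngroup is not an integer
    last <- if (k < ngroup) k * leave.out else n
    groups[[k]] <- o[first:last]
  }
  groups
}

set.seed(1)                               # arbitrary seed
g <- make.groups(85, 6)
sizes <- lengths(g)
sizes                                     # 14 14 14 14 14 15
```

With n = 85 and ngroup = 6, leave.out is 14, so five groups hold 14 observations and the last holds the remaining 15.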

ngroup

The number of groups

leave.out

The number of observations in each group

groups

A list of length ngroup containing the indices of the observations in each group. Only returned if leave.out > 1.

call

The deparsed call

References

Stone, M. (1974). Cross-validation choice and assessment of statistical predictions. Journal of the Royal Statistical Society, B-36, 111-147.

Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman and Hall, New York, London.

Examples

# cross-validation of least squares regression
# note that crossval is not very efficient: being a
# general-purpose function, it does not use the
# Sherman-Morrison identity for this special case
   library(bootstrap)                  # provides crossval
   x <- rnorm(85)
   y <- 2*x + 0.5*rnorm(85)
   # fit least squares on the retained observations
   theta.fit <- function(x, y) { lsfit(x, y) }
   # predict for the held-out observations (intercept column added by hand)
   theta.predict <- function(fit, x) {
       cbind(1, x) %*% fit$coef
   }
   results <- crossval(x, y, theta.fit, theta.predict, ngroup = 6)
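The remark about the Sherman-Morrison identity refers to a shortcut available for least squares: leave-one-out predictions can be obtained from a single fit by rescaling each residual by one minus its leverage, with no refitting. The base-R sketch below compares that shortcut with an explicit refit-per-observation loop; the variable names are illustrative, and the loop is a hand-written stand-in for the leave-one-out case rather than a call to crossval.

```r
set.seed(1)                                  # arbitrary seed
n <- 85
x <- rnorm(n)
y <- 2 * x + 0.5 * rnorm(n)

# Explicit leave-one-out: refit with observation i deleted, predict at x[i].
loo.explicit <- numeric(n)
for (i in seq_len(n)) {
  fit <- lsfit(x[-i], y[-i])
  loo.explicit[i] <- sum(c(1, x[i]) * fit$coefficients)
}

# Shortcut: one fit on all data, residuals divided by (1 - leverage).
X <- cbind(1, x)
full <- lsfit(x, y)
h <- rowSums((X %*% solve(crossprod(X))) * X)   # hat-matrix diagonal
loo.shortcut <- y - full$residuals / (1 - h)

# The two sets of predictions agree up to rounding error.
max.diff <- max(abs(loo.explicit - loo.shortcut))

# Leave-one-out estimate of mean squared prediction error.
cv.mse <- mean((y - loo.shortcut)^2)
```

The shortcut needs one fit instead of n, which is the efficiency a general-purpose routine gives up by treating theta.fit as a black box.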
