
MXM (version 0.8.7)

ridgereg.cv: Cross validation for the ridge regression

Description

Cross validation for ridge regression is performed using the TT estimate of bias (Tibshirani and Tibshirani, 2009). There is also an option for the GCV criterion, which is automatic.

Usage

ridgereg.cv( target, dataset, K = 10, lambda = seq(0, 2, by = 0.1), auto = FALSE, 
seed = FALSE, ncores = 1, mat = NULL )

Arguments

target
A numeric vector containing the values of the target variable. If the values are proportions or percentages, i.e. strictly between 0 and 1, they are mapped onto the real line using the logit transformation log( target / (1 - target) ).
dataset
A numeric matrix containing the variables. Rows are samples and columns are features.
K
The number of folds. Set to 10 by default.
lambda
A vector with the grid of values of $\lambda$ to be used.
auto
A Boolean variable. If TRUE, the GCV criterion provides an automatic answer for the best $\lambda$. Otherwise, K-fold cross validation is performed.
seed
A Boolean variable. If TRUE, the same folds are used every time, so the results are reproducible across runs.
ncores
The number of cores to use. If it is more than 1, parallel computing is performed.
mat
If the user has their own matrix with the folds, it can be supplied here. It must be a matrix with K columns; each column is a fold and contains the positions (row indices) of the data, i.e. numbers, not the data themselves. For example, the first column could be c(1, 10, 4, 25, 30), the second column another set of row indices, and so on. A minimal sketch of constructing such a matrix follows this list.
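
As an illustration only (make_folds is a hypothetical helper, not part of MXM), a fold matrix of the required shape can be built by hand and passed via mat:

# build a K-column matrix of shuffled row indices, assuming n is a multiple of K
make_folds <- function(n, K) {
  idx <- sample(n)          # random permutation of the row indices 1..n
  matrix(idx, ncol = K)     # one fold per column, n/K indices each
}
folds <- make_folds(n = 200, K = 10)
# then: ridgereg.cv(target, dataset, K = 10, mat = folds)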

Value

A list including the following components (a brief sketch of accessing them follows the list):

  • mspe: If auto is FALSE, the values of the mean squared prediction error (MSPE) for each value of $\lambda$.
  • lambda: If auto is FALSE, the value of $\lambda$ which minimizes the MSPE.
  • performance: If auto is FALSE, the minimum bias-corrected MSPE along with the estimate of the bias.
  • runtime: The run time of the algorithm. A numeric vector: the first element is the user time, the second is the system time and the third is the elapsed time.
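
For orientation, a brief sketch of inspecting these components, assuming target and dataset as in the Examples below; the plotting call is only illustrative:

mod <- ridgereg.cv(target, dataset, K = 10, lambda = seq(0, 2, by = 0.1))
mod$lambda         # the value of lambda minimising the MSPE
mod$performance    # bias corrected MSPE and the estimated bias
mod$runtime        # user, system and elapsed times
plot(seq(0, 2, by = 0.1), mod$mspe, type = "b",
     xlab = expression(lambda), ylab = "MSPE")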

Details

This function is essentially a wrapper for the lm.ridge command in the MASS library. If you want a fast choice of $\lambda$, specify auto = TRUE and the $\lambda$ which minimizes the generalised cross-validation (GCV) criterion will be returned. Otherwise, a K-fold cross validation is performed and the estimated performance is bias corrected as suggested by Tibshirani and Tibshirani (2009).
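
For intuition, the bias correction can be sketched as follows; err is assumed to be a K x length(lambda) matrix of per-fold prediction errors, and the function mirrors the idea rather than the package internals:

tt.correct <- function(err, lambda) {
  cv   <- colMeans(err)                  # usual K-fold CV error for each lambda
  best <- which.min(cv)                  # lambda minimising the CV error
  # TT bias: average gap between each fold's error at the overall best lambda
  # and that fold's own minimum error over the grid
  bias <- mean(err[, best] - apply(err, 1, min))
  list(lambda = lambda[best], mspe = cv,
       performance = c(mspe = cv[best] + bias, estimated.bias = bias))
}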

References

Hoerl A.E. and Kennard R.W. (1970). Ridge regression: biased estimation for nonorthogonal problems. Technometrics, 12(1): 55-67.

Brown P.J. (1994). Measurement, Regression and Calibration. Oxford Science Publications.

Tibshirani R.J. and Tibshirani R. (2009). A bias correction for the minimum error rate in cross-validation. The Annals of Applied Statistics, 3(2): 822-829.

See Also

ridge.reg

Examples

# simulate a dataset with continuous data
dataset <- matrix(runif(200 * 50, 1, 100), nrow = 200)
# the target variable is the last column of the dataset, taken as a vector
target <- dataset[, 50]
# remove the target column from the predictor matrix
dataset <- dataset[, -50]
a1 <- ridgereg.cv(target, dataset, auto = TRUE)
a2 <- ridgereg.cv(target, dataset, K = 5, lambda = seq(0, 1, by = 0.2))
