reams (version 0.1)

cvic: Cross-validation information criterion

Description

A model selection criterion proposed by Reiss et al. (2012), which employs cross-validation to estimate the overoptimism associated with the best candidate model of each size.

Usage

cvic(y, X, nfold = length(y), pvec = 1:(ncol(X) + 1))

Arguments

y
outcome vector
X
model matrix. This should not include an intercept column; such a column is added by the function.
nfold
number of "folds" (validation sets). The sample size must be divisible by this number.
pvec
vector of possible dimensions of the model to consider: by default, ranges from 1 (intercept only) to ncol(X) + 1 (full model).
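
For example (an illustrative note and sketch, not part of the original documentation; the data below are simulated purely to show an admissible choice of nfold):

set.seed(1)
y <- rnorm(100)                      # 100 observations
X <- matrix(rnorm(100 * 5), 100, 5)  # five candidate predictors
cvic(y, X, nfold = 10)               # admissible: 100 is divisible by 10

With the 47-row swiss data used in the Examples below, 47 is prime, so the leave-one-out default nfold = length(y) is the natural choice.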

Value

A list with components
nlogsig2hat
value of the first (non-penalty) term of the criterion, i.e., sample size times log of MLE of the variance, for best model of each dimension in pvec.
cv.pen
cross-validation penalty, as described by Reiss et al. (2012).
edf, edf.mon
effective degrees of freedom, before and after constrained monotone smoothing.
cvic
CVIC based on the raw edf.
cvic.mon
CVIC based on edf to which constrained monotone smoothing has been applied.
best, best.mon
vectors of logicals indicating which columns of the model matrix are included in the CVIC-minimizing model, without and with constrained monotone smoothing.
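
A brief usage sketch (an illustration, not from the original documentation; it assumes the criterion components are returned as vectors indexed by pvec, as the per-dimension descriptions above suggest, and cvicobj denotes the result of a cvic() call such as the one in the Examples below):

which.min(cvicobj$cvic)      # index in pvec of the dimension minimizing the raw-edf CVIC
which.min(cvicobj$cvic.mon)  # same, with the monotone-smoothed edf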

Details

CVIC is similar to corrected AIC (Sugiura, 1978; Hurvich and Tsai, 1989), but instead of the nominal model dimension, it substitutes a measure of effective degrees of freedom (edf) that takes best-subset selection into account. The "raw" edf is obtained by cross-validation. Alternatively, one can refine the edf via constrained monotone smoothing, as described by Reiss et al. (2012).
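
To make the connection to corrected AIC concrete, here is a minimal sketch (an illustration only, not the package's internal code; the function name aicc_edf is hypothetical). It uses one common form of the Hurvich-Tsai criterion, n * log(sigmahat^2) + n * (n + p) / (n - p - 2), with an effective-degrees-of-freedom value substituted for the nominal dimension p:

aicc_edf <- function(n, sig2hat, edf) {
  # n times the log of the MLE of the error variance, plus an AICc-style
  # penalty in which edf replaces the nominal model dimension
  n * log(sig2hat) + n * (n + edf) / (n - edf - 2)
}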

References

Hurvich, C. M., and Tsai, C.-L. (1989). Regression and time series model selection in small samples. Biometrika, 76, 297--307.

Reiss, P. T., Huang, L., Cavanaugh, J. E., and Roy, A. K. (2012). Resampling-based information criteria for adaptive linear model selection. Annals of the Institute of Statistical Mathematics, to appear. Available at http://works.bepress.com/phil_reiss/17

Sugiura, N. (1978). Further analysis of the data by Akaike's information criterion and the finite corrections. Communications in Statistics: Theory & Methods, 7, 13--26.

See Also

leaps in package leaps for best-subset selection; pcls in package mgcv for the constrained monotone smoothing.

Examples

# Predicting fertility from provincial socioeconomic indicators
data(swiss)
cvicobj <- cvic(swiss$Fertility, swiss[ , -1])
cvicobj$best       # columns selected by the raw-edf criterion
cvicobj$best.mon   # columns selected after constrained monotone smoothing
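
# A possible follow-up (an illustrative sketch, not part of the original
# example): names of the socioeconomic indicators flagged by the raw-edf
# criterion, assuming 'best' aligns with the columns of the X supplied above
names(swiss)[-1][which(cvicobj$best)]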
