Data from a simulation study reported by Shao (1993, Table 1).
Shao (1993, Table 2) reported 4 simulation experiments based on the linear regression model
$$y = 2 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + e,$$
where \(e\) is an independent normal error with unit variance.
Each experiment used a different set of values for the regression coefficients.
The regression coefficients for the four experiments
are shown in the table below.
Experiment | \(\beta_2\) | \(\beta_3\) | \(\beta_4\) | \(\beta_5\) |
1 | 0 | 0 | 4 | 0 |
2 | 0 | 0 | 4 | 8 |
3 | 9 | 0 | 4 | 8 |
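As a rough illustration of the model above, the following Python sketch draws one response vector for a chosen experiment. The names `BETAS` and `simulate_response` are hypothetical, and the sample size of 40 and the uniform predictors are placeholder assumptions; the design matrix actually used by Shao is not reproduced here.

```python
import numpy as np

# Coefficients (beta_2, beta_3, beta_4, beta_5) for the experiments
# listed in the table above.
BETAS = {
    1: (0.0, 0.0, 4.0, 0.0),
    2: (0.0, 0.0, 4.0, 8.0),
    3: (9.0, 0.0, 4.0, 8.0),
}
INTERCEPT = 2.0


def simulate_response(X, experiment, rng):
    """Draw y = 2 + X @ beta + e, with e independent N(0, 1)."""
    beta = np.asarray(BETAS[experiment])
    e = rng.standard_normal(X.shape[0])
    return INTERCEPT + X @ beta + e


# Assumption: placeholder predictors (40 rows, 4 columns for x2..x5).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(40, 4))
y = simulate_response(X, 2, rng)
print(y[:5])
```

Repeating the draw many times and running a selection method on each replicate is how selection probabilities of the kind reported by Shao could be estimated.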
The table below summarizes the probability of correct model selection
in the experiments reported by Shao (1993, Table 2).
Three model selection methods are compared: LOOCV (leave-one-out CV);
CV(d=25), the delete-d method with d=25; and APCV, a computationally
very efficient CV method that is specialized to the
case of linear regression.
Experiment | LOOCV | CV(d=25) | APCV |
1 | 0.484 | 0.934 | 0.501 |
2 | 0.641 | 0.947 | 0.651 |
3 | 0.801 | 0.965 | 0.818 |
CV(d=25) outperforms LOOCV in all cases. It also outperforms APCV
by a large margin in Experiments 1, 2 and 3, but in Experiment 4 APCV
is slightly better.
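To make the difference between LOOCV and the delete-d method concrete, here is a minimal Python sketch of both criteria applied to all-subsets selection. It is an illustration under stated assumptions, not Shao's code: the function names are hypothetical, the delete-d estimate uses randomly drawn validation sets of size d (a Monte Carlo approximation, with `n_splits` chosen arbitrarily), the predictors are simulated standard normals with 40 observations, and APCV is not implemented.

```python
import itertools
import numpy as np


def fit_predict(X_train, y_train, X_test):
    """Least-squares fit with intercept on the training rows; predict test rows."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef


def loocv_error(X, y):
    """Leave-one-out CV: mean squared error of predicting each held-out point."""
    n = len(y)
    errs = []
    for i in range(n):
        keep = np.arange(n) != i
        pred = fit_predict(X[keep], y[keep], X[~keep])
        errs.append((y[i] - pred[0]) ** 2)
    return float(np.mean(errs))


def delete_d_error(X, y, d, n_splits, rng):
    """Delete-d CV, approximated with n_splits random validation sets of size d."""
    n = len(y)
    total = 0.0
    for _ in range(n_splits):
        perm = rng.permutation(n)
        val, train = perm[:d], perm[d:]
        pred = fit_predict(X[train], y[train], X[val])
        total += np.mean((y[val] - pred) ** 2)
    return total / n_splits


def select_subset(X, y, score):
    """Return the column subset (as a tuple of indices) with the smallest CV error."""
    p = X.shape[1]
    best, best_err = None, np.inf
    for k in range(p + 1):  # k = 0 is the intercept-only model
        for cols in itertools.combinations(range(p), k):
            err = score(X[:, list(cols)], y)
            if err < best_err:
                best, best_err = cols, err
    return best


# Assumption: simulated data with the Experiment 2 coefficients (x4 and x5 active).
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 4))
y = 2.0 + X @ np.array([0.0, 0.0, 4.0, 8.0]) + rng.standard_normal(40)

print("LOOCV picks columns:", select_subset(X, y, loocv_error))
print("CV(d=25) picks columns:",
      select_subset(X, y, lambda Xs, ys: delete_d_error(Xs, ys, 25, 200, rng)))
```

Column indices 0 to 3 correspond to \(x_2\) to \(x_5\). With d=25 out of 40 observations, each fit uses only 15 training points, which penalizes unnecessary predictors more heavily than LOOCV does; this is one way to read the higher selection probabilities for CV(d=25) in the table.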