
MachineShop (version 3.8.0)

t.test: Paired t-Tests for Model Comparisons


Paired t-test comparisons of resampled performance metrics from different models.


# S3 method for PerformanceDiff
t.test(x, adjust = "holm", ...)


PerformanceDiffTest class object that inherits from array. p-values and mean differences are contained in the lower and upper triangular portions, respectively, of the first two dimensions. Model pairs are contained in the third dimension.



x: performance difference result.


adjust: method of p-value adjustment for multiple statistical comparisons as implemented by p.adjust.


...: arguments passed to other methods.
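
To illustrate what the p-value adjustment does, here is a minimal sketch that calls p.adjust from base R directly. The three raw p-values are made up for illustration and do not come from any MachineShop model; they stand in for three pairwise model comparisons.

```r
# Hypothetical raw p-values from three pairwise comparisons (illustrative only)
p_raw <- c(0.010, 0.040, 0.030)

# Holm adjustment, the default named in the usage above
p_holm <- p.adjust(p_raw, method = "holm")

# Bonferroni adjustment for comparison; Holm is never more conservative
p_bonf <- p.adjust(p_raw, method = "bonferroni")

print(rbind(raw = p_raw, holm = p_holm, bonferroni = p_bonf))
```

Any other method name accepted by p.adjust, such as "BH" or "none", could be substituted in the same way.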


The t-test statistic for pairwise model differences of \(R\) resampled performance metric values is calculated as $$ t = \frac{\bar{x}_R}{\sqrt{F s^2_R / R}}, $$ where \(\bar{x}_R\) and \(s^2_R\) are the sample mean and variance of the differences. Statistical testing for a mean difference is then performed by comparing \(t\) to a \(t_{R-1}\) null distribution.

The sample variance in the t statistic is known to underestimate the true variance of cross-validation mean estimators, and this underestimation increases the probability of false-positive statistical conclusions. An additional factor \(F\) is therefore included in the t statistic to allow for variance corrections.

A correction of \(F = 1 + K / (K - 1)\) was found by Nadeau and Bengio (2003) to be a good choice for cross-validation with \(K\) folds and is thus used for that resampling method. The extension of this correction by Bouckaert and Frank (2004) to \(F = 1 + T K / (K - 1)\) is used for cross-validation with \(K\) folds repeated \(T\) times. For other resampling methods \(F = 1\).
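
The corrected statistic can be sketched in base R as follows. This is an illustration of the formula under stated assumptions, not MachineShop's internal code: `corrected_t` is a hypothetical helper, the vector `d` of per-fold metric differences is made up, and the two-sided p-value is an assumed choice of alternative.

```r
# Hypothetical helper (not part of MachineShop): variance-corrected paired t-test
# d:       resampled metric differences between two models (length R)
# folds:   number of cross-validation folds (K)
# repeats: number of cross-validation repeats (T); 1 for plain K-fold CV
corrected_t <- function(d, folds, repeats = 1) {
  n <- length(d)                                 # R in the formula
  stopifnot(n == folds * repeats)
  # F = 1 + T K / (K - 1), which reduces to 1 + K / (K - 1) when T = 1
  f_corr <- 1 + repeats * folds / (folds - 1)
  t_stat <- mean(d) / sqrt(f_corr * var(d) / n)
  # Two-sided p-value against a t distribution with R - 1 degrees of freedom
  p_value <- 2 * pt(-abs(t_stat), df = n - 1)
  list(statistic = t_stat, df = n - 1, p.value = p_value)
}

# Made-up differences from a single run of 10-fold cross-validation
d <- c(0.12, 0.08, 0.15, 0.05, 0.11, 0.09, 0.14, 0.07, 0.10, 0.13)

corrected <- corrected_t(d, folds = 10)
uncorrected <- stats::t.test(d)   # ordinary one-sample t-test, i.e. F = 1

print(c(corrected = corrected$p.value, uncorrected = uncorrected$p.value))
```

Because \(F > 1\) for cross-validation, the corrected statistic is smaller in magnitude than the ordinary one by a factor of \(\sqrt{F}\), so the corrected p-value is larger.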


Nadeau, C., & Bengio, Y. (2003). Inference for the generalization error. Machine Learning, 52, 239–81.

Bouckaert, R. R., & Frank, E. (2004). Evaluating the replicability of significance tests for comparing learning algorithms. In H. Dai, R. Srikant, & C. Zhang (Eds.), Advances in knowledge discovery and data mining (pp. 3–12). Springer.


# \donttest{
## Requires prior installation of suggested package gbm to run
library(MachineShop)

## Numeric response example
fo <- sale_amount ~ .
control <- CVControl()

gbm_res1 <- resample(fo, ICHomes, GBMModel(n.trees = 25), control)
gbm_res2 <- resample(fo, ICHomes, GBMModel(n.trees = 50), control)
gbm_res3 <- resample(fo, ICHomes, GBMModel(n.trees = 100), control)

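## Combine the resample results and compute pairwise model differences
res <- c(GBM1 = gbm_res1, GBM2 = gbm_res2, GBM3 = gbm_res3)
res_diff <- diff(res)

## Paired t-tests with the default Holm-adjusted p-values
t.test(res_diff)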
# }
