mlr learner for regression tasks using randomForest::randomForest.

This doc page exists because we added uncertainty estimation functionality (predict.type = "se") for randomForest, which is not provided by the underlying package. The currently implemented methods are:
If se.method = "jackknife", the standard error of a prediction is estimated via the jackknife-after-bootstrap: the mean squared difference between the ensemble prediction and the prediction computed from only those trees whose bootstrap sample did not contain the respective observation (a simplified sketch follows the notes below).
If se.method = "bootstrap", the standard error of a prediction is estimated by bootstrapping the random forest, where the number of bootstrap replicates and the number of trees in each ensemble are controlled by se.boot and se.ntree, respectively, and then taking the standard deviation of the bootstrap predictions. The "brute force" bootstrap is executed when ntree = se.ntree, where se.ntree controls the number of trees in each of the bootstrapped random forests. The less computationally expensive "noisy bootstrap" is executed when se.ntree < ntree. A Monte-Carlo bias correction may make the latter option preferable in many cases. Defaults are se.boot = 50 and se.ntree = 100. A minimal sketch of the brute-force variant is given after this list.
If se.method = "sd", the default, the standard deviation of the predictions across trees is returned as the variance estimate. This can be computed quickly but is also a very naive estimator.
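As a concrete illustration of the brute-force bootstrap, here is a minimal sketch using randomForest directly rather than the actual mlr implementation; the MASS::Boston data set is only an assumed example, and the Monte-Carlo bias correction for the noisy variant is omitted.

    library(randomForest)
    set.seed(1)

    dat <- MASS::Boston          # assumed example data, not part of the learner
    se.boot  <- 50               # number of bootstrap replicates (default above)
    se.ntree <- 100              # trees per bootstrapped forest (default above)

    # Refit a forest of se.ntree trees on each bootstrap sample of the data
    boot.preds <- replicate(se.boot, {
      idx  <- sample(nrow(dat), replace = TRUE)
      rf.b <- randomForest(medv ~ ., data = dat[idx, ], ntree = se.ntree)
      predict(rf.b, newdata = dat)
    })

    # Standard error estimate: sd of the bootstrap predictions per observation
    se.est <- apply(boot.preds, 1, sd)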
For both "jackknife" and "bootstrap", a Monte-Carlo bias correction is applied; if this results in a negative variance estimate, the value is truncated at 0.

Note that when using the "jackknife" procedure for se estimation, using a small number of trees can lead to training data observations that are never out-of-bag. The current implementation ignores these observations, but under the original definition the resulting se estimate would be undefined.
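To make the jackknife-after-bootstrap concrete, the following is a simplified sketch for a single prediction point, again using randomForest directly rather than the mlr implementation; the variance formula and bias correction follow Wager et al. (2014) under the assumption that this matches the implementation, and MASS::Boston is again only an assumed example.

    library(randomForest)
    set.seed(1)

    dat <- MASS::Boston
    n   <- nrow(dat)
    B   <- 2000                  # many trees keep the Monte-Carlo noise small

    # keep.inbag = TRUE records how often each row entered each tree's bootstrap sample
    rf <- randomForest(medv ~ ., data = dat, ntree = B, keep.inbag = TRUE)

    newx       <- dat[1, , drop = FALSE]
    tree.preds <- as.numeric(predict(rf, newdata = newx, predict.all = TRUE)$individual)
    ens        <- mean(tree.preds)

    # Prediction using only those trees whose bootstrap sample excluded observation i
    t.minus.i <- vapply(seq_len(n),
                        function(i) mean(tree.preds[rf$inbag[i, ] == 0]),
                        numeric(1))
    keep <- !is.nan(t.minus.i)   # ignore observations that were never out-of-bag

    var.jack <- (n - 1) / n * sum((t.minus.i[keep] - ens)^2)

    # Monte-Carlo bias correction, truncated at 0 as described above
    var.mc  <- mean((tree.preds - ens)^2)
    se.jack <- sqrt(max(var.jack - (exp(1) - 1) * n / B * var.mc, 0))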
Please note that none of the se.method variants affects the computation of the posterior mean "response" value; this is always the same as from the underlying randomForest.
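In mlr itself, the variants are selected when constructing the learner. A minimal usage sketch, using the Boston housing task bh.task that ships with mlr:

    library(mlr)

    # Request standard errors via the jackknife-after-bootstrap
    lrn <- makeLearner("regr.randomForest", predict.type = "se",
                       par.vals = list(se.method = "jackknife", ntree = 500))

    mod  <- train(lrn, bh.task)
    pred <- predict(mod, bh.task)

    head(getPredictionSE(pred))        # per-observation standard error estimates
    head(getPredictionResponse(pred))  # posterior mean; identical for all se.method variants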
Joseph Sexton and Petter Laake: Standard errors for bagged and random forest estimators. Computational Statistics and Data Analysis, Volume 53, 2009, 801-811.

Also see: Stefan Wager, Trevor Hastie, and Bradley Efron: Confidence Intervals for Random Forests: The Jackknife and the Infinitesimal Jackknife. Journal of Machine Learning Research, Volume 15, 2014, 1625-1651.