VSURF package with some modifications. These modifications allow for unbiased
computation of variable importance via the cforest function in the party
package.

rfThresh(formula, data, nruns = 50, silent = FALSE,
         importance = "permutation", nmin = 1, ...)

y ~ x1 + x2, where y is the response variable and the terms following ~ are
the predictors.

cforest or randomForest

nruns iterations.

nfor.thres random forests are computed using the function randomForest with
the argument importance=TRUE. Then variables are sorted according to their
mean variable importance (VI), in decreasing order. This order is kept
throughout the procedure. Next, a threshold is computed: min.thres, the
minimum predicted value of a pruned CART tree fitted to the curve of the
standard deviations of VI. Finally, the actual "thresholding step" is
performed: only variables with a mean VI larger than nmin * min.thres are
kept.

nfor.interp embedded random forest models are grown, starting with the random
forest built with only the most important variable and ending with all
variables selected in the first step. Then err.min, the minimum mean
out-of-bag (OOB) error of these models, and its associated standard deviation
sd.min are computed. Finally, the smallest model (and hence its corresponding
variables) having a mean OOB error less than err.min + nsd * sd.min is
selected.

mean.jump, the mean jump value, is calculated using variables that have been
left out by the second step, and is set as the mean absolute difference
between mean OOB errors of one model and its first following model. Hence a
variable is included in the model if the mean OOB error decrease is larger
than nmj * mean.jump.

rfInterp, rfPred
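
The selection rules described above can be illustrated with a minimal base R sketch. This is not the code of rfThresh or of VSURF. The helper names (threshold_vars, select_model_size, mean_jump) and their arguments are hypothetical, the VI and OOB error values are assumed to be already computed, and min.thres is passed in by hand rather than obtained from a pruned CART fit. The mean_jump helper follows one reading of the description and assumes at least one model follows the selected one.

```r
# Hypothetical helper: final filter of the thresholding step.
# vi:        numeric matrix, one row per forest run, one named column per variable
# min.thres: threshold, supplied by the caller (the CART fit is not reproduced here)
threshold_vars <- function(vi, min.thres, nmin = 1) {
  vi.mean <- colMeans(vi)
  vi.mean <- vi.mean[order(vi.mean, decreasing = TRUE)]  # decreasing mean VI
  names(vi.mean)[vi.mean > nmin * min.thres]             # keep variables above the cutoff
}

# Hypothetical helper: selection rule of the second step.
# oob: numeric matrix, one row per run; column k holds the OOB errors of the
#      nested model built with the k most important variables
select_model_size <- function(oob, nsd = 1) {
  err.mean <- colMeans(oob)
  err.sd   <- apply(oob, 2, sd)
  k.min    <- which.min(err.mean)
  err.min  <- err.mean[k.min]
  sd.min   <- err.sd[k.min]
  # smallest model whose mean OOB error is below err.min + nsd * sd.min
  unname(which(err.mean < err.min + nsd * sd.min)[1])
}

# Hypothetical helper: mean jump over the models that follow the selected one.
# err.mean: mean OOB error of each nested model
# k.interp: size of the model selected in the second step (assumed < length(err.mean))
mean_jump <- function(err.mean, k.interp) {
  mean(abs(diff(err.mean[k.interp:length(err.mean)])))
}

# Made-up numbers, for illustration only
vi <- matrix(c(0.9, 1.1,  0.1, 0.3,  0.5, 0.7), nrow = 2,
             dimnames = list(NULL, c("a", "b", "c")))
kept <- threshold_vars(vi, min.thres = 0.4)

oob <- cbind(c(0.30, 0.32, 0.31), c(0.20, 0.22, 0.21),
             c(0.18, 0.22, 0.20), c(0.21, 0.23, 0.22))
size <- select_model_size(oob)

jump <- mean_jump(c(0.31, 0.21, 0.20, 0.22, 0.25), k.interp = 2)
```

A variable from the left-out set would then be added only if it lowers the mean OOB error by more than nmj * jump. Checking that requires refitting a forest for each candidate, which this sketch leaves out.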