Fast OpenMP computing of Breiman's random forest for a variety of data settings including right-censored survival, regression, and classification.
RFSRCModel(
  ntree = 1000,
  mtry = NULL,
  nodesize = NULL,
  nodedepth = NULL,
  splitrule = NULL,
  nsplit = 10,
  block.size = NULL,
  samptype = c("swor", "swr"),
  membership = FALSE,
  sampsize = ifelse(samptype == "swor", function(x) 0.632 * x, function(x) x),
  nimpute = 1,
  ntime = NULL,
  proximity = c(FALSE, TRUE, "inbag", "oob", "all"),
  distance = c(FALSE, TRUE, "inbag", "oob", "all"),
  forest.wt = c(FALSE, TRUE, "inbag", "oob", "all"),
  xvar.wt = NULL,
  split.wt = NULL,
  var.used = c(FALSE, "all.trees", "by.tree"),
  split.depth = c(FALSE, "all.trees", "by.tree"),
  do.trace = FALSE,
  statistics = FALSE
)
RFSRCFastModel(
  ntree = 500,
  sampsize = function(x) min(0.632 * x, max(150, x^0.75)),
  ntime = 50,
  terminal.qualts = FALSE,
  ...
)
ntree: number of trees.
mtry: number of variables randomly selected as candidates for splitting a node.
nodesize: forest average number of unique cases in a terminal node.
nodedepth: maximum depth to which a tree should be grown.
splitrule: splitting rule (see rfsrc).
nsplit: non-negative integer value for number of random splits to consider for each candidate splitting variable.
block.size: interval number of trees at which to compute the cumulative error rate.
samptype: whether bootstrap sampling is with or without replacement.
membership: logical indicating whether to return terminal node membership.
sampsize: function specifying the bootstrap size.
nimpute: number of iterations of the missing data imputation algorithm.
ntime: integer number of time points to constrain ensemble calculations for survival outcomes.
proximity: whether and how to return proximity of cases as measured by the frequency of sharing the same terminal nodes.
distance: whether and how to return distance between cases as measured by the ratio of the sum of edges from each case to the root node.
forest.wt: whether and how to return the forest weight matrix.
xvar.wt: vector of non-negative weights representing the probability of selecting a variable for splitting.
split.wt: vector of non-negative weights used for multiplying the split statistic for a variable.
var.used: whether and how to return variables used for splitting.
split.depth: whether and how to return minimal depth for each variable.
do.trace: number of seconds between updates to the user on approximate time to completion.
statistics: logical indicating whether to return split statistics.
terminal.qualts: logical indicating whether to return terminal node membership information.
...: arguments passed to RFSRCModel.
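The default sampsize functions in the usage above map the number of training cases to a bootstrap size. The following base R sketch evaluates those rules for a few hypothetical sample sizes so the difference between RFSRCModel and RFSRCFastModel is easier to see. The helper names (swor_size, swr_size, fast_size) and the values of n are illustrative assumptions and are not part of the package.

```r
# Bootstrap-size rules copied from the usage defaults above.
# Helper names are hypothetical and exist only for this illustration.
swor_size <- function(x) 0.632 * x                          # RFSRCModel, samptype = "swor"
swr_size  <- function(x) x                                  # RFSRCModel, samptype = "swr"
fast_size <- function(x) min(0.632 * x, max(150, x^0.75))   # RFSRCFastModel

n <- c(100, 1000, 10000)  # hypothetical numbers of training cases

sizes <- data.frame(
  n    = n,
  swor = swor_size(n),
  swr  = swr_size(n),
  # min() and max() are not element-wise, so apply the rule one n at a time
  fast = vapply(n, fast_size, numeric(1))
)
print(sizes)
```

For small data sets the fast rule coincides with 0.632 * x because that value is below the floor of 150; for larger data sets it grows like x^0.75, which is what keeps the fast variant's bootstrap samples small. The values returned are not whole numbers, and how they are rounded downstream is not stated on this page.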
MLModel class object.
Response types: factor, matrix, numeric, Surv
Automatic tuning of grid parameters: mtry, nodesize
Default values for the NULL arguments and further model details can be found in the source link below.
In calls to varimp for RFSRCModel, argument metric may be specified as "permute" (default) for permuting OOB cases, as "random" for permutation replaced with random assignment, or as "anti" for cases assigned to the split opposite of the random assignments. Variable importance is automatically scaled to range from 0 to 100. To obtain unscaled importance values, set scale = FALSE. See example below.
# NOT RUN {
## Requires prior installation of suggested package randomForestSRC to run
model_fit <- fit(sale_amount ~ ., data = ICHomes, model = RFSRCModel)
varimp(model_fit, metric = "random", scale = TRUE)
# }
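As a complement to the example above, the following sketch requests unscaled importance values by setting scale = FALSE, as described in the text. It mirrors the call form used on this page; the choice of ntree = 100 is an arbitrary assumption to shorten run time, and the guard around the code is only there so that it is skipped when the packages are not installed. Depending on the installed MachineShop version, varimp may expect additional arguments, so treat this as a starting point to check against the installed documentation.

```r
# Runs only if both packages are available; otherwise the block is skipped.
if (requireNamespace("MachineShop", quietly = TRUE) &&
    requireNamespace("randomForestSRC", quietly = TRUE)) {
  library(MachineShop)

  # Fewer trees than the default of 1000 (an arbitrary choice for speed)
  model_fit <- fit(sale_amount ~ ., data = ICHomes,
                   model = RFSRCModel(ntree = 100))

  # Permutation-based importance, left on its original scale
  vi_raw <- varimp(model_fit, metric = "permute", scale = FALSE)
  print(vi_raw)
}
```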