RFSRCModel: Fast Random Forest (SRC) Model

Description

Fast OpenMP parallel computing of Breiman's random forests for a variety of data settings, including right-censored survival, regression, and classification.

Usage

RFSRCModel(
  ntree = 1000,
  mtry = integer(),
  nodesize = integer(),
  nodedepth = integer(),
  splitrule = character(),
  nsplit = 10,
  block.size = integer(),
  samptype = c("swor", "swr"),
  membership = FALSE,
  sampsize = if (samptype == "swor") function(x) 0.632 * x else function(x) x,
  nimpute = 1,
  ntime = integer(),
  proximity = c(FALSE, TRUE, "inbag", "oob", "all"),
  distance = c(FALSE, TRUE, "inbag", "oob", "all"),
  forest.wt = c(FALSE, TRUE, "inbag", "oob", "all"),
  xvar.wt = numeric(),
  split.wt = numeric(),
  var.used = c(FALSE, "all.trees", "by.tree"),
  split.depth = c(FALSE, "all.trees", "by.tree"),
  do.trace = FALSE,
  statistics = FALSE
)
RFSRCFastModel(
  ntree = 500,
  sampsize = function(x) min(0.632 * x, max(x^0.75, 150)),
  ntime = 50,
  terminal.qualts = FALSE,
  ...
)

Value

MLModel class object.

Arguments

ntree: number of trees.
mtry: number of variables randomly selected as candidates for splitting a node.
nodesize: minimum size of terminal nodes.
nodedepth: maximum depth to which a tree should be grown.
splitrule: splitting rule (see rfsrc).
nsplit: non-negative integer value for number of random splits to consider for each candidate splitting variable.
block.size: interval number of trees at which to compute the cumulative error rate.
samptype: whether bootstrap sampling is with or without replacement.
membership: logical indicating whether to return terminal node membership.
sampsize: function specifying the bootstrap size.
nimpute: number of iterations of the missing data imputation algorithm.
ntime: integer number of time points to constrain ensemble calculations for survival outcomes.
proximity: whether and how to return proximity of cases as measured by the frequency of sharing the same terminal nodes.
distance: whether and how to return distance between cases as measured by the ratio of the sum of edges from each case to the root node.
forest.wt: whether and how to return the forest weight matrix.
xvar.wt: vector of non-negative weights representing the probability of selecting a variable for splitting.
split.wt: vector of non-negative weights used for multiplying the split statistic for a variable.
var.used: whether and how to return variables used for splitting.
split.depth: whether and how to return minimal depth for each variable.
do.trace: number of seconds between updates to the user on approximate time to completion.
statistics: logical indicating whether to return split statistics.
terminal.qualts: logical indicating whether to return terminal node membership information.
...: arguments passed to RFSRCModel.
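
The default sampsize functions shown under Usage can be compared directly in base R. The sketch below copies those three functions and evaluates them for a few case counts. The names swor_size, swr_size, fast_size and the values in n are hypothetical and chosen only for illustration; nothing here calls the package itself.

```r
# Default bootstrap-size functions, copied from the Usage section
swor_size <- function(x) 0.632 * x   # RFSRCModel with samptype = "swor"
swr_size  <- function(x) x           # RFSRCModel with samptype = "swr"
fast_size <- function(x) min(0.632 * x, max(x^0.75, 150))  # RFSRCFastModel

# Hypothetical numbers of training cases
n <- c(100, 1000, 100000)

# min() and max() are not vectorized elementwise, so apply fast_size per value
sizes <- data.frame(
  n    = n,
  swor = swor_size(n),
  swr  = swr_size(n),
  fast = vapply(n, fast_size, numeric(1))
)
print(sizes)
```

For small data the RFSRCFastModel default matches the 0.632 rule; as the number of cases grows, the x^0.75 term takes over and the bootstrap size grows more slowly than the number of cases.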

Details

Response types: factor, matrix, numeric, Surv
Automatic tuning of grid parameters: mtry, nodesize

Default argument values and further model details can be found in the source See Also links below.

In calls to varimp for RFSRCModel, argument type may be specified as "anti" (default) for cases assigned to the split opposite of the random assignments, as "permute" for permutation of OOB cases, or as "random" for permutation replaced with random assignment. Variable importance is automatically scaled to range from 0 to 100. To obtain unscaled importance values, set scale = FALSE. See example below.

Examples

# \donttest{
## Requires prior installation of suggested package randomForestSRC to run

model_fit <- fit(sale_amount ~ ., data = ICHomes, model = RFSRCModel)
varimp(model_fit, method = "model", type = "random", scale = TRUE)
# }
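
As a variation on the example above, the following sketch requests unscaled permutation importance, as described in Details. It assumes the MachineShop and randomForestSRC packages are installed and does nothing otherwise; ntree = 100 is an arbitrary value chosen to shorten the run, not a recommendation.

```r
## A sketch only: runs when both packages are available
if (requireNamespace("MachineShop", quietly = TRUE) &&
    requireNamespace("randomForestSRC", quietly = TRUE)) {
  library(MachineShop)

  ## ntree = 100 is an arbitrary illustrative choice
  model_fit <- fit(sale_amount ~ ., data = ICHomes,
                   model = RFSRCModel(ntree = 100))

  ## type = "permute" permutes OOB cases; scale = FALSE keeps raw values
  vi <- varimp(model_fit, method = "model", type = "permute", scale = FALSE)
  print(vi)
}
```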
