RFSRCModel: Fast Random Forest (SRC) Model

Description

Fast OpenMP parallel computing of Breiman's random forests for a variety of data settings, including right-censored survival, regression, and classification.

Usage

RFSRCModel(
  ntree = 1000,
  mtry = NULL,
  nodesize = NULL,
  nodedepth = NULL,
  splitrule = NULL,
  nsplit = 10,
  block.size = NULL,
  samptype = c("swor", "swr"),
  membership = FALSE,
  sampsize = if (samptype == "swor") function(x) 0.632 * x else function(x) x,
  nimpute = 1,
  ntime = NULL,
  proximity = c(FALSE, TRUE, "inbag", "oob", "all"),
  distance = c(FALSE, TRUE, "inbag", "oob", "all"),
  forest.wt = c(FALSE, TRUE, "inbag", "oob", "all"),
  xvar.wt = NULL,
  split.wt = NULL,
  var.used = c(FALSE, "all.trees", "by.tree"),
  split.depth = c(FALSE, "all.trees", "by.tree"),
  do.trace = FALSE,
  statistics = FALSE
)
RFSRCFastModel(
  ntree = 500,
  sampsize = function(x) min(0.632 * x, max(150, x^0.75)),
  ntime = 50,
  terminal.qualts = FALSE,
  ...
)

Arguments

ntree

number of trees.

mtry

number of variables randomly selected as candidates for splitting a node.

nodesize

forest average number of unique cases in a terminal node.

nodedepth

maximum depth to which a tree should be grown.

splitrule

splitting rule (see rfsrc).

nsplit

non-negative integer number of random splits to consider for each candidate splitting variable.

block.size

number of trees between successive computations of the cumulative error rate.
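The following base-R sketch illustrates what a cumulative error rate "every block.size trees" means. It does not call randomForestSRC. The per-tree prediction matrix `tree_pred` is simulated and hypothetical. At each checkpoint, the ensemble prediction is the average of the first t trees.

```r
# Sketch: cumulative ensemble error recorded every `block.size` trees.
# `tree_pred` is a hypothetical n x ntree matrix of simulated per-tree predictions.
set.seed(42)
n <- 200; ntree <- 100; block.size <- 25
y <- rnorm(n)
tree_pred <- y + matrix(rnorm(n * ntree, sd = 1), n, ntree)  # each column: y plus noise

checkpoints <- seq(block.size, ntree, by = block.size)
cum_err <- sapply(checkpoints, function(t) {
  ens <- rowMeans(tree_pred[, seq_len(t), drop = FALSE])  # average of the first t trees
  mean((y - ens)^2)                                       # ensemble MSE after t trees
})
print(data.frame(trees = checkpoints, mse = round(cum_err, 4)))
```

Because the simulated tree errors are independent, the ensemble error here shrinks roughly like 1/t. Real trees are correlated, so their error curve usually flattens sooner.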

samptype

whether bootstrap sampling is without ("swor") or with ("swr") replacement.

membership

logical indicating whether to return terminal node membership.

sampsize

function of the number of cases that returns the bootstrap sample size.
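The default sampsize functions listed under Usage can be evaluated directly to see how large each bootstrap sample is for a given number of cases. The following is a plain base-R sketch that re-creates those defaults. Any rounding to an integer happens inside the package and is not shown here.

```r
# Default bootstrap-size functions as listed under Usage (x = number of cases)
swor_size <- function(x) 0.632 * x                          # RFSRCModel, samptype = "swor"
swr_size  <- function(x) x                                  # RFSRCModel, samptype = "swr"
fast_size <- function(x) min(0.632 * x, max(150, x^0.75))   # RFSRCFastModel default

n <- c(100, 1000, 100000)
sizes <- data.frame(
  n    = n,
  swor = sapply(n, swor_size),
  swr  = sapply(n, swr_size),
  fast = sapply(n, fast_size)
)
print(sizes)
```

For small data sets, the RFSRCFastModel default matches the 0.632 fraction. For larger ones, it grows only as x^0.75, which gives about 178 cases when x = 1000. This subsampling is a main source of its speed.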

nimpute

number of iterations of the missing data imputation algorithm.

ntime

integer number of time points to constrain ensemble calculations for survival outcomes.

proximity

whether and how to return the proximity of cases, as measured by the frequency with which they share the same terminal node.
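To make the "frequency of sharing the same terminal nodes" concrete, the following sketch computes proximities from a terminal-node membership matrix. The matrix `membership` (cases in rows, trees in columns) is a hypothetical toy input, not randomForestSRC output. The sketch counts every tree and ignores the in-bag/out-of-bag distinction that the "inbag" and "oob" settings make.

```r
# Sketch: proximity from terminal-node membership.
# `membership` is a hypothetical n x ntree matrix; entry [i, t] is the terminal
# node that case i falls into in tree t (toy values).
membership <- cbind(
  tree1 = c(1, 1, 2, 2),
  tree2 = c(3, 4, 4, 3),
  tree3 = c(5, 5, 5, 6)
)

proximity_matrix <- function(membership) {
  n <- nrow(membership)
  prox <- matrix(0, n, n)
  for (t in seq_len(ncol(membership))) {
    # TRUE where cases i and j land in the same terminal node of tree t
    prox <- prox + outer(membership[, t], membership[, t], `==`)
  }
  prox / ncol(membership)  # fraction of trees in which each pair shares a node
}

prox <- proximity_matrix(membership)
print(round(prox, 3))
```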

distance

whether and how to return the distance between cases, measured as the ratio of the sum of edges from each case to their closest common node to the sum of edges from each case to the root node.

forest.wt

whether and how to return the forest weight matrix.
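A forest weight matrix expresses each case's forest prediction as a weighted average of the training responses. The following sketch builds such weights from a hypothetical terminal-node membership matrix. In each tree, a case spreads weight 1 evenly over the cases in its terminal node. The result is then averaged over trees. As a simplifying assumption, all cases are treated as in-bag, whereas a real forest computes node sizes from bootstrap samples.

```r
# Sketch: forest weights from terminal-node membership (toy data;
# all cases treated as in-bag for simplicity).
membership <- cbind(
  tree1 = c(1, 1, 2, 2),
  tree2 = c(3, 4, 4, 3)
)
y <- c(10, 20, 30, 40)

forest_weights <- function(membership) {
  n <- nrow(membership)
  w <- matrix(0, n, n)
  for (t in seq_len(ncol(membership))) {
    same <- outer(membership[, t], membership[, t], `==`)
    # Row i spreads weight 1 evenly over the cases sharing its terminal node
    w <- w + same / rowSums(same)
  }
  w / ncol(membership)  # average over trees; each row still sums to 1
}

W <- forest_weights(membership)
print(W %*% y)  # weighted averages of y, i.e. forest-style predictions
```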

xvar.wt

vector of non-negative weights representing the probability of selecting a variable for splitting.
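The effect of xvar.wt can be pictured as weighted sampling of the mtry candidate variables at each node. The following is an illustrative base-R sketch, not randomForestSRC's internal code. The variable names and weights are hypothetical. A variable with weight 0 is never drawn, and variables with larger weights become candidates more often.

```r
# Sketch: drawing mtry candidate split variables with probabilities
# proportional to xvar.wt (illustrative only; hypothetical names and weights).
set.seed(1)
xvars   <- c("x1", "x2", "x3", "x4")
xvar_wt <- c(4, 2, 1, 0)   # "x4" has zero weight and is never selected
mtry    <- 2

draw_candidates <- function() {
  sample(xvars, size = mtry, replace = FALSE, prob = xvar_wt)
}

# Fraction of (simulated) nodes at which each variable is a candidate
draws  <- replicate(10000, draw_candidates())          # mtry x 10000 matrix
counts <- table(factor(draws, levels = xvars))
freq   <- setNames(as.numeric(counts), names(counts)) / ncol(draws)
print(round(freq, 3))
```

With sampling without replacement, the frequencies increase with the weights but are not exactly proportional to them.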

split.wt

vector of non-negative weights used for multiplying the split statistic for a variable.
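Because split.wt multiplies each variable's split statistic, it can change which variable wins a split even when xvar.wt leaves candidate selection unchanged. In the following toy sketch, the split statistics and weights are hypothetical numbers, not values produced by randomForestSRC.

```r
# Sketch: split.wt rescales each candidate variable's best split statistic
# before the winning variable is chosen (hypothetical numbers).
split_stat <- c(x1 = 5.0, x2 = 4.0, x3 = 3.5)  # best split statistic per variable
split_wt   <- c(x1 = 1.0, x2 = 1.5, x3 = 1.0)  # hypothetical split.wt values

unweighted_winner <- names(which.max(split_stat))
weighted_winner   <- names(which.max(split_stat * split_wt))
cat("unweighted:", unweighted_winner, " weighted:", weighted_winner, "\n")
```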

var.used

whether and how to return variables used for splitting.

split.depth

whether and how to return minimal depth for each variable.

do.trace

number of seconds between updates to the user on approximate time to completion.

statistics

logical indicating whether to return split statistics.

terminal.qualts

logical indicating whether to return terminal node membership information.

...

arguments passed to RFSRCModel.

Value

MLModel class object.

Details

Response Types: factor, matrix, numeric, Surv
Automatic Tuning of Grid Parameters: mtry, nodesize

Default values for the NULL arguments and further model details can be found in the source link below.

In calls to varimp for RFSRCModel, the argument metric may be specified as "permute" (default) to permute out-of-bag (OOB) cases, as "random" to replace permutation with random assignment of cases to daughter nodes, or as "anti" to assign cases to the daughter node opposite their split. Variable importance is automatically scaled to range from 0 to 100. To obtain unscaled importance values, set scale = FALSE. See the example below.
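The idea behind the default "permute" metric and the 0 to 100 scaling can be sketched in base R. As stand-ins, the sketch uses a linear model on simulated data in place of a forest and a holdout set in place of OOB cases. The "random" and "anti" variants are not shown. The min-max formula used for scaling is an assumption about how values are mapped onto 0 to 100, and MachineShop's exact formula may differ.

```r
# Sketch of permutation importance plus 0-100 scaling.
# Stand-ins: lm() instead of a forest, a holdout set instead of OOB cases.
# The min-max scaling formula is an assumption, not MachineShop's code.
set.seed(123)
n  <- 500
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
df$y <- 3 * df$x1 + 1 * df$x2 + rnorm(n)   # x3 is pure noise

train  <- df[1:250, ]
test   <- df[251:500, ]
fit_lm <- lm(y ~ x1 + x2 + x3, data = train)
mse    <- function(data) mean((data$y - predict(fit_lm, newdata = data))^2)

base_mse <- mse(test)
raw_imp <- sapply(c("x1", "x2", "x3"), function(v) {
  permuted <- test
  permuted[[v]] <- sample(permuted[[v]])   # break the link between v and y
  mse(permuted) - base_mse                 # increase in error = importance
})

scaled_imp <- 100 * (raw_imp - min(raw_imp)) / (max(raw_imp) - min(raw_imp))
print(round(cbind(raw = raw_imp, scaled = scaled_imp), 3))
```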

Examples


# NOT RUN {
## Requires prior installation of suggested package randomForestSRC to run

library(MachineShop)

model_fit <- fit(sale_amount ~ ., data = ICHomes, model = RFSRCModel)
varimp(model_fit, metric = "random", scale = TRUE)
# }
