rtemis (version 0.79)

s.RANGER: Random Forest Classification and Regression [C, R]

Description

Train a Random Forest for regression or classification using ranger

Usage

s.RANGER(x, y = NULL, x.test = NULL, y.test = NULL, x.name = NULL,
  y.name = NULL, n.trees = 1000, weights = NULL, ipw = TRUE,
  ipw.type = 2, upsample = FALSE, upsample.seed = NULL,
  autotune = FALSE, classwt = NULL, n.trees.try = 500,
  stepFactor = 2, mtry = NULL, mtryStart = NULL,
  grid.resample.rtset = rtset.resample("kfold", 5),
  grid.search.type = c("exhaustive", "randomized"),
  grid.randomized.p = 0.1, metric = NULL, maximize = NULL,
  probability = FALSE, importance = "impurity", replace = TRUE,
  min.node.size = NULL, splitrule = NULL, strata = NULL,
  sampsize = if (replace) nrow(x) else ceiling(0.632 * nrow(x)),
  tune.do.trace = FALSE, imetrics = FALSE, n.cores = rtCores,
  print.tune.plot = FALSE, print.plot = TRUE, plot.fitted = NULL,
  plot.predicted = NULL, plot.theme = getOption("rt.fit.theme",
  "lightgrid"), question = NULL, grid.verbose = TRUE, verbose = TRUE,
  outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ...)
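A minimal call sketch, assuming rtemis is installed; the synthetic data and variable names below are illustrative, not part of the function's documentation:

```r
library(rtemis)

# Synthetic regression data (base R only; names are illustrative)
set.seed(2020)
x <- as.data.frame(matrix(rnorm(200 * 10), 200, 10))
y <- x[[3]] + x[[5]]^2 + rnorm(200)

# Train with defaults: 1000 trees, mtry defaulting to a third of the
# number of features for regression
mod <- s.RANGER(x, y)
```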

Arguments

x

Numeric vector or matrix / data frame of features i.e. independent variables

y

Numeric vector of outcome, i.e. dependent variable

x.test

Numeric vector or matrix / data frame of testing set features. Columns must correspond to columns in x

y.test

Numeric vector of testing set outcome

x.name

Character: Name for feature set

y.name

Character: Name for outcome

n.trees

Integer: Number of trees to grow. Default = 1000

weights

Numeric vector: Weights for cases. For classification, weights takes precedence over ipw: if weights are provided, ipw is not used. Leave NULL if setting ipw = TRUE. Default = NULL

ipw

Logical: If TRUE, apply inverse probability weighting (for Classification only). Note: If weights are provided, ipw is not used. Default = TRUE
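A sketch of the interplay between weights and ipw described above, using made-up imbalanced data (all names are illustrative):

```r
library(rtemis)

# Imbalanced binary outcome (synthetic)
set.seed(2020)
x <- as.data.frame(matrix(rnorm(300 * 5), 300, 5))
y <- factor(ifelse(x[[1]] + rnorm(300) > 1, "pos", "neg"))

# Default: inverse probability weighting compensates for class imbalance
mod.ipw <- s.RANGER(x, y, ipw = TRUE)

# Explicit case weights take precedence; ipw is then not used
w <- ifelse(y == "pos", 2, 1)
mod.w <- s.RANGER(x, y, weights = w)
```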

ipw.type

Integer {0, 1, 2}: 1: class.weights as in 0, divided by max(class.weights); 2: class.weights as in 0, divided by min(class.weights). Default = 2

upsample

Logical: If TRUE, upsample training set cases not belonging to the majority outcome group

upsample.seed

Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed)

autotune

Logical: If TRUE, use randomForest::tuneRF to determine mtry

classwt

Vector, Float: Priors of the classes for randomForest::tuneRF if autotune = TRUE. For classification only; need not add up to 1

n.trees.try

Integer: Number of trees to train for tuning, if autotune = TRUE

stepFactor

Float: If autotune = TRUE, at each tuning iteration, mtry is multiplied or divided by this value. Default = 2

mtry

[gS] Integer: Number of features randomly sampled at each split. Defaults to the square root of the number of features for classification, and one third of the number of features for regression.

mtryStart

Integer: If autotune = TRUE, start at this value for mtry

grid.resample.rtset

List: Output of rtset.resample defining gridSearchLearn parameters. Default = rtset.resample("kfold", 5)

grid.search.type

String: Type of grid search to perform: "exhaustive" or "randomized". Default = "exhaustive"

grid.randomized.p

Float (0, 1): If grid.search.type = "randomized", randomly run this proportion of combinations. Default = .1

metric

String: Metric to minimize, or maximize if maximize = TRUE, during grid search. Default = NULL, which results in "Balanced Accuracy" for Classification, "MSE" for Regression, and "Concordance" for Survival Analysis.

maximize

Logical: If TRUE, metric will be maximized if grid search is run. Default = FALSE

probability

Logical: If TRUE, grow a probability forest. See ranger::ranger

importance

String: Variable importance mode: "impurity" (Default), "permutation", or "none". See ranger::ranger

replace

Logical: If TRUE, sample cases with replacement during training. Default = TRUE

min.node.size

[gS] Integer: Minimum node size

splitrule

String: For classification: "gini" (Default) or "extratrees"; For regression: "variance" (Default), "extratrees" or "maxstat". For survival "logrank" (Default), "extratrees", "C" or "maxstat".

strata

Vector, Factor: Will be used for stratified sampling

sampsize

Integer: Size of sample to draw. In Classification, if strata is defined, this can be a vector of the same length, in which case, corresponding values determine how many cases are drawn from the strata.

tune.do.trace

Same as do.trace but for tuning, if autotune = TRUE

imetrics

Logical: If TRUE, calculate interpretability metrics (N of trees and N of nodes) and save under the 'extra' field of rtMod

n.cores

Integer: Number of cores to use. Defaults to available cores reported by future::availableCores(), unless option rt.cores is set at the time the library is loaded

print.tune.plot

Logical: passed to randomForest::tuneRF. Default = FALSE

print.plot

Logical: if TRUE, produce plot using mplot3. Takes precedence over plot.fitted and plot.predicted

plot.fitted

Logical: if TRUE, plot True (y) vs Fitted

plot.predicted

Logical: if TRUE, plot True (y.test) vs Predicted. Requires x.test and y.test

plot.theme

String: "zero", "dark", "box", "darkbox"

question

String: the question you are attempting to answer with this model, in plain language.

grid.verbose

Logical: Passed to gridSearchLearn

verbose

Logical: If TRUE, print summary to screen.

outdir

String, Optional: Path to directory to save output

save.mod

Logical: If TRUE, save all output as an RDS file in outdir. save.mod is TRUE by default if an outdir is defined. If set to TRUE and no outdir is defined, outdir defaults to paste0("./s.", mod.name)

...

Additional arguments to be passed to ranger::ranger

Value

rtMod object

Details

You should consider, or try, setting mtry to NCOL(x), especially for a small number of features. If autotune = TRUE, randomForest::tuneRF will be run to determine the best mtry value. [gS]: Indicated parameter will be tuned by grid search if more than one value is passed
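A grid-search sketch illustrating the [gS] mechanism: passing a vector for a [gS] parameter triggers tuning under grid.resample.rtset (here the "kfold", 5 default). Data and values are illustrative:

```r
library(rtemis)

# Synthetic regression data (names are illustrative)
set.seed(2020)
x <- as.data.frame(matrix(rnorm(200 * 8), 200, 8))
y <- x[[1]] - x[[2]] + rnorm(200)

# Vectors passed to [gS] parameters are tuned by grid search;
# mtry = 8 here equals NCOL(x), per the advice above
mod <- s.RANGER(x, y,
                mtry = c(2, 4, 8),
                min.node.size = c(1, 5, 10))
```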

See Also

elevate for external cross-validation

Other Supervised Learning: s.ADABOOST, s.ADDTREE, s.BART, s.BAYESGLM, s.BRUTO, s.C50, s.CART, s.CTREE, s.DA, s.ET, s.EVTREE, s.GAM.default, s.GAM.formula, s.GAMSEL, s.GAM, s.GBM3, s.GBM, s.GLMNET, s.GLM, s.GLS, s.H2ODL, s.H2OGBM, s.H2ORF, s.IRF, s.KNN, s.LDA, s.LM, s.MARS, s.MLRF, s.MXN, s.NBAYES, s.NLA, s.NLS, s.NW, s.POLYMARS, s.PPR, s.PPTREE, s.QDA, s.QRNN, s.RFSRC, s.RF, s.SGD, s.SPLS, s.SVM, s.TFN, s.XGBLIN, s.XGB

Other Tree-based methods: s.ADABOOST, s.ADDTREE, s.BART, s.C50, s.CART, s.CTREE, s.ET, s.EVTREE, s.GBM3, s.GBM, s.H2OGBM, s.H2ORF, s.IRF, s.MLRF, s.PPTREE, s.RFSRC, s.RF, s.XGB

Other Ensembles: s.ADABOOST, s.GBM3, s.GBM, s.RF