rtemis (version 0.79)

s.RF: Random Forest Classification and Regression [C, R]

Description

Train a Random Forest for regression or classification using randomForest.

Usage

s.RF(x, y = NULL, x.test = NULL, y.test = NULL, x.name = NULL,
  y.name = NULL, n.trees = 1000, autotune = FALSE,
  n.trees.try = 1000, stepFactor = 1.5, mtry = NULL,
  mtryStart = mtry, grid.resample.rtset = rtset.resample("kfold", 5),
  metric = NULL, maximize = NULL, classwt = NULL, ipw = TRUE,
  ipw.type = 2, upsample = FALSE, upsample.seed = NULL,
  importance = TRUE, proximity = FALSE, replace = TRUE,
  nodesize = NULL, maxnodes = NULL, strata = NULL, sampsize = if
  (replace) nrow(x) else ceiling(0.632 * nrow(x)), sampsize.ratio = NULL,
  do.trace = NULL, tune.do.trace = FALSE, imetrics = FALSE,
  n.cores = rtCores, print.tune.plot = FALSE, print.plot = TRUE,
  plot.fitted = NULL, plot.predicted = NULL,
  plot.theme = getOption("rt.fit.theme", "lightgrid"),
  proximity.tsne = FALSE, discard.forest = FALSE,
  tsne.perplexity = 5, plot.tsne.train = FALSE,
  plot.tsne.test = FALSE, question = NULL, rtclass = NULL,
  verbose = TRUE, grid.verbose = TRUE, outdir = NULL,
  save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ...)

Arguments

x

Numeric vector or matrix / data frame of features i.e. independent variables

y

Numeric vector of outcome, i.e. dependent variable

x.test

Numeric vector or matrix / data frame of testing set features. Columns must correspond to columns in x

y.test

Numeric vector of testing set outcome

x.name

Character: Name for feature set

y.name

Character: Name for outcome

n.trees

Integer: Number of trees to grow. Default = 1000

autotune

Logical: If TRUE, use randomForest::tuneRF to determine mtry

n.trees.try

Integer: Number of trees to train for tuning, if autotune = TRUE

stepFactor

Float: If autotune = TRUE, at each tuning iteration, mtry is multiplied or divided by this value. Default = 1.5

mtry

[gS] Integer: Number of features sampled randomly at each split

mtryStart

Integer: If autotune = TRUE, start at this value for mtry
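
The way stepFactor moves mtry away from mtryStart can be illustrated with a small base R sketch. The helper mtry_steps is hypothetical and not part of rtemis or randomForest, and the rounding (ceiling when dividing, floor when multiplying, clamped to the number of features) is an assumption about how the stepping behaves. A real tuning run would also stop early once the out-of-bag error stops improving, which this sketch ignores.

```r
# Hypothetical helper: list the mtry values visited when stepping in one
# direction from mtry_start, multiplying or dividing by step_factor.
mtry_steps <- function(mtry_start, step_factor, n_features,
                       direction = c("left", "right")) {
  direction <- match.arg(direction)
  visited <- mtry_start
  current <- mtry_start
  repeat {
    nxt <- if (direction == "left") {
      max(1, ceiling(current / step_factor))          # divide, never below 1
    } else {
      min(n_features, floor(current * step_factor))   # multiply, capped at n_features
    }
    if (nxt == current) break                         # no further movement possible
    visited <- c(visited, nxt)
    current <- nxt
  }
  visited
}

mtry_steps(4, 1.5, n_features = 20, direction = "left")   # 4 3 2
mtry_steps(4, 1.5, n_features = 20, direction = "right")  # 4 6 9 13 19 20
```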

grid.resample.rtset

List: Output of rtset.resample defining gridSearchLearn parameters. Default = rtset.resample("kfold", 5)

metric

String: Metric to minimize, or maximize if maximize = TRUE during grid search. Default = NULL, which results in "Balanced Accuracy" for Classification, "MSE" for Regression, and "Coherence" for Survival Analysis.

maximize

Logical: If TRUE, metric will be maximized if grid search is run. Default = FALSE

classwt

Vector, Float: Priors of the classes for classification only. Need not add up to 1

ipw

Logical: If TRUE, apply inverse probability weighting (for Classification only). Note: If weights are provided, ipw is not used. Default = TRUE

ipw.type

Integer: 0, 1, or 2. 1: class.weights as in 0, divided by max(class.weights); 2: class.weights as in 0, divided by min(class.weights). Default = 2
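
The two rescalings can be sketched in base R. This page does not define the weights for type 0, so the sketch assumes they are the inverse of each class's relative frequency; that choice, and the variable names, are illustrative only and may differ from what rtemis computes.

```r
# Toy outcome: 80 cases of class "a", 20 of class "b"
y <- factor(c(rep("a", 80), rep("b", 20)))

# Assumption: base (type 0) weights are inverse class frequencies
freq <- table(y) / length(y)
w0 <- setNames(1 / as.numeric(freq), names(freq))

w1 <- w0 / max(w0)  # type 1: largest weight becomes 1
w2 <- w0 / min(w0)  # type 2: smallest weight becomes 1

print(rbind(type0 = w0, type1 = w1, type2 = w2))
```

Under this assumption, type 1 keeps every weight at or below 1, while type 2 keeps every weight at or above 1.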

upsample

Logical: If TRUE, upsample training set cases not belonging to the majority outcome group

upsample.seed

Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed)
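
A rough base R sketch of seeded upsampling follows. It assumes that minority classes are resampled with replacement until they match the size of the majority class; the page does not state the target size, so treat that as an assumption rather than the rtemis implementation. The toy data and variable names are made up for illustration.

```r
set.seed(2019)  # stands in for upsample.seed

# Toy training set: 8 majority cases, 3 minority cases
y <- factor(c(rep("neg", 8), rep("pos", 3)))
x <- data.frame(f1 = seq_along(y))

n_major <- max(table(y))

# For each class, keep all rows of the majority class and resample
# minority rows with replacement up to the majority count
idx <- unlist(lapply(levels(y), function(lvl) {
  rows <- which(y == lvl)
  if (length(rows) < n_major) {
    rows[sample.int(length(rows), n_major, replace = TRUE)]
  } else {
    rows
  }
}))

x_up <- x[idx, , drop = FALSE]
y_up <- y[idx]
print(table(y_up))
```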

importance

Logical: If TRUE, estimate variable relative importance. Default = TRUE

proximity

Logical: If TRUE, calculate proximity measure among cases. Default = FALSE

replace

Logical: If TRUE, sample cases with replacement during training. Default = TRUE

nodesize

[gS] Integer: Minimum size of terminal nodes. Default = 5 (Regression); 1 (Classification)

maxnodes

[gS] Integer: Maximum number of terminal nodes in a tree. Default = NULL; trees grown to maximum possible

strata

Vector, Factor: Will be used for stratified sampling

sampsize

Integer: Size of sample to draw. In Classification, if strata is defined, this can be a vector of the same length, in which case, corresponding values determine how many cases are drawn from the strata.
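
The default for sampsize shown under Usage depends on replace. The short sketch below reproduces that expression in a standalone helper (default_sampsize is a hypothetical name used only here) so the two cases can be compared on a dataset of 150 rows.

```r
# Same expression as the sampsize default in the Usage section
default_sampsize <- function(n, replace) {
  if (replace) n else ceiling(0.632 * n)
}

n <- nrow(iris)              # 150 rows
default_sampsize(n, TRUE)    # 150: a full-size bootstrap sample
default_sampsize(n, FALSE)   # 95: ceiling(0.632 * 150)
```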

sampsize.ratio

Float (0, 1): Heuristic to increase sensitivity when classes are unbalanced. Sample with replacement from the minority class to create bootstraps of N cases, where N is the number of minority cases, then select sampsize.ratio * N cases from the majority class.

do.trace

Logical or integer: If TRUE, randomForest will output information while it is running. If an integer, randomForest will report progress once every that many trees. Default = n.trees/10 if verbose = TRUE

tune.do.trace

Same as do.trace but for tuning, if autotune = TRUE

imetrics

Logical: If TRUE, calculate interpretability metrics (N of trees and N of nodes) and save under the 'extra' field of rtMod

n.cores

Integer: Number of cores to use. Defaults to available cores reported by future::availableCores(), unless option rt.cores is set at the time the library is loaded

print.tune.plot

Logical: passed to randomForest::tuneRF. Default = FALSE

print.plot

Logical: if TRUE, produce plot using mplot3. Takes precedence over plot.fitted and plot.predicted

plot.fitted

Logical: if TRUE, plot True (y) vs Fitted

plot.predicted

Logical: if TRUE, plot True (y.test) vs Predicted. Requires x.test and y.test

plot.theme

String: "zero", "dark", "box", "darkbox"

proximity.tsne

Logical: If TRUE, perform t-SNE on proximity matrix. Will be saved under 'extra' field of rtMod. Default = FALSE

discard.forest

Logical: If TRUE, remove forest from rtMod object to save space. Default = FALSE

tsne.perplexity

Numeric: Perplexity parameter for Rtsne::Rtsne

plot.tsne.train

Logical: If TRUE, plot training set tSNE projections

plot.tsne.test

Logical: If TRUE, plot testing set tSNE projections

question

String: the question you are attempting to answer with this model, in plain language.

rtclass

String: Class type to use. "S3", "S4", "RC", "R6"

verbose

Logical: If TRUE, print summary to screen.

grid.verbose

Logical: Passed to gridSearchLearn

outdir

String, Optional: Path to directory to save output

save.mod

Logical: If TRUE, save all output as an RDS file in outdir. save.mod is TRUE by default if an outdir is defined. If set to TRUE and no outdir is defined, outdir defaults to paste0("./s.", mod.name)

...

Additional arguments to be passed to randomForest::randomForest

Value

rtMod object
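
A minimal usage sketch, assuming an rtemis build that still exports s.RF (such as the 0.79 release documented here) is installed. The train/test split of iris and the choice of n.trees = 200 are arbitrary illustration values, and the code is skipped if the function is not available.

```r
# Run only if an rtemis version exporting s.RF is installed
if (requireNamespace("rtemis", quietly = TRUE) &&
    "s.RF" %in% getNamespaceExports("rtemis")) {
  set.seed(1)
  test_rows <- sample(nrow(iris), 30)  # hold out 30 cases for testing

  mod <- rtemis::s.RF(
    x = iris[-test_rows, 1:4], y = iris$Species[-test_rows],
    x.test = iris[test_rows, 1:4], y.test = iris$Species[test_rows],
    n.trees = 200,
    print.plot = FALSE,  # skip mplot3 output
    verbose = FALSE
  )
  print(class(mod))
} else {
  message("rtemis with s.RF is not installed; skipping example")
}
```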

Details

If autotune = TRUE, randomForest::tuneRF will be run to determine the best mtry value.
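
To see what the tuning step produces, randomForest::tuneRF can be called directly. The sketch below is not how s.RF invokes it internally; it only shows the tuneRF call on iris with illustrative settings (mtryStart = 2, ntreeTry = 100) and picks the mtry with the lowest out-of-bag error from the returned matrix. It is skipped if randomForest is not installed.

```r
if (requireNamespace("randomForest", quietly = TRUE)) {
  set.seed(1)
  tuned <- randomForest::tuneRF(
    x = iris[, 1:4], y = iris$Species,
    mtryStart = 2, ntreeTry = 100, stepFactor = 1.5,
    trace = FALSE, plot = FALSE
  )
  # tuned is a matrix with columns "mtry" and "OOBError"
  best_mtry <- tuned[which.min(tuned[, "OOBError"]), "mtry"]
  print(tuned)
  print(best_mtry)
} else {
  message("randomForest is not installed; skipping example")
}
```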

See Also

elevate for external cross-validation

Other Supervised Learning: s.ADABOOST, s.ADDTREE, s.BART, s.BAYESGLM, s.BRUTO, s.C50, s.CART, s.CTREE, s.DA, s.ET, s.EVTREE, s.GAM.default, s.GAM.formula, s.GAMSEL, s.GAM, s.GBM3, s.GBM, s.GLMNET, s.GLM, s.GLS, s.H2ODL, s.H2OGBM, s.H2ORF, s.IRF, s.KNN, s.LDA, s.LM, s.MARS, s.MLRF, s.MXN, s.NBAYES, s.NLA, s.NLS, s.NW, s.POLYMARS, s.PPR, s.PPTREE, s.QDA, s.QRNN, s.RANGER, s.RFSRC, s.SGD, s.SPLS, s.SVM, s.TFN, s.XGBLIN, s.XGB

Other Tree-based methods: s.ADABOOST, s.ADDTREE, s.BART, s.C50, s.CART, s.CTREE, s.ET, s.EVTREE, s.GBM3, s.GBM, s.H2OGBM, s.H2ORF, s.IRF, s.MLRF, s.PPTREE, s.RANGER, s.RFSRC, s.XGB

Other Ensembles: s.ADABOOST, s.GBM3, s.GBM, s.RANGER