Train a Random Forest for regression or classification using randomForest
s.RF(x, y = NULL, x.test = NULL, y.test = NULL, x.name = NULL,
y.name = NULL, n.trees = 1000, autotune = FALSE,
n.trees.try = 1000, stepFactor = 1.5, mtry = NULL,
mtryStart = mtry, grid.resample.rtset = rtset.resample("kfold", 5),
metric = NULL, maximize = NULL, classwt = NULL, ipw = TRUE,
ipw.type = 2, upsample = FALSE, upsample.seed = NULL,
importance = TRUE, proximity = FALSE, replace = TRUE,
nodesize = NULL, maxnodes = NULL, strata = NULL,
sampsize = if (replace) nrow(x) else ceiling(0.632 * nrow(x)),
sampsize.ratio = NULL,
do.trace = NULL, tune.do.trace = FALSE, imetrics = FALSE,
n.cores = rtCores, print.tune.plot = FALSE, print.plot = TRUE,
plot.fitted = NULL, plot.predicted = NULL,
plot.theme = getOption("rt.fit.theme", "lightgrid"),
proximity.tsne = FALSE, discard.forest = FALSE,
tsne.perplexity = 5, plot.tsne.train = FALSE,
plot.tsne.test = FALSE, question = NULL, rtclass = NULL,
verbose = TRUE, grid.verbose = TRUE, outdir = NULL,
save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ...)
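A minimal usage sketch (assuming the rtemis package is installed and loaded; the data and train/test split below are purely illustrative):

```r
library(rtemis)

# Illustrative regression data
set.seed(2020)
x <- data.frame(matrix(rnorm(200 * 10), 200, 10))
y <- x[, 1] + rnorm(200)

# Hold out 50 cases for testing
idx <- sample(200, 150)
mod <- s.RF(x[idx, ], y[idx],
            x.test = x[-idx, ], y.test = y[-idx],
            n.trees = 500)
```

With `print.plot = TRUE` (the default), a True vs. Fitted plot is produced; test-set performance is reported because `x.test` and `y.test` were supplied.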
x: Numeric vector or matrix / data frame of features, i.e. independent variables
y: Numeric vector of outcome, i.e. dependent variable
x.test: Numeric vector or matrix / data frame of testing set features. Columns must correspond to columns in x
y.test: Numeric vector of testing set outcome
x.name: Character: Name for feature set
y.name: Character: Name for outcome
n.trees: Integer: Number of trees to grow. Default = 1000
autotune: Logical: If TRUE, use randomForest::tuneRF to determine mtry
n.trees.try: Integer: Number of trees to train for tuning, if autotune = TRUE
stepFactor: Float: If autotune = TRUE, at each tuning iteration, mtry is multiplied or divided by this value. Default = 1.5
mtry: [gS] Integer: Number of features sampled randomly at each split
mtryStart: Integer: If autotune = TRUE, start at this value for mtry
grid.resample.rtset: List: Output of rtset.resample defining gridSearchLearn parameters. Default = rtset.resample("kfold", 5)
metric: String: Metric to minimize, or maximize if maximize = TRUE, during grid search. Default = NULL, which results in "Balanced Accuracy" for Classification, "MSE" for Regression, and "Coherence" for Survival Analysis.
maximize: Logical: If TRUE, metric will be maximized if grid search is run. Default = FALSE
classwt: Vector, Float: Priors of the classes, for classification only. Need not add up to 1
ipw: Logical: If TRUE, apply inverse probability weighting (for Classification only). Note: If weights are provided, ipw is not used. Default = TRUE
ipw.type: Integer: 0, 1, or 2.
  1: class.weights as in 0, divided by max(class.weights)
  2: class.weights as in 0, divided by min(class.weights)
  Default = 2
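The ipw.type rescaling can be illustrated in base R. This is only a sketch: it assumes the starting class weights are inverse class probabilities (the exact starting weights are rtemis internals).

```r
# Example class counts for an unbalanced 2-class problem
counts <- c(A = 90, B = 10)

# Assumed starting point: inverse class probabilities
class.weights <- 1 / (counts / sum(counts))

# ipw.type = 1: divide by the largest weight; all weights fall in (0, 1]
w1 <- class.weights / max(class.weights)

# ipw.type = 2: divide by the smallest weight; minority class gets weight > 1
w2 <- class.weights / min(class.weights)
```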
upsample: Logical: If TRUE, upsample training set cases not belonging to the majority outcome group
upsample.seed: Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed)
importance: Logical: If TRUE, estimate variable relative importance. Default = TRUE
proximity: Logical: If TRUE, calculate proximity measure among cases. Default = FALSE
replace: Logical: If TRUE, sample cases with replacement during training. Default = TRUE
nodesize: [gS] Integer: Minimum size of terminal nodes. Default = 5 (Regression); 1 (Classification)
maxnodes: [gS] Integer: Maximum number of terminal nodes in a tree. Default = NULL; trees grown to maximum possible
strata: Vector, Factor: Will be used for stratified sampling
sampsize: Integer: Size of sample to draw. In Classification, if strata is defined, this can be a vector of the same length, in which case the corresponding values determine how many cases are drawn from each stratum.
sampsize.ratio: Float (0, 1): Heuristic to increase sensitivity in unbalanced cases. Sample with replacement from the minority class to create bootstraps of length N cases, then select (sampsize.ratio * N minority cases) cases from the majority class.
do.trace: Logical or Integer: If TRUE, randomForest will output information while it is running. If an integer, randomForest will report progress every that many trees. Default = n.trees/10 if verbose = TRUE
tune.do.trace: Same as do.trace, but for tuning, if autotune = TRUE
imetrics: Logical: If TRUE, calculate interpretability metrics (number of trees and number of nodes) and save them under the 'extra' field of rtMod
n.cores: Integer: Number of cores to use. Defaults to the available cores reported by future::availableCores(), unless the option rt.cores is set at the time the library is loaded
print.tune.plot: Logical: Passed to randomForest::tuneRF. Default = FALSE
print.plot: Logical: If TRUE, produce plot using mplot3. Takes precedence over plot.fitted and plot.predicted
plot.fitted: Logical: If TRUE, plot True (y) vs Fitted
plot.predicted: Logical: If TRUE, plot True (y.test) vs Predicted. Requires x.test and y.test
plot.theme: String: "zero", "dark", "box", "darkbox"
proximity.tsne: Logical: If TRUE, perform t-SNE on the proximity matrix. Output will be saved under the 'extra' field of rtMod. Default = FALSE
discard.forest: Logical: If TRUE, remove the forest from the rtMod object to save space. Default = FALSE
tsne.perplexity: Numeric: Perplexity parameter for Rtsne::Rtsne
plot.tsne.train: Logical: If TRUE, plot training set t-SNE projections
plot.tsne.test: Logical: If TRUE, plot testing set t-SNE projections
question: String: The question you are attempting to answer with this model, in plain language.
rtclass: String: Class type to use: "S3", "S4", "RC", "R6"
verbose: Logical: If TRUE, print summary to screen.
grid.verbose: Logical: Passed to gridSearchLearn
outdir: String, Optional: Path to directory to save output
save.mod: Logical: If TRUE, save all output as an RDS file in outdir. save.mod is TRUE by default if an outdir is defined. If set to TRUE and no outdir is defined, outdir defaults to paste0("./s.", mod.name)
...: Additional arguments to be passed to randomForest::randomForest
Returns an rtMod object.
If autotune = TRUE, randomForest::tuneRF will be run to determine the best mtry value.
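A sketch of a tuned classification fit (assuming rtemis is installed; the data below are illustrative):

```r
library(rtemis)

# Illustrative classification data
set.seed(2020)
x <- data.frame(matrix(rnorm(150 * 8), 150, 8))
y <- factor(ifelse(x[, 1] + rnorm(150) > 0, "a", "b"))

# autotune = TRUE runs randomForest::tuneRF to pick mtry,
# starting at mtryStart and scaling by stepFactor at each iteration
mod <- s.RF(x, y,
            autotune = TRUE,
            n.trees.try = 500,
            stepFactor = 1.5)
```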
See elevate for external cross-validation.
Other Supervised Learning: s.ADABOOST, s.ADDTREE, s.BART, s.BAYESGLM, s.BRUTO, s.C50, s.CART, s.CTREE, s.DA, s.ET, s.EVTREE, s.GAM.default, s.GAM.formula, s.GAMSEL, s.GAM, s.GBM3, s.GBM, s.GLMNET, s.GLM, s.GLS, s.H2ODL, s.H2OGBM, s.H2ORF, s.IRF, s.KNN, s.LDA, s.LM, s.MARS, s.MLRF, s.MXN, s.NBAYES, s.NLA, s.NLS, s.NW, s.POLYMARS, s.PPR, s.PPTREE, s.QDA, s.QRNN, s.RANGER, s.RFSRC, s.SGD, s.SPLS, s.SVM, s.TFN, s.XGBLIN, s.XGB
Other Tree-based methods: s.ADABOOST, s.ADDTREE, s.BART, s.C50, s.CART, s.CTREE, s.ET, s.EVTREE, s.GBM3, s.GBM, s.H2OGBM, s.H2ORF, s.IRF, s.MLRF, s.PPTREE, s.RANGER, s.RFSRC, s.XGB
Other Ensembles: s.ADABOOST, s.GBM3, s.GBM, s.RANGER