Train a bagged ensemble using any learner
bag(x, y = NULL, x.test = NULL, y.test = NULL, weights = NULL,
    mod = "cart", k = 10, mod.params = list(), ipw = TRUE,
    ipw.type = 2, upsample = FALSE, upsample.seed = NULL,
    .resample = rtset.resample(resampler = "strat.boot", n.resamples = k),
    aggr.fn = mean, x.name = NULL, y.name = NULL, question = NULL,
    base.verbose = FALSE, verbose = TRUE, trace = 0,
    print.plot = TRUE, plot.fitted = NULL, plot.predicted = NULL,
    plot.theme = getOption("rt.fit.theme", "lightgrid"),
    print.base.plot = FALSE, n.cores = rtCores,
    parallel.type = ifelse(.Platform$OS.type == "unix", "fork", "psock"),
    outdir = NULL, ...)
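The defaults in the usage above (stratified bootstrap resampling, `k` base learners, predictions combined with `aggr.fn = mean`) describe ordinary bagging. The sketch below illustrates that idea in base R only. It is not the package's implementation: the helper `strat_boot`, the toy data, and the use of `glm` as a stand-in base learner (instead of the default `"cart"`) are all assumptions made for illustration.

```r
# Minimal bagging sketch in base R (illustrative; not the rtemis code path)
set.seed(2024)

# Toy binary classification data (hypothetical)
n <- 200
x <- data.frame(a = rnorm(n), b = rnorm(n))
y <- factor(ifelse(x$a + 0.5 * x$b + rnorm(n, sd = 0.5) > 0, "pos", "neg"))
dat <- data.frame(x, y = y)

# Stratified bootstrap: sample case indices with replacement within each class,
# so every resample keeps the original class counts
strat_boot <- function(y) {
  unlist(lapply(split(seq_along(y), y), function(idx) {
    idx[sample.int(length(idx), replace = TRUE)]
  }), use.names = FALSE)
}

k <- 10  # number of base learners, mirroring the `k` argument

# Train one base learner per resample (logistic regression as a stand-in learner)
fits <- lapply(seq_len(k), function(i) {
  idx <- strat_boot(y)
  glm(y ~ a + b, data = dat[idx, ], family = binomial)
})

# Collect each learner's predicted probability of "pos": an n x k matrix
pred_mat <- sapply(fits, predict, newdata = x, type = "response")

# Aggregate across learners; the function is passed unquoted, like aggr.fn
aggr_fn <- mean
bagged_prob <- apply(pred_mat, 1, aggr_fn)
bagged_class <- factor(ifelse(bagged_prob > 0.5, "pos", "neg"), levels = levels(y))
```

Swapping `aggr_fn <- median` changes only the aggregation step, which is why the function itself, rather than its name as a string, is what gets passed.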
x: Numeric vector or matrix / data frame of features, i.e. independent variables
y: Numeric vector of outcome, i.e. dependent variable
x.test: Numeric vector or matrix / data frame of testing set features. Columns must correspond to columns in x
y.test: Numeric vector of testing set outcome
weights: Numeric vector: Weights for cases. For classification, weights takes precedence over ipw: if weights are provided, ipw is not used. Leave weights = NULL if setting ipw = TRUE. Default = NULL
mod: String: Algorithm to bag. For options, see modSelect
k: Integer: Number of base learners to train
mod.params: Named list of arguments for mod
ipw: Logical: If TRUE, apply inverse probability weighting (for Classification only). Note: If weights are provided, ipw is not used. Default = TRUE
ipw.type: Integer 0, 1, 2. 1: class.weights as in 0, divided by max(class.weights). 2: class.weights as in 0, divided by min(class.weights). Default = 2
upsample: Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Caution: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness
upsample.seed: Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed)
.resample: List: Resample settings to use. There is no need to edit this unless you want to change the type of resampling. It will use stratified bootstrap by default. Use rtset.resample for convenience. Default = rtset.resample(resampler = "strat.boot", n.resamples = k)
aggr.fn: Function: Used to aggregate base learners' predictions. Default = mean. (Note: no quotes, as you are passing the function itself)
x.name: Character: Name for feature set
y.name: Character: Name for outcome
question: String: The question you are attempting to answer with this model, in plain language
base.verbose: Logical: verbose argument passed to learner
verbose: Logical: If TRUE, print summary to screen
trace: Integer: If > 0, print diagnostic info to console
print.plot: Logical: If TRUE, produce plot using mplot3. Takes precedence over plot.fitted and plot.predicted
plot.fitted: Logical: If TRUE, plot True (y) vs Fitted
plot.predicted: Logical: If TRUE, plot True (y.test) vs Predicted. Requires x.test and y.test
plot.theme: String: "zero", "dark", "box", "darkbox"
print.base.plot: Logical: Passed to print.plot argument of base learner, i.e. if TRUE, print error plot for each base learner
n.cores: Integer: Number of cores to use
parallel.type: String: "fork" or "psock". Type of parallelization. Default = "fork" for macOS and Linux, "psock" for Windows
outdir: Path to output directory. If defined, will save Predicted vs. True plot, if available, as well as full model output, if save.mod is TRUE
...: Additional parameters to be passed to learner
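The inverse probability weighting types described for classification can be hard to picture from the text alone, so here is a rough base R sketch. It rests on an assumption: the page does not say what type 0 is, and this example takes it to be the inverse of each class's relative frequency. The function name `class_weights` is hypothetical and is not part of the package.

```r
# Sketch of inverse-probability class weights.
# ASSUMPTION: type 0 = 1 / relative class frequency (not stated in the source).
class_weights <- function(y, type = 2) {
  freq <- table(y) / length(y)   # relative frequency of each class
  w0 <- 1 / as.numeric(freq)     # assumed type 0 weights
  names(w0) <- names(freq)
  switch(as.character(type),
    "0" = w0,
    "1" = w0 / max(w0),          # rarest class gets weight 1, others less than 1
    "2" = w0 / min(w0),          # most common class gets weight 1, others more than 1
    stop("type must be 0, 1, or 2")
  )
}

# Imbalanced toy outcome: 90 cases of "a", 10 cases of "b"
y <- factor(c(rep("a", 90), rep("b", 10)))

w1 <- class_weights(y, type = 1)  # a is about 0.111, b is 1
w2 <- class_weights(y, type = 2)  # a is 1, b is about 9

# Expand class weights to one weight per case, the shape a `weights` vector takes
case_weights <- unname(w2[as.character(y)])
```

Under this assumption, types 1 and 2 differ only in scale: both preserve the ratio between class weights, so a learner that is insensitive to the overall scale of its weights would treat them the same.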