elevate
is a high-level function to tune, train, and test an rtemis model by nested resampling, with optional preprocessing and decomposition of input features.
elevate(x, y = NULL, mod = "ranger", mod.params = list(),
.preprocess = NULL, .decompose = NULL, .resample = NULL,
weights = NULL, resampler = "strat.sub", n.resamples = 10,
n.repeats = 1, stratify.var = NULL, train.p = 0.8,
strat.n.bins = 4, target.length = NULL, seed = NULL,
res.index = NULL, res.group = NULL, bag.fn = median,
x.name = NULL, y.name = NULL, save.mods = TRUE, save.tune = TRUE,
cex = 1.4, col = "#18A3AC", bag.fitted = FALSE, n.cores = 1,
parallel.type = ifelse(.Platform$OS.type == "unix", "fork", "psock"),
print.plot = TRUE, plot.fitted = FALSE, plot.predicted = TRUE,
plot.theme = getOption("rt.fit.theme", "lightgrid"),
print.res.plot = FALSE, question = NULL, verbose = TRUE,
trace = 0, res.verbose = FALSE, headless = FALSE, outdir = NULL,
save.plots = FALSE, save.rt = ifelse(!is.null(outdir), TRUE, FALSE),
save.mod = TRUE, save.res = FALSE, ...)
Numeric vector or matrix / data frame of features, i.e. independent variables
Numeric vector of outcome, i.e. dependent variable
String: Learner to use. Options: see modSelect
Optional named list of parameters to be passed to mod. All parameters can also be passed as part of ...
Optional named list of parameters to be passed to preprocess. Set using rtset.preprocess, e.g. .preprocess = rtset.preprocess(impute = TRUE)
Optional named list of parameters to be used for decomposition / dimensionality reduction. Set using rtset.decompose, e.g. .decompose = rtset.decompose("ica", 12)
Optional named list of parameters to be passed to resample. NOTE: If set, this takes precedence over setting the individual resampling arguments
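As a sketch, assuming the rtemis helper rtset.resample (see ?rtset), the entire outer resampling could be configured through this single argument instead of the individual resampling arguments:

```r
# Sketch (assumes the rtemis helper rtset.resample; check ?rtset):
# configure 10-fold outer resampling in one argument
library(rtemis)
mod <- elevate(x, y,
               .resample = rtset.resample(resampler = "kfold",
                                          n.resamples = 10))
```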
Numeric vector: Weights for cases. For classification, weights takes precedence over ipw, therefore set weights = NULL if using ipw. Note: If weights are provided, ipw is not used. Leave NULL if setting ipw = TRUE. Default = NULL
String: Type of resampling to perform: "bootstrap", "kfold", "strat.boot", "strat.sub". Default = "strat.boot" for length(y) < 200, otherwise "strat.sub"
Integer: Number of training/testing sets required
Integer: Number of times the external resample should be repeated. This allows you to do, for example, 10 times 10-fold cross-validation. Default = 1. In most cases it makes sense to use 1 repeat of many resamples, e.g. 25 stratified subsamples.
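For instance, 10 times 10-fold cross-validation (100 outer models in total) would be requested as follows; a sketch assuming x and y are already defined:

```r
# Sketch: 10 repeats of 10-fold outer cross-validation
mod <- elevate(x, y,
               resampler = "kfold",
               n.resamples = 10,
               n.repeats = 10)
```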
Numeric vector: Used to stratify external sampling (if applicable). Defaults to outcome y
Float (0, 1): Fraction of cases to assign to training set for resampler = "strat.sub"
Integer: Number of groups to use for stratification for resampler = "strat.sub" / "strat.boot"
Integer: Number of cases for training set for resampler = "strat.boot". Default = length(y)
Integer: (Optional) Set seed for random number generator, in order to make output reproducible. See ?base::set.seed
List where each element is a vector of training set indices. Use this for manual or precalculated train/test splits
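As a sketch, a precalculated set of five random 80% training splits could be built with base R and passed as res.index (the sample size here is hypothetical):

```r
# Sketch: five manual 80% training splits for res.index
set.seed(2020)
n.cases <- 100   # hypothetical sample size; use NROW(x) in practice
res.index <- lapply(1:5, function(i) sample(n.cases, size = 0.8 * n.cases))
# mod <- elevate(x, y, res.index = res.index)
```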
Integer vector, length = length(y): Numbers define fold membership, e.g. for 10-fold on a dataset with 1000 cases, you could use res.group = rep(1:10, each = 100)
Function to use to average predictions if bag.fitted = TRUE. Default = median
String: Name of predictor dataset
String: Name of outcome
Logical: If TRUE, retain trained models in object, otherwise discard (save space if running many resamples). Default = TRUE
Logical: If TRUE, save the best.tune data frame for each resample (output of gridSearchLearn)
Float: cex parameter for elevate plot
Color for elevate plot
Logical: If TRUE, use all models to also get a bagged prediction on the full sample. To get a bagged prediction on new data using the same models, use predict.rtModCV
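A minimal sketch of that workflow, assuming new data x.new with the same features as x:

```r
# Sketch: train with bagged fitted values, then bag predictions on new data
# using the same resample-trained models
mod <- elevate(x, y, bag.fitted = TRUE)
predicted <- predict(mod, x.new)   # dispatches to predict.rtModCV
```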
Integer: Number of cores to use. Default = 1. Parallelization is likely already happening either in the inner (tuning) resampling or within the learner itself; don't parallelize the parallelization
String: "psock" or "fork". Default = "fork" on Unix-like systems, otherwise "psock"
Logical: if TRUE, produce plot using mplot3. Takes precedence over plot.fitted and plot.predicted
Logical: if TRUE, plot True (y) vs Fitted
Logical: if TRUE, plot True (y.test) vs Predicted. Requires x.test and y.test
String: "zero", "dark", "box", "darkbox"
Logical: If TRUE, print model performance plot for each resample. Default = FALSE
String: the question you are attempting to answer with this model, in plain language.
Logical: If TRUE, print summary to screen.
Integer: (Not really used) Print additional information if > 0. Default = 0
Logical: Passed to resLearn, then passed to each individual learner's verbose argument
Logical: If TRUE, turn off all plotting.
String: Path where output should be saved
Logical: If TRUE, save plots to outdir
Logical: If TRUE and outdir is set, save all models to outdir
Logical: If TRUE, save all output as RDS file in outdir. save.mod is TRUE by default if an outdir is defined. If set to TRUE and no outdir is defined, outdir defaults to paste0("./s.", mod.name)
Logical: If TRUE, save the full output of each model trained on different resamples under subdirectories of outdir
Additional mod.params to be passed to the learner (will be concatenated with mod.params, so you can use either way to pass learner arguments)
Object of class rtModCV (Regression) or rtModCVclass (Classification), with elements including:
- the mean or aggregate error, as appropriate, for each repeat
- the mean error of all repeats, i.e. the mean of error.test.repeats
- if n.repeats > 1, the standard deviation of error.test.repeats
- the error for each resample, for each repeat
- Note on resampling: You can never use an outer resampling method with replacement if you will also be using an inner resampling (for tuning). The duplicated cases from the outer resampling may appear both in the training and testing sets of the inner resamples, leading to artificially decreased error.
- If there is an error while running either the outer or inner resamples in parallel, the error message returned by R will likely be unhelpful. Repeat the command after setting both inner and outer resample run to use a single core, which should provide an informative message.
# NOT RUN {
# Regression
x <- rnormmat(100, 50)      # 100 cases x 50 features
w <- rnorm(50)              # true coefficients
y <- x %*% w + rnorm(100)   # outcome with Gaussian noise, one value per case
mod <- elevate(x, y)
# Classification
data(Sonar, package = "mlbench")
mod <- elevate(Sonar)   # outcome is the last column of the data frame
# }