Create resamples of your data, e.g. for model building or validation.
"bootstrap" gives the standard bootstrap, i.e. random sampling with replacement, using bootstrap;
"strat.sub" creates stratified subsamples using strat.sub; and "strat.boot"
uses strat.boot, which runs strat.sub and then randomly duplicates some of the training cases to reach the original
length of the input (default) or the length defined by target.length.
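The three schemes described above can be sketched in base R. This is an illustration only and not the rtemis implementation: the helper names sketch_strat_sub and sketch_strat_boot are hypothetical, the quantile binning is an assumption about how stratification might be done, and the sketch assumes y has enough distinct values for the quantile breaks to be unique.

```r
# Illustrative base-R sketch; NOT the rtemis implementation.
set.seed(2024)
y <- rnorm(100)
n <- length(y)

# Standard bootstrap: draw n case indices with replacement
boot_idx <- sample.int(n, n, replace = TRUE)

# Stratified subsample (hypothetical helper): bin y by quantiles,
# then draw a fraction train.p of the cases in each bin without replacement
sketch_strat_sub <- function(y, train.p = 0.75, n.bins = 4) {
  breaks <- quantile(y, probs = seq(0, 1, length.out = n.bins + 1))
  bins <- cut(y, breaks = breaks, include.lowest = TRUE, labels = FALSE)
  idx <- lapply(split(seq_along(y), bins), function(i) {
    i[sample.int(length(i), round(train.p * length(i)))]
  })
  sort(unlist(idx, use.names = FALSE))
}

# Stratified bootstrap (hypothetical helper): take a stratified subsample,
# then duplicate randomly chosen training cases until target.length is reached
sketch_strat_boot <- function(y, train.p = 0.75, n.bins = 4,
                              target.length = length(y)) {
  sub_idx <- sketch_strat_sub(y, train.p, n.bins)
  n_extra <- max(0, target.length - length(sub_idx))
  extra <- sub_idx[sample.int(length(sub_idx), n_extra, replace = TRUE)]
  sort(c(sub_idx, extra))
}

sub_idx <- sketch_strat_sub(y)
strat_boot_idx <- sketch_strat_boot(y)
```

In this sketch sub_idx contains no repeated cases, while strat_boot_idx has the same length as y and necessarily contains duplicates.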
resample(y, n.resamples = 10, resampler = c("strat.sub", "strat.boot",
"kfold", "bootstrap", "loocv"), index = NULL, group = NULL,
stratify.var = y, train.p = 0.75, strat.n.bins = 4,
target.length = NROW(y), rtset = NULL, seed = NULL,
verbose = FALSE)
y
Numeric vector. Usually the outcome; length(y) defines the sample size
n.resamples
Integer: Number of training/testing sets required
resampler
String: Type of resampling to perform: "bootstrap", "kfold", "loocv", "strat.boot", "strat.sub". Default = "strat.boot" for length(y) < 200, otherwise "strat.sub"
index
List where each element is a vector of training set indices. Use this for manual or precalculated train/test splits
group
Integer vector, length = length(y): Numbers define fold membership, e.g. for 10-fold on a dataset with 1000 cases, you could use group = rep(1:10, each = 100)
stratify.var
Numeric vector (optional): Variable used for stratification. Defaults to y
train.p
Float (0, 1): Fraction of cases to assign to the training set for resampler = "strat.sub"
strat.n.bins
Integer: Number of groups to use for stratification for resampler = "strat.sub" / "strat.boot"
target.length
Integer: Number of cases in the training set for resampler = "strat.boot". Default = length(y)
rtset
List: Output of rtset.resample (or a named list with the same structure). NOTE: Overrides all other arguments. Default = NULL
seed
Integer: (Optional) Set seed for the random number generator, to make output reproducible. See ?base::set.seed
verbose
Logical: If TRUE, print messages to screen
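As an illustration of the precalculated-split arguments, the base-R sketch below builds a fold membership vector like the one in the group example and derives from it a list of training set indices in the shape described for index. It assumes that the training set for fold k is every case outside fold k; that reading, and the element names Fold_1 ... Fold_10, are assumptions made for this example and are not confirmed by the documentation above.

```r
# Illustrative sketch of precalculated splits; the fold-to-training-set
# mapping below is an assumption, not documented rtemis behavior.
n <- 1000

# Fold membership: cases 1-100 in fold 1, 101-200 in fold 2, and so on
group <- rep(1:10, each = 100)

# One vector of training indices per fold: all cases NOT in that fold
index <- lapply(sort(unique(group)), function(k) which(group != k))
names(index) <- paste0("Fold_", seq_along(index))  # hypothetical names

# Each training set holds the 900 cases outside its fold
lengths(index)

# With rtemis installed, a list like this could be supplied as
# resample(y, index = index); not run here.
```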
resample is used by multiple rtemis learners, gridSearchLearn, and
elevate. Note that the option "kfold", which uses kfold, results in resamples of slightly
different lengths when y is short, so avoid all operations that rely on equal-length vectors.
For example, you can't place the resamples in a data.frame; you must use a list instead.
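The unequal-length issue can be reproduced with a small base-R k-fold sketch. This does not call rtemis; the fold assignment below is a generic construction chosen for illustration, and the names train, df_attempt and train_means are hypothetical.

```r
# Illustrative base-R k-fold sketch; NOT the rtemis kfold implementation.
set.seed(1)
y <- rnorm(23)  # 23 cases cannot be split evenly into 5 folds
k <- 5

# Assign each case to a fold, then shuffle the assignment
fold <- sample(rep_len(seq_len(k), length(y)))

# Training indices for fold i are all cases outside fold i
train <- lapply(seq_len(k), function(i) which(fold != i))
names(train) <- paste0("Fold_", seq_len(k))

# Training sets differ in length, so they do not fit in a data.frame
lengths(train)
df_attempt <- tryCatch(do.call(data.frame, train), error = function(e) e)
inherits(df_attempt, "error")

# Keeping them in a list and iterating works regardless of length
train_means <- vapply(train, function(i) mean(y[i]), numeric(1))
```

Iterating with lapply or vapply over the list avoids any assumption that the resamples have equal length.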