resample: Resampling methods

Description

Create resamples of your data, e.g. for model building or validation. "bootstrap" gives the standard bootstrap, i.e. random sampling with replacement, using bootstrap. "strat.sub" creates stratified subsamples using strat.sub. "strat.boot" uses strat.boot, which runs strat.sub and then randomly duplicates some of the training cases to reach the original length of the input (default) or the length defined by target.length.
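The three sampling schemes described above can be sketched in base R. This is only an illustration of the ideas and not the rtemis source: the quantile-based binning, the rounding rule, and all object names (boot.index, sub.index, stratboot.index) are assumptions made for the example.

```r
# Base-R illustration only; not the rtemis implementation.
set.seed(2024)
y <- rnorm(100)
n <- length(y)

# "bootstrap": draw n indices at random, with replacement
boot.index <- sample(n, size = n, replace = TRUE)

# "strat.sub": bin y into quantile groups (assumed binning rule),
# then keep a fraction train.p of the cases in each bin
train.p <- 0.75
strat.n.bins <- 4
breaks <- quantile(y, probs = seq(0, 1, length.out = strat.n.bins + 1))
bins <- cut(y, breaks = breaks, include.lowest = TRUE, labels = FALSE)
sub.index <- sort(unlist(lapply(split(seq_len(n), bins), function(i) {
  i[sample.int(length(i), size = round(train.p * length(i)))]
}), use.names = FALSE))

# "strat.boot": top up the stratified subsample with random duplicates
# of training cases until target.length is reached
target.length <- n
n.extra <- target.length - length(sub.index)
extra <- sub.index[sample.int(length(sub.index), size = n.extra, replace = TRUE)]
stratboot.index <- sort(c(sub.index, extra))
```

In this sketch the stratified subsample contains no repeated cases, while the "strat.boot" index contains only cases from that subsample, some of them more than once.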

Usage

resample(y, n.resamples = 10, resampler = c("strat.sub", "strat.boot",
  "kfold", "bootstrap", "loocv"), index = NULL, group = NULL,
  stratify.var = y, train.p = 0.75, strat.n.bins = 4,
  target.length = NROW(y), rtset = NULL, seed = NULL,
  verbose = FALSE)

Arguments

y

Numeric vector. Usually the outcome; length(y) defines the sample size

n.resamples

Integer: Number of training/testing sets required

resampler

String: Type of resampling to perform: "bootstrap", "kfold", "loocv", "strat.boot", "strat.sub". Default = "strat.boot" for length(y) < 200, otherwise "strat.sub"

index

List where each element is a vector of training set indices. Use this for manual or precalculated train/test splits

group

Integer vector, length = length(y): Numbers define fold membership, e.g. for 10-fold on a dataset with 1000 cases, you could use group = rep(1:10, each = 100)

stratify.var

Numeric vector (optional): Variable used for stratification. Defaults to y

train.p

Float (0, 1): Fraction of cases to assign to the training set for resampler = "strat.sub"

strat.n.bins

Integer: Number of groups to use for stratification for resampler = "strat.sub" / "strat.boot"

target.length

Integer: Number of cases for training set for resampler = "strat.boot". Default = length(y)

rtset

List: Output of rtset.resample (or a named list with the same structure). NOTE: Overrides all other arguments. Default = NULL

seed

Integer: (Optional) Set seed for random number generator, in order to make output reproducible. See ?base::set.seed

verbose

Logical: If TRUE, print messages to screen
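As a rough illustration of the index, group, and seed arguments, the base-R sketch below turns a group-style vector into a list of training indices of the kind index expects, and shows why fixing the seed makes a random split repeatable. The object names (test.index, a, b) are hypothetical, and the leave-one-group-out conversion is an assumption about how fold membership maps to training sets, not a description of rtemis internals.

```r
# Hypothetical objects for illustration; not rtemis internals.
n <- 20
y <- seq_len(n)

# 'group'-style input: integers define fold membership (4 folds of 5 cases)
group <- rep(1:4, each = 5)

# 'index'-style list: each element holds the training indices obtained
# by holding out one group at a time
index <- lapply(sort(unique(group)), function(g) which(group != g))

# The matching test indices are the cases left out of each training set
test.index <- lapply(index, function(train) setdiff(seq_len(n), train))

# Setting the same seed before sampling reproduces the same random split
set.seed(1)
a <- sample(n, 15)
set.seed(1)
b <- sample(n, 15)
identical(a, b)
```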

Details

resample is used by multiple rtemis learners, gridSearchLearn, and elevate. Note that the option "kfold", which uses kfold, results in resamples of slightly different lengths for y of small length, so avoid all operations that rely on equal-length vectors. For example, you can't place the resamples in a data.frame, but must use a list instead.
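The unequal-length issue can be reproduced with a small base-R sketch. The fold assignment below is an assumed, generic k-fold scheme rather than the one used by kfold, and the names fold.id, kfold.index, and y.train are hypothetical.

```r
# Base-R illustration of k-fold training sets; not the rtemis implementation.
set.seed(7)
y <- rnorm(23)   # 23 cases cannot be split evenly into 5 folds
k <- 5

# Random fold assignment (assumed scheme)
fold.id <- sample(rep_len(seq_len(k), length(y)))

# Training indices for each fold: every case not in that fold
kfold.index <- lapply(seq_len(k), function(f) which(fold.id != f))
lengths(kfold.index)   # a mix of 18s and 19s

# A list holds elements of unequal length without complaint
y.train <- lapply(kfold.index, function(i) y[i])

# data.frame() requires equal-length columns, so this attempt fails
df.attempt <- try(do.call(data.frame, y.train), silent = TRUE)
inherits(df.attempt, "try-error")
```

Keeping the resamples in a list and iterating with lapply sidesteps the problem entirely.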
