mlr3 (version 0.14.1)

partition: Manually Partition into Training and Test Set

Description

Creates a split of the row ids of a Task into a training set and a test set while optionally stratifying on the target column.

For more complex partitions, see the example.

Usage

partition(task, ratio = 0.67, stratify = TRUE, ...)

# S3 method for TaskRegr
partition(task, ratio = 0.67, stratify = TRUE, bins = 3L, ...)

# S3 method for TaskClassif
partition(task, ratio = 0.67, stratify = TRUE, ...)

Arguments

task

(Task)
Task to operate on.

ratio

(numeric(1))
Ratio of observations to put into the training set.

stratify

(logical(1))
If TRUE, stratify on the target variable. For regression tasks, the target variable is first cut into the number of bins given by the bins argument. See Task$add_strata().

...

(any)
Additional arguments, currently not used.

bins

(integer(1))
Number of bins to cut the target variable into for stratification.
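
The interplay of ratio, stratify and bins can be illustrated with a minimal base R sketch. It is not mlr3's implementation: the helper stratified_split() is hypothetical, and the equal-width binning via cut() is an assumption made for illustration. It samples a fraction ratio of the row ids within each stratum, so that both sets cover the target range in similar proportions.

```r
# Hypothetical helper, not part of mlr3: stratified train/test split in base R.
stratified_split = function(y, ratio = 0.67, bins = 3L) {
  # Numeric targets are cut into `bins` equal-width intervals (assumption);
  # any other target is treated as a factor and used as-is.
  strata = if (is.numeric(y)) cut(y, breaks = bins) else as.factor(y)
  ids = seq_along(y)
  # Sample a fraction `ratio` of the row ids within each stratum.
  train = unlist(lapply(split(ids, strata), function(s) {
    s[sample.int(length(s), round(ratio * length(s)))]
  }), use.names = FALSE)
  list(train = sort(train), test = setdiff(ids, train))
}

set.seed(1)
y = rnorm(300)
parts = stratified_split(y, ratio = 0.5)
lengths(parts)
```

Because sampling happens per stratum and sizes are rounded, the training set size can deviate slightly from ratio times the number of rows.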

Examples

library(mlr3)

# regression task
task = tsk("boston_housing")

# roughly equal size split while stratifying on the binned response
split = partition(task, ratio = 0.5)
data = data.frame(
  y = c(task$truth(split$train), task$truth(split$test)),
  split = rep(c("train", "test"), lengths(split))
)
boxplot(y ~ split, data = data)


# classification task
task = tsk("pima")
split = partition(task)

# roughly same distribution of the target label
prop.table(table(task$truth()))
prop.table(table(task$truth(split$train)))
prop.table(table(task$truth(split$test)))


# splitting into 3 disjoint sets, using ResamplingCV and stratification
task = tsk("iris")
task$set_col_roles(task$target_names, add_to = "stratum")
r = rsmp("cv", folds = 3)$instantiate(task)

# the test sets of a cross-validation are pairwise disjoint; the training sets overlap
sets = lapply(1:3, r$test_set)
lengths(sets)
prop.table(table(task$truth(sets[[1]])))
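
The row ids returned by partition() can be passed on to a learner. The following is a minimal sketch, assuming the rpart package is installed (it backs the "classif.rpart" learner). The choice of learner, the ratio of 0.8 and the accuracy measure are illustrative assumptions, not part of the original examples.

```r
library(mlr3)

set.seed(1)
task = tsk("pima")
split = partition(task, ratio = 0.8)

# fit on the training rows only (learner choice is an assumption of this sketch)
learner = lrn("classif.rpart")
learner$train(task, row_ids = split$train)

# predict on the held-out rows and compute the accuracy
prediction = learner$predict(task, row_ids = split$test)
acc = prediction$score(msr("classif.acc"))
acc
```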