create_holdout_partition: Create a holdout partition based on the specified algorithm

Description

This method creates multi-label dataset for train, test, validation or other proposes the partition method defined in method. The number of partitions is defined in partitions parameter. Each instance is used in only one partition of division.

Usage

create_holdout_partition(mdata, partitions = c(train = 0.7, test = 0.3),
  method = c("random", "iterative", "stratified"))

Arguments

mdata

A mldr dataset.

partitions

A list of percentages or a single value. The sum of all values does not be greater than 1. If a single value is informed then the complement of them is applied to generated the second partition. If two or more values are informed and the sum of them is lower than 1 the partitions will be generated with the informed proportion. If partitions have names, they are used to name the return. (Default: c(train=0.7, test=0.3)).

method

The method to split the data. The default methods are:

random: Split randomly the folds.
iterative: Split the folds considering the labels proportions individually. Some specific label can not occurs in all folds.
stratified: Split the folds considering the labelset proportions.

You can also create your own partition method. See the note and example sections to more details. (Default: "random")

Value

A list with at least two datasets sampled as specified in partitions parameter.

References

Sechidis, K., Tsoumakas, G., & Vlahavas, I. (2011). On the stratification of multi-label data. In Proceedings of the Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD (pp. 145-158).

Examples

Run this code

# NOT RUN {
dataset <- create_holdout_partition(toyml)
names(dataset)
## [1] "train" "test"
#dataset$train
#dataset$test

dataset <- create_holdout_partition(toyml, c(a=0.1, b=0.2, c=0.3, d=0.4))
#' names(dataset)
#' ## [1] "a" "b" "c" "d"

sequencial_split <- function (mdata, r) {
 S <- list()

 amount <- trunc(r * mdata$measures$num.instances)
 indexes <- c(0, cumsum(amount))
 indexes[length(r)+1] <- mdata$measures$num.instances

 S <- lapply(seq(length(r)), function (i) {
   seq(indexes[i]+1, indexes[i+1])
 })

 S
}
dataset <- create_holdout_partition(toyml, method="sequencial_split")
# }

Run the code above in your browser using DataLab