Laurae (version 0.0.0.9001)

nkfold: (Un)Stratified Repeated k-fold for any type of label

Description

This function creates (un)stratified repeated folds from a label vector.
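To make "stratified" concrete, here is a minimal base R sketch of one stratified k-fold split for discrete labels. It is an illustration of the general technique, not Laurae's implementation: `stratified_folds_sketch` is a hypothetical helper, and the way nkfold handles continuous labels is not covered here.

```r
# Illustrative only: a simple stratified k-fold split for discrete labels.
# `stratified_folds_sketch` is a hypothetical helper, not a Laurae function.
stratified_folds_sketch <- function(y, k = 5, seed = 0) {
  set.seed(seed)
  fold_id <- integer(length(y))
  for (rows in split(seq_along(y), y)) {
    # Shuffle the rows of one class by index, then deal them out to the
    # folds in turn so every fold receives a near-equal share of the class
    shuffled <- rows[sample.int(length(rows))]
    fold_id[shuffled] <- rep_len(seq_len(k), length(shuffled))
  }
  # One integer vector of row numbers per fold
  unname(split(seq_along(y), fold_id))
}

y <- c(rep(0, 250), rep(1, 250), rep(2, 250))
folds <- stratified_folds_sketch(y, k = 5, seed = 111)
# Class counts per fold (rows are classes, columns are folds)
counts <- sapply(folds, function(rows) table(y[rows]))
print(counts)
```

Because each class is dealt out separately, every fold keeps the label proportions of the full vector. An unstratified split would shuffle all rows at once and ignore the labels.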

Usage

nkfold(y, n = 2, k = 5, stratified = TRUE, seed = 0, named = TRUE,
  weight = FALSE)

Arguments

y
Type: vector (any label type). The label vector.
n
Type: integer. The number of repeated fold computations to perform. Defaults to 2.
k
Type: integer or vector of integers. The number of folds to create. Causes issues if length(y) < k (i.e., more folds than samples). If a vector of integers is supplied, then repeat N uses k[N] as its number of folds. Defaults to 5.
stratified
Type: boolean. Whether the folds should be stratified (keep the same label proportions) or not. Defaults to TRUE.
seed
Type: integer or vector of integers. The seed for the random number generator. If a vector of integers is provided, its length should be at least n. Otherwise (if a single integer is supplied), the first repeat starts from the provided seed, and 1 is added to the seed for every further repeat. Defaults to 0.
named
Type: boolean. Whether the folds should be named. Defaults to TRUE.
weight
Type: boolean. Whether to return the weights of each fold so their sum is equal to 1. Defaults to FALSE.
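The rules for scalar versus vector k and seed can be sketched as a small lookup table. This is a reading of the argument descriptions above, not code taken from the package: `resolve_repeats` is a hypothetical helper, and indexing a seed vector by repeat number is an assumption.

```r
# Hypothetical helper, not part of Laurae: lists the k and seed that each
# repeat would use under the rules described in the Arguments section.
resolve_repeats <- function(n, k = 5, seed = 0) {
  # A single k is reused for every repeat; a vector supplies k[i] for repeat i
  k_per_repeat <- if (length(k) == 1) rep(k, n) else k[seq_len(n)]
  # A single seed grows by 1 per repeat; a vector is assumed to supply
  # seed[i] for repeat i
  seed_per_repeat <- if (length(seed) == 1) seed + seq_len(n) - 1 else seed[seq_len(n)]
  data.frame(repeat_id = seq_len(n), k = k_per_repeat, seed = seed_per_repeat)
}

plan_scalar <- resolve_repeats(n = 2, k = 5, seed = 111)
plan_vector <- resolve_repeats(n = 3, k = c(3, 5, 10), seed = c(7, 8, 9))
print(plan_scalar)
print(plan_vector)
```

Under this reading, seed = 111 with n = 2 resolves to the same seeds as seed = c(111, 112).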

Value

A list of vectors, one per fold, in which each integer is a row number, or a list of lists containing Folds and Weights if weight = TRUE.
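As a rough guide to consuming such a list, the sketch below loops over a hand-built stand-in rather than a real nkfold result. Two things are assumptions here: the element names (Fold1, Fold2, Fold3) are made up, and each vector is treated as the held-out rows of its fold, which the description above does not state explicitly.

```r
# Hand-built stand-in for a 1-repeat, 3-fold result on 9 rows.
# The element names are made up; the names produced by nkfold may differ.
y <- c(10, 20, 30, 40, 50, 60, 70, 80, 90)
folds <- list(Fold1 = c(1L, 5L, 9L), Fold2 = c(2L, 4L, 7L), Fold3 = c(3L, 6L, 8L))

train_sizes <- integer(0)
for (name in names(folds)) {
  held_out <- folds[[name]]
  # Assumption: each vector lists held-out rows, so the rest is for training
  train <- setdiff(seq_along(y), held_out)
  train_sizes[name] <- length(train)
  cat(name, ": train mean =", mean(y[train]),
      ", held-out mean =", mean(y[held_out]), "\n")
}
```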

Examples

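library(Laurae)

# Reproducible Stratified Repeated folds
# An integer seed is incremented by 1 for every repeat, so seed = 111 with
# n = 2 is expected to match seed = c(111, 112)
data <- 1:5000
folds1 <- nkfold(y = data, n = 2, k = 5, stratified = TRUE, seed = 111)
folds2 <- nkfold(y = data, n = 2, k = 5, stratified = TRUE, seed = c(111, 112))
identical(folds1, folds2)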

# Stratified Repeated Regression
data <- 1:5000
folds <- nkfold(y = data, n = 2, k = 5, stratified = TRUE)
for (i in seq_along(folds)) {
  print(mean(data[folds[[i]]]))
}

# Stratified Repeated Multi-class Classification
data <- c(rep(0, 250), rep(1, 250), rep(2, 250))
folds <- nkfold(y = data, n = 2, k = 5, stratified = TRUE)
for (i in seq_along(folds)) {
  print(mean(data[folds[[i]]]))
}

# Unstratified Repeated Regression
data <- 1:5000
folds <- nkfold(y = data, n = 2, k = 5, stratified = FALSE)
for (i in seq_along(folds)) {
  print(mean(data[folds[[i]]]))
}

# Unstratified Repeated Multi-class Classification
data <- c(rep(0, 250), rep(1, 250), rep(2, 250))
folds <- nkfold(y = data, n = 2, k = 5, stratified = FALSE)
for (i in seq_along(folds)) {
  print(mean(data[folds[[i]]]))
}

# Stratified Repeated 3-5-10 fold Cross-Validation all in one
data <- c(rep(0, 250), rep(1, 250), rep(2, 250))
str(nkfold(data, n = 3, k = c(3, 5, 10)))