
rsample

Overview

The rsample package provides functions to create different types of resamples and corresponding classes for their analysis. The goal is to have a modular set of methods that can be used for:

  • resampling for estimating the sampling distribution of a statistic
  • estimating model performance using a holdout set
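The two uses above can be sketched with a few rsample calls. This is a minimal illustration, assuming the built-in `mtcars` data set (not used in the original examples) and the exported functions `bootstraps()`, `analysis()`, `initial_split()`, `training()`, and `testing()`. The statistic (a median) and the split proportion are arbitrary choices made only for the sketch.

```r
library(rsample)

set.seed(1)
# Resampling to estimate the sampling distribution of a statistic:
# 100 bootstrap resamples of the built-in mtcars data
boots <- bootstraps(mtcars, times = 100)

# Each rsplit in boots$splits exposes its rows through analysis();
# here we compute the median mpg within every bootstrap resample
boot_medians <- vapply(
  boots$splits,
  function(split) median(analysis(split)$mpg),
  numeric(1)
)
sd(boot_medians)  # bootstrap estimate of the standard error of the median

# Estimating model performance with a holdout set:
# 3/4 of the rows go to training, the rest are held out for testing
split <- initial_split(mtcars, prop = 3/4)
train_set <- training(split)
test_set  <- testing(split)
```

With 32 rows and `prop = 3/4`, the training set has 24 rows and the test set has 8.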

The scope of rsample is to provide the basic building blocks for creating and analyzing resamples of a data set, but this package does not include code for modeling or calculating statistics. The Working with Resample Sets vignette gives a demonstration of how rsample tools can be used when building models.
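Because rsample stops at the splits, the model and the performance metric are ordinary R code supplied by the user. The sketch below is one possible pattern, not the vignette's code: it assumes `mtcars`, a linear model `mpg ~ wt + hp`, and a hypothetical helper `fold_rmse()` that computes root mean squared error on each assessment set of a 5-fold `vfold_cv()`.

```r
library(rsample)

set.seed(2)
folds <- vfold_cv(mtcars, v = 5)

# Hypothetical helper: fit on the analysis set, score on the assessment set.
# rsample supplies only the split; lm(), predict() and the RMSE are base R.
fold_rmse <- function(split) {
  fit  <- lm(mpg ~ wt + hp, data = analysis(split))
  held <- assessment(split)
  pred <- predict(fit, newdata = held)
  sqrt(mean((held$mpg - pred)^2))
}

# Store one RMSE per fold alongside the splits, then average them
folds$rmse <- vapply(folds$splits, fold_rmse, numeric(1))
mean(folds$rmse)
```

Keeping the results in a column of the rset object keeps each statistic next to the split that produced it.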

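Note that resampled data sets created by rsample are directly accessible from a resampling object, yet they add little memory overhead. Each resample records which rows of the original data it uses instead of storing its own copy of those rows. Because the original data is never modified, R does not make an automatic copy of it.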

For example, creating 50 bootstraps of a data set does not create an object that is 50-fold larger in memory:

library(rsample)
library(mlbench)

data(LetterRecognition)
lobstr::obj_size(LetterRecognition)
#> 2,644,640 B

set.seed(35222)
boots <- bootstraps(LetterRecognition, times = 50)
lobstr::obj_size(boots)
#> 6,686,776 B

# Object size per resample
lobstr::obj_size(boots)/nrow(boots)
#> 133,735.5 B

# Fold increase is <<< 50
as.numeric(lobstr::obj_size(boots)/lobstr::obj_size(LetterRecognition))
#> [1] 2.528426

Created on 2022-02-28 by the reprex package (v2.0.1)

The 50 bootstrap samples together use about 2.5 times the memory of the original data set, far less than the 50-fold increase that full copies would require.
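The small footprint comes from storing row indices. The sketch below inspects a single bootstrap split of `mtcars` to show this; note that `in_id` is an internal field of the rsplit object, read here only for illustration and not part of the documented API, while `complement()`, `analysis()`, and `assessment()` are exported functions.

```r
library(rsample)

set.seed(3)
boots <- bootstraps(mtcars, times = 5)
first <- boots$splits[[1]]

# Internally, an rsplit keeps a reference to the data plus integer row indices.
# `in_id` is an internal field, inspected here only for illustration.
idx <- first$in_id
length(idx)  # one index per bootstrap draw: 32 for mtcars

# complement() returns the out-of-bag rows that form the assessment set
oob <- complement(first)

# analysis() builds the resampled data frame from the indices when called
boot_rows <- analysis(first)
all.equal(boot_rows$mpg, mtcars$mpg[idx])
```

Only `analysis()` and `assessment()` allocate new data frames, and only for the split being worked on.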

Installation

To install the released version from CRAN, use:

install.packages("rsample")

Or install the development version from GitHub with:

# install.packages("pak")
pak::pak("tidymodels/rsample")

Contributing

This project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.


Monthly Downloads: 55,286
Version: 1.3.0
License: MIT + file LICENSE
Maintainer: Hannah Frick
Last Published: April 2nd, 2025

Functions in rsample (1.3.0)

  • new_rset: Constructor for new rset objects
  • inner_split: Inner split of the analysis set for fitting a post-processor
  • initial_validation_split: Create an Initial Train/Validation/Test Split
  • labels.rset: Find Labels from rset Object
  • int_pctl: Bootstrap confidence intervals
  • manual_rset: Manual resampling
  • loo_cv: Leave-One-Out Cross-Validation
  • make_strata: Create or Modify Stratification Variables
  • labels.rsplit: Find Labels from rsplit Object
  • reverse_splits: Reverse the analysis and assessment sets
  • nested_cv: Nested or Double Resampling
  • mc_cv: Monte Carlo Cross-Validation
  • rolling_origin: Rolling Origin Forecast Resampling
  • group_vfold_cv: Group V-Fold Cross-Validation
  • populate: Add Assessment Indices
  • initial_split: Simple Training/Test Set Splitting
  • permutations: Permutation sampling
  • reexports: Objects exported from other packages
  • slide-resampling: Time-based Resampling
  • group_bootstraps: Group Bootstraps
  • group_mc_cv: Group Monte Carlo Cross-Validation
  • tidy.rsplit: Tidy Resampling Object
  • rsample-dplyr: Compatibility with dplyr
  • rsample-package: rsample: General Resampling Infrastructure
  • vfold_cv: V-Fold Cross-Validation
  • reshuffle_rset: "Reshuffle" an rset to generate a new rset with the same parameters
  • reg_intervals: A convenience function for confidence intervals with linear-ish parametric models
  • rsample2caret: Convert Resampling Objects to Other Formats
  • rset_reconstruct: Extending rsample with new rset subclasses
  • make_splits: Constructors for split objects
  • validation_set: Create a Validation Split for Tuning
  • make_groups: Make groupings for grouped rsplits
  • validation_split: Create a Validation Set
  • apparent: Sampling for the Apparent Error Rate
  • .get_fingerprint: Obtain an identifier for the resamples
  • as.data.frame.rsplit: Convert an rsplit object to a data frame
  • complement: Determine the Assessment Samples
  • form_pred: Extract Predictor Names from Formula or Terms
  • bootstraps: Bootstrap Sampling
  • add_resample_id: Augment a data set with resampling identifiers
  • clustering_cv: Cluster Cross-Validation
  • get_rsplit: Retrieve individual rsplit objects from an rset
  • .get_split_args: Get the split arguments from an rset
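As an example of combining two of these functions, the sketch below pairs `bootstraps()` with `int_pctl()` to get a percentile confidence interval. It assumes `mtcars`, a median of `mpg` as the statistic, and the column name `stats`; all three are illustrative choices. `int_pctl()` expects each element of the statistics column to be a tidy tibble with `term` and `estimate` columns, and it recommends at least 1000 resamples.

```r
library(rsample)
library(tibble)

set.seed(4)
# Percentile intervals need many resamples; 1000 is the usual recommendation
boots <- bootstraps(mtcars, times = 1000)

# Each element of `stats` is a tidy tibble with `term` and `estimate` columns,
# the format int_pctl() expects
boots$stats <- lapply(boots$splits, function(split) {
  tibble(term = "median_mpg", estimate = median(analysis(split)$mpg))
})

# 95% percentile interval for the median of mpg
ci <- int_pctl(boots, stats, alpha = 0.05)
ci
```

The result has one row per `term`, with `.lower`, `.estimate`, and `.upper` columns for the interval.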