rsample

Overview

The rsample package provides functions to create different types of resamples and corresponding classes for their analysis. The goal is to have a modular set of methods that can be used for:

  • resampling for estimating the sampling distribution of a statistic
  • estimating model performance using a holdout set
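The first use, estimating the sampling distribution of a statistic, might look like the minimal sketch below. It assumes the built-in `mtcars` data and the median of `mpg` as the statistic; both are illustrative choices, not taken from the rsample documentation.

```r
library(rsample)

set.seed(123)
# 200 bootstrap resamples of the built-in mtcars data (illustrative choice)
boots <- bootstraps(mtcars, times = 200)

# analysis() returns the resampled rows for one split;
# compute the statistic of interest (here, the median mpg) on each resample
boot_medians <- vapply(
  boots$splits,
  function(split) median(analysis(split)$mpg),
  numeric(1)
)

# The spread of these values approximates the sampling distribution
quantile(boot_medians, probs = c(0.025, 0.5, 0.975))
```

The statistic is computed with base R here, since rsample only supplies the resamples.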

The scope of rsample is to provide the basic building blocks for creating and analyzing resamples of a data set, but this package does not include code for modeling or calculating statistics. The Working with Resample Sets vignette gives a demonstration of how rsample tools can be used when building models.
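Because rsample only supplies the resamples, the modeling step comes from elsewhere. As a rough sketch of estimating performance on held-out data, the example below pairs `vfold_cv()` with base R's `lm()`. The `holdout_rmse()` helper, the `mtcars` data, and the model formula are hypothetical choices made for illustration.

```r
library(rsample)

set.seed(456)
# 5-fold cross-validation of the built-in mtcars data (illustrative choice)
folds <- vfold_cv(mtcars, v = 5)

# Hypothetical helper: fit on the analysis set, score on the assessment set
holdout_rmse <- function(split) {
  fit <- lm(mpg ~ wt + hp, data = analysis(split))
  held_out <- assessment(split)
  preds <- predict(fit, newdata = held_out)
  sqrt(mean((held_out$mpg - preds)^2))
}

# One RMSE per fold, then the average as the performance estimate
rmse_by_fold <- vapply(folds$splits, holdout_rmse, numeric(1))
mean(rmse_by_fold)
```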

Note that resampled data sets created by rsample are directly accessible in a resampling object but add little memory overhead. Since the original data is not modified, R does not make an automatic copy.

For example, creating 50 bootstraps of a data set does not create an object that is 50-fold larger in memory:

library(rsample)
library(mlbench)

data(LetterRecognition)
lobstr::obj_size(LetterRecognition)
#> 2,644,640 B

set.seed(35222)
boots <- bootstraps(LetterRecognition, times = 50)
lobstr::obj_size(boots)
#> 6,686,776 B

# Object size per resample
lobstr::obj_size(boots)/nrow(boots)
#> 133,735.5 B

# Fold increase is <<< 50
as.numeric(lobstr::obj_size(boots)/lobstr::obj_size(LetterRecognition))
#> [1] 2.528426

Created on 2022-02-28 by the reprex package (v2.0.1)

The 50 bootstrap samples together use less than 3 times the memory of the original data set.
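One way to see where the savings come from is to look inside a single split. The sketch below assumes that each split keeps its sampled row indices in a field named `in_id`; this is an internal detail that may change between versions, so treat it as an assumption rather than a documented interface.

```r
library(rsample)

set.seed(789)
boots <- bootstraps(mtcars, times = 5)
first_split <- boots$splits[[1]]

# Assumption: `in_id` is an internal field holding the sampled row indices
row_ids <- first_split$in_id
length(row_ids)

# analysis() materializes the resampled rows only when they are requested
resampled <- analysis(first_split)
nrow(resampled)
```

Storing one integer index per row is far cheaper than storing a full copy of every column for each resample.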

Installation

To install the released version from CRAN, use:

install.packages("rsample")

To install the development version from GitHub, use:

# install.packages("pak")
pak::pak("tidymodels/rsample")

Contributing

This project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Monthly Downloads: 60,681
Version: 1.2.1
License: MIT + file LICENSE
Last Published: March 25th, 2024

Functions in rsample (1.2.1)

  • reverse_splits: Reverse the analysis and assessment sets
  • reg_intervals: A convenience function for confidence intervals with linear-ish parametric models
  • nested_cv: Nested or Double Resampling
  • make_strata: Create or Modify Stratification Variables
  • initial_split: Simple Training/Test Set Splitting
  • .get_fingerprint: Obtain an identifier for the resamples
  • manual_rset: Manual resampling
  • initial_validation_split: Create an Initial Train/Validation/Test Split
  • form_pred: Extract Predictor Names from Formula or Terms
  • slide-resampling: Time-based Resampling
  • rsample-dplyr: Compatibility with dplyr
  • reshuffle_rset: "Reshuffle" an rset to re-generate a new rset with the same parameters
  • rolling_origin: Rolling Origin Forecast Resampling
  • tidy.rsplit: Tidy Resampling Object
  • populate: Add Assessment Indices
  • rsample-package: rsample: General Resampling Infrastructure
  • validation_split: Create a Validation Set
  • validation_set: Create a Validation Split for Tuning
  • rset_reconstruct: Extending rsample with new rset subclasses
  • vfold_cv: V-Fold Cross-Validation
  • reexports: Objects exported from other packages
  • rsample2caret: Convert Resampling Objects to Other Formats
  • permutations: Permutation sampling
  • as.data.frame.rsplit: Convert an rsplit object to a data frame
  • complement: Determine the Assessment Samples
  • clustering_cv: Cluster Cross-Validation
  • add_resample_id: Augment a data set with resampling identifiers
  • bootstraps: Bootstrap Sampling
  • get_rsplit: Retrieve individual rsplit objects from an rset
  • apparent: Sampling for the Apparent Error Rate
  • group_bootstraps: Group Bootstraps
  • make_splits: Constructors for split objects
  • labels.rset: Find Labels from rset Object
  • group_mc_cv: Group Monte Carlo Cross-Validation
  • int_pctl: Bootstrap confidence intervals
  • loo_cv: Leave-One-Out Cross-Validation
  • labels.rsplit: Find Labels from rsplit Object
  • make_groups: Make groupings for grouped rsplits
  • new_rset: Constructor for new rset objects
  • group_vfold_cv: Group V-Fold Cross-Validation
  • mc_cv: Monte Carlo Cross-Validation
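As a brief illustration of how a few of the listed functions fit together, the sketch below combines `initial_split()`, `vfold_cv()`, and `get_rsplit()`. The `mtcars` data, the 75% training proportion, and the number of folds are arbitrary choices for the example.

```r
library(rsample)

set.seed(2024)
# Hold out a test set first (75% training is an arbitrary choice)
split <- initial_split(mtcars, prop = 0.75)
train <- training(split)
test <- testing(split)

# Resample only the training portion for model assessment
folds <- vfold_cv(train, v = 4)

# Pull out a single split and inspect its two partitions
first_fold <- get_rsplit(folds, 1)
dim(analysis(first_fold))
dim(assessment(first_fold))
```

Within each fold, the analysis and assessment rows together make up the full training set, while the test set stays untouched.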