
Note: a newer version (1.3.0) of this package is available.

rsample

Overview

The rsample package provides functions to create different types of resamples and corresponding classes for their analysis. The goal is to have a modular set of methods that can be used for:

  • resampling for estimating the sampling distribution of a statistic
  • estimating model performance using a holdout set
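Both uses can be sketched with a few lines of R. This is a minimal illustration, assuming rsample is installed and using the built-in mtcars data. The statistic (median mpg) and the model formula are arbitrary choices for the example; rsample itself only produces the resamples, and base R does the rest.

```r
library(rsample)

set.seed(123)

# 1. Sampling distribution of a statistic via the bootstrap.
# Each element of boots$splits is an rsplit; analysis() returns its rows.
boots <- bootstraps(mtcars, times = 200)
boot_medians <- vapply(
  boots$splits,
  function(s) median(analysis(s)$mpg),
  numeric(1)
)
sd(boot_medians)                        # bootstrap standard error of the median
quantile(boot_medians, c(0.025, 0.975)) # simple percentile interval

# 2. Model performance on a holdout set.
holdout   <- initial_split(mtcars, prop = 0.75)
train_set <- training(holdout)
test_set  <- testing(holdout)
fit       <- lm(mpg ~ wt + hp, data = train_set)

# Root mean squared error on rows the model never saw
holdout_rmse <- sqrt(mean((test_set$mpg - predict(fit, test_set))^2))
holdout_rmse
```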

The scope of rsample is to provide the basic building blocks for creating and analyzing resamples of a data set, but this package does not include code for modeling or calculating statistics. The Working with Resample Sets vignette gives a demonstration of how rsample tools can be used when building models.
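Because modeling lives outside rsample, a typical workflow pairs its resamples with whatever fitting function you already use. The sketch below is an illustration, not the vignette's code: it assumes rsample is installed and uses base R's lm() on mtcars with an arbitrary formula. It fits one model per fold of vfold_cv() on analysis() and scores it on assessment().

```r
library(rsample)

set.seed(456)
folds <- vfold_cv(mtcars, v = 5)

# For each fold: fit on the analysis set, compute RMSE on the assessment set
fold_rmse <- vapply(folds$splits, function(s) {
  fit  <- lm(mpg ~ wt + hp, data = analysis(s))
  held <- assessment(s)
  sqrt(mean((held$mpg - predict(fit, held))^2))
}, numeric(1))

fold_rmse
mean(fold_rmse)  # cross-validated estimate of RMSE
```

In V-fold cross-validation every row is held out exactly once, so the assessment sets together cover the full data set.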

Note that resampled data sets created by rsample are directly accessible in a resampling object but add little memory overhead. Because the original data are not modified, R does not make a copy of them for each resample.

For example, creating 50 bootstraps of a data set does not create an object that is 50-fold larger in memory:

library(rsample)
library(mlbench)

data(LetterRecognition)
lobstr::obj_size(LetterRecognition)
#> 2,644,640 B

set.seed(35222)
boots <- bootstraps(LetterRecognition, times = 50)
lobstr::obj_size(boots)
#> 6,686,776 B

# Object size per resample
lobstr::obj_size(boots)/nrow(boots)
#> 133,735.5 B

# Fold increase is <<< 50
as.numeric(lobstr::obj_size(boots)/lobstr::obj_size(LetterRecognition))
#> [1] 2.528426

Created on 2022-02-28 by the reprex package (v2.0.1)

Memory usage for the 50 bootstrap samples is less than three times that of the original data set.
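The small footprint follows from how a split is stored: it keeps integer row indices into the shared data rather than a copy of the rows. A minimal sketch, assuming rsample is installed and substituting mtcars for LetterRecognition:

```r
library(rsample)

set.seed(35222)
boots <- bootstraps(mtcars, times = 50)
first <- boots$splits[[1]]

# Row indices of the analysis set; a bootstrap draws n rows with replacement
idx <- as.integer(first, data = "analysis")
length(idx)         # equals nrow(mtcars)
anyDuplicated(idx)  # non-zero when some row was drawn more than once

# Out-of-bag rows form the assessment set and never overlap the analysis rows
oob <- complement(first)

# analysis() materializes the resampled data frame only when it is called
identical(analysis(first)$mpg, mtcars$mpg[idx])
```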

Installation

To install the released version from CRAN, use:

install.packages("rsample")

To install the development version from GitHub, use:

# install.packages("pak")
pak::pak("tidymodels/rsample")

Contributing

This project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Monthly Downloads: 55,286

Version: 1.2.1

License: MIT + file LICENSE


Maintainer: Hannah Frick

Last Published: March 25th, 2024

Functions in rsample (1.2.1)

  • reverse_splits: Reverse the analysis and assessment sets
  • reg_intervals: A convenience function for confidence intervals with linear-ish parametric models
  • nested_cv: Nested or Double Resampling
  • make_strata: Create or Modify Stratification Variables
  • initial_split: Simple Training/Test Set Splitting
  • .get_fingerprint: Obtain an identifier for the resamples
  • manual_rset: Manual resampling
  • initial_validation_split: Create an Initial Train/Validation/Test Split
  • form_pred: Extract Predictor Names from Formula or Terms
  • slide-resampling: Time-based Resampling
  • rsample-dplyr: Compatibility with dplyr
  • reshuffle_rset: "Reshuffle" an rset to re-generate a new rset with the same parameters
  • rolling_origin: Rolling Origin Forecast Resampling
  • tidy.rsplit: Tidy Resampling Object
  • populate: Add Assessment Indices
  • rsample-package: rsample: General Resampling Infrastructure
  • validation_split: Create a Validation Set
  • validation_set: Create a Validation Split for Tuning
  • rset_reconstruct: Extending rsample with new rset subclasses
  • vfold_cv: V-Fold Cross-Validation
  • reexports: Objects exported from other packages
  • rsample2caret: Convert Resampling Objects to Other Formats
  • permutations: Permutation sampling
  • as.data.frame.rsplit: Convert an rsplit object to a data frame
  • complement: Determine the Assessment Samples
  • clustering_cv: Cluster Cross-Validation
  • add_resample_id: Augment a data set with resampling identifiers
  • bootstraps: Bootstrap Sampling
  • get_rsplit: Retrieve individual rsplit objects from an rset
  • apparent: Sampling for the Apparent Error Rate
  • group_bootstraps: Group Bootstraps
  • make_splits: Constructors for split objects
  • labels.rset: Find Labels from rset Object
  • group_mc_cv: Group Monte Carlo Cross-Validation
  • int_pctl: Bootstrap confidence intervals
  • loo_cv: Leave-One-Out Cross-Validation
  • labels.rsplit: Find Labels from rsplit Object
  • make_groups: Make groupings for grouped rsplits
  • new_rset: Constructor for new rset objects
  • group_vfold_cv: Group V-Fold Cross-Validation
  • mc_cv: Monte Carlo Cross-Validation
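Most of the resampling functions listed above follow one calling pattern: the data come first, then scheme-specific arguments, and the result is an rset whose splits column holds one rsplit per resample. The sketch below is illustrative only. It assumes rsample 1.2.x is installed, uses mtcars as stand-in data, and builds a hypothetical simulated series (ts_data) for the time-ordered case; all argument values are arbitrary.

```r
library(rsample)

set.seed(2024)

# Stratified 5-fold CV: folds are balanced on the levels of cyl
strat_folds <- vfold_cv(mtcars, v = 5, strata = cyl)

# Group V-fold CV: rows sharing a gear value are never split between
# analysis and assessment; with v left NULL, each gear value is held out once
group_folds <- group_vfold_cv(mtcars, group = gear)

# Monte Carlo CV: 10 independent random 80/20 splits
mc <- mc_cv(mtcars, prop = 0.8, times = 10)

# Leave-one-out CV: one resample per row
loo <- loo_cv(mtcars)

# Rolling origin for ordered data (hypothetical simulated series):
# train on 10 consecutive rows, assess the next 2; cumulative = FALSE
# keeps the training window at a fixed size
ts_data <- data.frame(t = 1:20, y = cumsum(rnorm(20)))
rolls <- rolling_origin(ts_data, initial = 10, assess = 2, cumulative = FALSE)

# Train / validation / test split in one step
three_way <- initial_validation_split(mtcars, prop = c(0.6, 0.2))

c(strat = nrow(strat_folds), group = nrow(group_folds),
  mc = nrow(mc), loo = nrow(loo), rolling = nrow(rolls))
```

Each rset can be processed by iterating over its splits column with analysis() and assessment(). initial_validation_split() instead returns a single split object whose pieces are retrieved with training(), validation(), and testing().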