rsample

Overview

The rsample package provides functions to create different types of resamples and corresponding classes for their analysis. The goal is to have a modular set of methods that can be used for:

  • resampling for estimating the sampling distribution of a statistic
  • estimating model performance using a holdout set

The scope of rsample is to provide the basic building blocks for creating and analyzing resamples of a data set, but this package does not include code for modeling or calculating statistics. The Working with Resample Sets vignette gives a demonstration of how rsample tools can be used when building models.
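As a rough illustration of the two use cases listed above, the sketch below bootstraps a mean to approximate its sampling distribution and then scores a linear model on a holdout set. The built-in mtcars data, the lm() formula, and the RMSE calculation are illustrative assumptions, not anything rsample prescribes. The statistic and the model come from base R because rsample itself supplies only the resamples.

```r
library(rsample)

set.seed(123)

# Use case 1: sampling distribution of a statistic via the bootstrap.
boots <- bootstraps(mtcars, times = 200)
boot_means <- vapply(
  boots$splits,
  function(split) mean(analysis(split)$mpg),  # statistic computed with base R
  numeric(1)
)
boot_se <- sd(boot_means)  # bootstrap estimate of the standard error of the mean

# Use case 2: model performance on a holdout set.
split <- initial_split(mtcars, prop = 0.75)
fit <- lm(mpg ~ wt + hp, data = training(split))   # illustrative model only
pred <- predict(fit, newdata = testing(split))
holdout_rmse <- sqrt(mean((testing(split)$mpg - pred)^2))

boot_se
holdout_rmse
```

In both cases rsample only decides which rows go where; swapping in a different statistic or model leaves the resampling code unchanged.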

Note that resampled data sets created by rsample are directly accessible in a resampling object but add little memory overhead. Because the original data is not modified, R does not make an automatic copy.

For example, creating 50 bootstraps of a data set does not create an object that is 50-fold larger in memory:

library(rsample)
library(mlbench)

data(LetterRecognition)
lobstr::obj_size(LetterRecognition)
#> 2,644,640 B

set.seed(35222)
boots <- bootstraps(LetterRecognition, times = 50)
lobstr::obj_size(boots)
#> 6,686,776 B

# Object size per resample
lobstr::obj_size(boots)/nrow(boots)
#> 133,735.5 B

# Fold increase is <<< 50
as.numeric(lobstr::obj_size(boots)/lobstr::obj_size(LetterRecognition))
#> [1] 2.528426

Created on 2022-02-28 by the reprex package (v2.0.1)

The 50 bootstrap samples together use less than 3 times the memory of the original data set.
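To make the "directly accessible" point concrete, the following sketch pulls one split out of a small bootstrap object and extracts its data on request. It assumes the built-in mtcars data for brevity. analysis() and assessment() are the rsample accessors; for bootstraps, the assessment set holds the rows that were not drawn in that resample (the out-of-bag rows).

```r
library(rsample)

set.seed(35222)
boots <- bootstraps(mtcars, times = 5)

# Each element of the `splits` column is an rsplit object
first_split <- boots$splits[[1]]

# Data frames are extracted only when requested
in_bag <- analysis(first_split)        # rows drawn with replacement
out_of_bag <- assessment(first_split)  # rows not drawn in this resample

nrow(in_bag)      # same number of rows as the original data
nrow(out_of_bag)  # varies from resample to resample
```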

Installation

To install the released version from CRAN, use:

install.packages("rsample")

To install the development version from GitHub, use:

# install.packages("devtools")
devtools::install_dev("rsample")

Contributing

This project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Monthly Downloads

55,286

Version

1.1.1

License

MIT + file LICENSE

Maintainer

Hannah Frick

Last Published

December 7th, 2022

Functions in rsample (1.1.1)

  • labels.rsplit: Find Labels from rsplit Object
  • permutations: Permutation sampling
  • reverse_splits: Reverse the analysis and assessment sets
  • reshuffle_rset: "Reshuffle" an rset to re-generate a new rset with the same parameters
  • nested_cv: Nested or Double Resampling
  • new_rset: Constructor for new rset objects
  • slide-resampling: Time-based Resampling
  • populate: Add Assessment Indices
  • rset_reconstruct: Extending rsample with new rset subclasses
  • manual_rset: Manual resampling
  • validation_split: Create a Validation Set
  • mc_cv: Monte Carlo Cross-Validation
  • tidy.rsplit: Tidy Resampling Object
  • vfold_cv: V-Fold Cross-Validation
  • reexports: Objects exported from other packages
  • reg_intervals: A convenience function for confidence intervals with linear-ish parametric models
  • rolling_origin: Rolling Origin Forecast Resampling
  • rsample-package: rsample: General Resampling Infrastructure
  • rsample-dplyr: Compatibility with dplyr
  • rsample2caret: Convert Resampling Objects to Other Formats
  • as.data.frame.rsplit: Convert an rsplit object to a data frame
  • complement: Determine the Assessment Samples
  • apparent: Sampling for the Apparent Error Rate
  • bootstraps: Bootstrap Sampling
  • form_pred: Extract Predictor Names from Formula or Terms
  • group_bootstraps: Group Bootstraps
  • get_rsplit: Retrieve individual rsplit objects from an rset
  • add_resample_id: Augment a data set with resampling identifiers
  • int_pctl: Bootstrap confidence intervals
  • group_vfold_cv: Group V-Fold Cross-Validation
  • make_strata: Create or Modify Stratification Variables
  • initial_split: Simple Training/Test Set Splitting
  • clustering_cv: Cluster Cross-Validation
  • labels.rset: Find Labels from rset Object
  • .get_fingerprint: Obtain an identifier for the resamples
  • loo_cv: Leave-One-Out Cross-Validation
  • make_splits: Constructors for split objects
  • make_groups: Make groupings for grouped rsplits
  • group_mc_cv: Group Monte Carlo Cross-Validation
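
Several of these functions are typically combined with the analysis() and assessment() accessors. A minimal sketch, assuming the built-in mtcars data and an lm() formula chosen only for illustration, uses vfold_cv() to create folds and then fits and scores a model on each one. The helper fold_rmse() is a hypothetical name introduced here, not part of rsample.

```r
library(rsample)

set.seed(2022)
folds <- vfold_cv(mtcars, v = 5)

# Hypothetical helper: fit on the analysis set of a fold,
# then compute RMSE on that fold's assessment set
fold_rmse <- function(split) {
  fit <- lm(mpg ~ wt, data = analysis(split))
  held_out <- assessment(split)
  sqrt(mean((held_out$mpg - predict(fit, newdata = held_out))^2))
}

rmse_by_fold <- vapply(folds$splits, fold_rmse, numeric(1))
mean(rmse_by_fold)  # cross-validated RMSE estimate
```

Every row of the data appears in exactly one assessment set, so the averaged RMSE uses each observation once for evaluation.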