This function powers grouped resampling: it splits the data according to a grouping variable and returns the assessment set indices for each split.
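As a rough illustration of what "assessment set indices for each split" means, the base R sketch below covers only the simplest case, where every unique group becomes its own fold. The helper assessment_indices_by_group() is hypothetical and not part of rsample. make_groups() itself also handles v smaller than the number of groups, balancing, and strata.

```r
# Hypothetical helper (not part of rsample): one split per unique group,
# returning the row numbers that make up each assessment set.
assessment_indices_by_group <- function(data, group) {
  keys <- data[[group]]
  # split() groups the row numbers by key, so all rows sharing a value
  # end up together in the same assessment set
  split(seq_len(nrow(data)), keys)
}

df <- data.frame(id = c("a", "b", "a", "c", "b", "c"), x = 1:6)
idx <- assessment_indices_by_group(df, "id")
str(idx)  # a: rows 1, 3; b: rows 2, 5; c: rows 4, 6
```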
make_groups(
  data,
  group,
  v,
  balance = c("groups", "observations", "prop"),
  strata = NULL,
  ...
)
data: A data frame.
group: A variable in data (a single character string or name) used for grouping. All observations that share the same value are placed together in either the analysis set or the assessment set within a fold.
v: The number of partitions of the data set.
balance: If v is less than the number of unique groups, how should groups be combined into folds? Should be one of "groups", "observations", or "prop".
strata: A variable in data (a single character string or name) used to conduct stratified sampling. When not NULL, each resample is created within the stratification variable. Numeric strata are binned into quartiles.
...: Arguments passed to the balance functions.
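When v is smaller than the number of groups, the "groups" and "observations" options aim for different kinds of balance. The sketch below contrasts the two ideas using two hypothetical helpers, assign_folds_by_groups() and assign_folds_by_observations(). The first deals shuffled groups round-robin, so each fold gets about the same number of groups. The second places the largest groups first, each into the fold with the fewest rows so far. These greedy rules are assumptions chosen for illustration, and rsample's internal algorithms may differ.

```r
# Hypothetical sketches of two balancing strategies (not rsample internals).

# "groups"-style: shuffle the unique groups, then deal them round-robin so
# the number of groups per fold differs by at most one.
assign_folds_by_groups <- function(group_ids, v) {
  group_ids <- as.character(group_ids)
  groups <- unique(group_ids)
  groups <- groups[sample.int(length(groups))]
  fold_of_group <- setNames(rep_len(seq_len(v), length(groups)), groups)
  unname(fold_of_group[group_ids])
}

# "observations"-style: place the largest groups first, each into the fold
# that currently holds the fewest rows.
assign_folds_by_observations <- function(group_ids, v) {
  group_ids <- as.character(group_ids)
  sizes <- vapply(split(group_ids, group_ids), length, integer(1))
  sizes <- sort(sizes, decreasing = TRUE)
  fold_load <- integer(v)
  fold_of_group <- setNames(integer(length(sizes)), names(sizes))
  for (g in names(sizes)) {
    target <- which.min(fold_load)
    fold_of_group[g] <- target
    fold_load[target] <- fold_load[target] + sizes[[g]]
  }
  unname(fold_of_group[group_ids])
}

# One large group and three small ones, split into 2 folds
ids <- rep(c("a", "b", "c", "d"), times = c(6, 2, 2, 2))
set.seed(1)
f_grp <- assign_folds_by_groups(ids, v = 2)
f_obs <- assign_folds_by_observations(ids, v = 2)

table(f_grp)  # 2 groups per fold; one fold has 8 rows, the other 4
table(f_obs)  # 6 rows in each fold
```

With one large group and several small ones, balancing on the number of groups leaves the folds with 8 and 4 rows. Balancing on observations gives 6 and 6. In both cases no group is split across folds.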
Not all balance options are accepted by (or make sense for) every resampling function. For instance, balance = "prop" assigns groups to folds at random, so any given observation is not guaranteed to be in one (and only one) assessment set. As a result, balance = "prop" can't be used with group_vfold_cv() and is not an option for that function.
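The constraint above can be checked directly: a valid V-fold scheme must place every row in exactly one assessment set. The sketch below defines a hypothetical is_partition() helper. It compares a deterministic round-robin assignment with a random, "prop"-style assignment in which each fold independently draws a third of the groups. That random scheme is an assumption meant to mimic the behavior described here, not rsample's implementation.

```r
# Hypothetical check (not an rsample function): a list of assessment-index
# vectors is a valid V-fold partition only if every row 1..n appears in
# exactly one of them.
is_partition <- function(assessment_sets, n) {
  counts <- tabulate(unlist(assessment_sets), nbins = n)
  all(counts == 1L)
}

set.seed(42)
groups <- rep(letters[1:6], each = 2)  # 12 rows, 6 groups
v <- 3

# Random assignment (assumed for illustration): each fold independently
# draws 2 of the 6 groups for its assessment set.
prop_sets <- lapply(seq_len(v), function(i) {
  chosen <- sample(unique(groups), size = 2)
  which(groups %in% chosen)
})

# Deterministic round-robin assignment: each group goes to exactly one fold.
fold_of <- setNames(rep_len(seq_len(v), 6), unique(groups))
vfold_sets <- split(seq_along(groups), fold_of[groups])

is_partition(vfold_sets, length(groups))  # TRUE
is_partition(prop_sets, length(groups))   # usually FALSE
```

In this setup the random scheme produces a valid partition only by chance, with probability 6/225 (under 3%). That is why it cannot back a function such as group_vfold_cv().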
Similarly, group_mc_cv() and its derivatives don't assign data to one (and only one) assessment set. Instead, each observation can appear in an assessment set zero or more times. As a result, those functions don't have a balance argument, and under the hood they always specify balance = "prop" when they call make_groups().
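The zero-or-more behavior of Monte Carlo grouped resampling is easy to see by counting how many assessment sets each row lands in. The sketch below assumes a hypothetical mc_group_assessment() helper. In it, each of times resamples independently holds out a random fraction (holdout) of the groups. Neither name is part of rsample, and the sampling details of group_mc_cv() may differ.

```r
# Hypothetical sketch of Monte Carlo grouped resampling (not rsample's code):
# each resample independently holds out a random fraction of the groups.
mc_group_assessment <- function(group_ids, times, holdout = 0.25) {
  keys <- unique(group_ids)
  n_out <- max(1L, round(holdout * length(keys)))
  lapply(seq_len(times), function(i) {
    out <- sample(keys, size = n_out)
    which(group_ids %in% out)  # assessment-set row indices for this resample
  })
}

set.seed(2024)
ids <- rep(sprintf("g%02d", 1:10), each = 3)  # 30 rows, 10 groups of 3
splits <- mc_group_assessment(ids, times = 25, holdout = 0.2)

# Number of assessment sets each row landed in: any value from 0 upward
appearances <- tabulate(unlist(splits), nbins = length(ids))
table(appearances)
```

Rows in the same group always share a count, because groups are held out whole. The counts themselves vary from group to group and are not fixed at one.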