stratrs: Perform stratified random sampling to balance outcomes
Description
This function performs stratified random sampling so that
outcomes are balanced among the shards.
Usage
stratrs(y, C=5, P=0)
Value
A vector is returned in which each element gives the shard to which
the corresponding observation of y is assigned.
Arguments
y
The binary/categorical/continuous outcome.
C
The number of shards to break the data set into.
P
For continuous data, the range is broken into
P segments via the quantiles. Specifying P=20 seems to
work reasonably well.
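The quantile segmentation described for P can be illustrated with a minimal base R sketch. This is not the package's source code: the simulated outcome and the variable names cuts and segment are assumptions made for illustration only.

```r
# Hypothetical sketch: bin a continuous outcome into P quantile segments.
# Not the BART implementation; all names here are illustrative.
set.seed(1)
y <- rnorm(1000)   # simulated continuous outcome
P <- 20            # number of quantile segments

# Interior cut points at the 1/P, 2/P, ..., (P-1)/P quantiles
cuts <- quantile(y, probs = (1:(P - 1)) / P)

# findInterval() returns 0..(P-1); add 1 so segments are labelled 1..P.
# left.open = TRUE places a value equal to a cut point in the lower segment.
segment <- findInterval(y, cuts, left.open = TRUE) + 1L

# Each segment holds roughly length(y) / P observations
table(segment)
```

The segment labels can then play the role of a categorical outcome when assigning observations to shards.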
Details
To perform BART with large data sets, random sampling is employed
to break the data into C shards. Each shard should be
balanced with respect to the outcome. For binary/categorical
outcomes, this function employs stratified random sampling with the
outcome levels as the strata. For continuous outcomes, specify P>0
so that the P quantile segments serve as the strata.
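The idea of balancing shards by stratum can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the simulated binary outcome and the names shard, idx and labels are hypothetical.

```r
# Hypothetical sketch of stratified shard assignment.
# Not the BART source code; all names here are illustrative.
set.seed(1)
y <- rbinom(1000, 1, 0.1)   # simulated binary outcome, roughly 10% ones
C <- 5                      # number of shards

shard <- integer(length(y))
for (s in unique(y)) {
  idx <- which(y == s)                          # observations in this stratum
  labels <- rep_len(seq_len(C), length(idx))    # shard labels 1..C, recycled evenly
  # Randomly permute the labels within the stratum
  shard[idx] <- labels[sample.int(length(idx))]
}

# Proportion of ones in each shard; these should be close to one another
tapply(y, shard, mean)
```

Because the labels are recycled evenly within each stratum before being permuted, the shard counts within a stratum differ by at most one, which is what keeps the outcome distribution similar across shards.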