Parse experimental design
extract_experimental_setup(
experimental_design,
file_dir,
message_indent = 0L,
verbose = TRUE
)
data.table with subsampler information at different levels of the experimental design.
experimental_design: (required) Defines what the experiment looks like, e.g. cv(bt(fs,20)+mb,3,2)+ev for 3-fold cross-validation repeated 2 times, with nested feature selection on 20 bootstraps and model building, followed by external validation. The basic workflow components are:
fs: (required) feature selection step.
mb: (required) model building step.
ev: (optional) external validation. Note that internal validation due to subsampling will always be conducted if the subsampling methods create any validation data sets.
The different components are linked using +.
Different subsampling methods can be used in conjunction with the basic workflow components:
bs(x,n): (stratified) .632 bootstrap, with n the number of bootstraps. In contrast to bt, feature pre-processing parameters and hyperparameter optimisation are conducted on individual bootstraps.
bt(x,n): (stratified) .632 bootstrap, with n the number of bootstraps. Unlike bs and other subsampling methods, no separate pre-processing parameters or optimised hyperparameters will be determined for each bootstrap.
cv(x,n,p): (stratified) n-fold cross-validation, repeated p times. Pre-processing parameters are determined for each iteration.
lv(x): leave-one-out cross-validation. Pre-processing parameters are determined for each iteration.
ip(x): imbalance partitioning for addressing class imbalances in the data set. Pre-processing parameters are determined for each partition. The number of partitions generated depends on the imbalance correction method (see the imbalance_correction_method parameter). Imbalance partitioning does not generate validation sets.
As shown in the example above, sampling algorithms can be nested.
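
To make the nesting concrete, the sketch below breaks a design string into a nested list using base R only. It is an illustration of the notation, not the parser that familiar uses internally; split_top_level and parse_design are hypothetical helper names, and the sketch assumes the first argument of every subsampling method is itself a design and all remaining arguments are integers.

```r
# Hypothetical helper (not part of familiar): split a string on a single
# separator character, ignoring separators nested inside parentheses.
split_top_level <- function(x, sep) {
  chars <- strsplit(x, "")[[1]]
  depth <- 0L
  parts <- character(0)
  current <- ""
  for (ch in chars) {
    if (ch == "(") depth <- depth + 1L
    if (ch == ")") depth <- depth - 1L
    if (ch == sep && depth == 0L) {
      parts <- c(parts, current)
      current <- ""
    } else {
      current <- paste0(current, ch)
    }
  }
  c(parts, current)
}

# Hypothetical helper (not part of familiar): turn a design string into a
# nested list. Assumes the first argument of a subsampling method is a
# design and any further arguments are integers.
parse_design <- function(x) {
  x <- gsub("[[:space:]]", "", x)
  lapply(split_top_level(x, "+"), function(term) {
    # Basic workflow components (fs, mb, ev) have no parentheses.
    if (!grepl("(", term, fixed = TRUE)) {
      return(list(type = term))
    }
    name <- sub("\\(.*$", "", term)
    inner <- substr(term, nchar(name) + 2L, nchar(term) - 1L)
    args <- split_top_level(inner, ",")
    list(
      type = name,
      args = as.integer(args[-1]),
      children = parse_design(args[1])
    )
  })
}

design <- parse_design("cv(bt(fs,20)+mb,3,2)+ev")
str(design)
```

In the resulting list, the cv element carries the arguments 3 and 2 and has two children: a bt element with argument 20 wrapping fs, and mb. The ev component sits at the top level, outside the cross-validation.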
The simplest valid experimental design is fs+mb, which corresponds to a TRIPOD type 1a analysis. Type 1b analyses are only possible using bootstraps, e.g. bt(fs+mb,100). Type 2a analyses can be conducted using cross-validation, e.g. cv(bt(fs,100)+mb,10,1). Depending on the origin of the external validation data, designs such as fs+mb+ev or cv(bt(fs,100)+mb,10,1)+ev constitute type 2b or type 3 analyses. Type 4 analyses can be done by obtaining one or more familiarModel objects from others and applying them to your own data set.
Alternatively, the experimental_design parameter may be used to provide a path to a file containing iterations, which is named ####_iterations.RDS by convention. This path can be relative to the directory of the current experiment (experiment_dir), or an absolute path. The absolute path may thus also point to a file from a different experiment.
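
The relative-versus-absolute rule can be sketched in base R as follows. This is only an illustration under stated assumptions: resolve_iterations_path is a hypothetical helper, not a familiar function, the file and directory names are made up, and the test for an absolute path is a simple pattern match on a leading /, ~ or drive letter.

```r
# Hypothetical helper (not part of familiar): resolve a path to an
# iterations file against the experiment directory.
resolve_iterations_path <- function(path, experiment_dir) {
  # Assumption: a path is absolute if it starts with "/", "~" or a
  # Windows drive letter such as "C:/" or "C:\".
  is_absolute <- grepl("^(/|~|[A-Za-z]:[/\\\\])", path)
  if (is_absolute) path else file.path(experiment_dir, path)
}

# Relative path: interpreted relative to the current experiment directory.
resolve_iterations_path("0001_iterations.RDS", "/data/experiment_a")

# Absolute path: returned unchanged, so it may point to another experiment.
resolve_iterations_path("/data/experiment_b/0001_iterations.RDS", "/data/experiment_a")
```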
message_indent: Spacing inserted before messages.
verbose: Sets verbosity.
This function converts the experimental_design string