extract_experimental_setup: Parse experimental design

Description

Parse experimental design

Usage

extract_experimental_setup(
  experimental_design,
  file_dir,
  message_indent = 0L,
  verbose = TRUE
)

Value

data.table with subsampler information at different levels of the experimental design.

Arguments

experimental_design

(required) Defines the structure of the experiment. For example, cv(bt(fs,20)+mb,3,2)+ev specifies 3-fold cross-validation repeated 2 times, with feature selection nested on 20 bootstraps followed by model building, and external validation. The basic workflow components are:

fs: (required) feature selection step.
mb: (required) model building step.
ev: (optional) external validation. Note that internal validation due to subsampling will always be conducted if the subsampling methods create any validation data sets.

The different components are linked using +.
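As a rough illustration of the required components, the sketch below checks that a design string mentions both fs and mb. The helper has_required_components is hypothetical and is not part of the package; the actual parser may validate designs differently.

```r
# Hypothetical helper (not part of the package): checks that a design string
# mentions both required workflow components, fs and mb.
has_required_components <- function(design) {
  # \\b marks a word boundary, so "fs" is matched only as a separate token.
  has_fs <- grepl("\\bfs\\b", design, perl = TRUE)
  has_mb <- grepl("\\bmb\\b", design, perl = TRUE)
  has_fs & has_mb
}

has_required_components("fs+mb")                   # both components present
has_required_components("cv(bt(fs,20)+mb,3,2)+ev") # nested design, both present
has_required_components("fs+ev")                   # mb is missing
```

This is a plain text check only: it does not verify that parentheses balance or that the subsampler arguments are valid.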

Different subsampling methods can be used in conjunction with the basic workflow components:

bs(x,n): (stratified) .632 bootstrap, with n the number of bootstraps. In contrast to bt, feature pre-processing parameters are determined and hyperparameters are optimised on each individual bootstrap.
bt(x,n): (stratified) .632 bootstrap, with n the number of bootstraps. Unlike bs and other subsampling methods, no separate pre-processing parameters or optimised hyperparameters will be determined for each bootstrap.
cv(x,n,p): (stratified) n-fold cross-validation, repeated p times. Pre-processing parameters are determined for each iteration.
lv(x): leave-one-out cross-validation. Pre-processing parameters are determined for each iteration.
ip(x): imbalance partitioning for addressing class imbalances in the data set. Pre-processing parameters are determined for each partition. The number of partitions generated depends on the imbalance correction method (see the imbalance_correction_method parameter). Imbalance partitioning does not generate validation sets.
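The number of subsamples implied by one level of a design follows from the arguments listed above. The sketch below makes that arithmetic explicit. The function n_subsamples and its n_samples argument are hypothetical names introduced for illustration. It assumes that leave-one-out cross-validation yields one iteration per sample, and it leaves out ip because its partition count depends on the imbalance correction method.

```r
# Hypothetical helper (not part of the package): number of subsamples that a
# single subsampling level creates.
n_subsamples <- function(method, n = NULL, p = 1L, n_samples = NULL) {
  switch(
    method,
    bs = n,          # n bootstraps
    bt = n,          # n bootstraps
    cv = n * p,      # n folds, repeated p times
    lv = n_samples,  # assumption: one iteration per sample
    stop("Subsample count not defined here for method: ", method)
  )
}

# For cv(bt(fs,20)+mb,3,2): outer cross-validation and inner bootstraps.
outer <- n_subsamples("cv", n = 3L, p = 2L)
inner <- n_subsamples("bt", n = 20L)
outer * inner  # total number of feature selection runs across all folds
```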

As shown in the example above, sampling algorithms can be nested.
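Because a design is an ordinary character string, nested designs can be assembled programmatically. The sketch below rebuilds the example design from its parts. The helpers wrap_bt and wrap_cv are hypothetical conveniences, not package functions.

```r
# Hypothetical helpers (not part of the package) that wrap a design fragment
# in a subsampling method.
wrap_bt <- function(x, n) sprintf("bt(%s,%d)", x, n)
wrap_cv <- function(x, n, p) sprintf("cv(%s,%d,%d)", x, n, p)

# Feature selection on 20 bootstraps, followed by model building.
inner <- paste0(wrap_bt("fs", 20L), "+mb")

# 3-fold cross-validation repeated 2 times, plus external validation.
design <- paste0(wrap_cv(inner, 3L, 2L), "+ev")
design
```

The resulting string can then be passed as the experimental_design argument.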

The simplest valid experimental design is fs+mb, which corresponds to a TRIPOD type 1a analysis. Type 1b analyses are only possible using bootstraps, e.g. bt(fs+mb,100). Type 2a analyses can be conducted using cross-validation, e.g. cv(bt(fs,100)+mb,10,1). Depending on the origin of the external validation data, designs such as fs+mb+ev or cv(bt(fs,100)+mb,10,1)+ev constitute type 2b or type 3 analyses. Type 4 analyses can be done by obtaining one or more familiarModel objects from others and applying them to your own data set.

Alternatively, the experimental_design parameter may be used to provide a path to a file containing iterations, which is named ####_iterations.RDS by convention. This path can be relative to the directory of the current experiment (experiment_dir), or an absolute path. The absolute path may thus also point to a file from a different experiment.
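The relative-versus-absolute path rule can be sketched as follows. The function resolve_iterations_path, the rule used to detect absolute paths, and the file and directory names in the example are all assumptions made for illustration; the package may resolve paths differently.

```r
# Hypothetical helper (not part of the package): resolves an iterations file
# path against the experiment directory when the path is relative.
resolve_iterations_path <- function(experimental_design, experiment_dir) {
  # Assumption: a path is absolute if it starts with "/", "~", or a Windows
  # drive letter such as "C:/" or "C:\".
  is_absolute <- grepl("^(/|~|[A-Za-z]:[/\\\\])", experimental_design, perl = TRUE)
  if (is_absolute) {
    experimental_design
  } else {
    file.path(experiment_dir, experimental_design)
  }
}

# Relative path: interpreted relative to the current experiment directory.
resolve_iterations_path("example_iterations.RDS", "/data/experiment_a")

# Absolute path: may point to a file from a different experiment.
resolve_iterations_path("/data/experiment_b/example_iterations.RDS", "/data/experiment_a")
```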

message_indent

Spacing inserted before messages.

verbose

Sets verbosity.

Details

This function converts the experimental_design string