Internal function for parsing settings related to hyperparameter optimisation
.parse_hyperparameter_optimisation_settings(
config = NULL,
parallel,
outcome_type,
optimisation_bootstraps = waiver(),
optimisation_determine_vimp = waiver(),
smbo_random_initialisation = waiver(),
smbo_n_random_sets = waiver(),
max_smbo_iterations = waiver(),
smbo_stop_convergent_iterations = waiver(),
smbo_stop_tolerance = waiver(),
smbo_time_limit = waiver(),
smbo_initial_bootstraps = waiver(),
smbo_step_bootstraps = waiver(),
smbo_intensify_steps = waiver(),
smbo_stochastic_reject_p_value = waiver(),
optimisation_function = waiver(),
optimisation_metric = waiver(),
acquisition_function = waiver(),
exploration_method = waiver(),
hyperparameter_learner = waiver(),
parallel_hyperparameter_optimisation = waiver(),
...
)
List of parameters related to model hyperparameter optimisation.
config: A list of settings, e.g. from an xml file.
parallel: Logical value that indicates whether familiar uses parallelisation. If FALSE, it overrides parallel_hyperparameter_optimisation.
outcome_type: Type of outcome found in the data set.
optimisation_bootstraps: (optional) Number of bootstraps that should be generated from the development data set. During the optimisation procedure one or more of these bootstraps (indicated by smbo_step_bootstraps) are used for model development using different combinations of hyperparameters. The effect of the hyperparameters is then assessed by comparing in-bag and out-of-bag model performance.

The default number of bootstraps is 50. Hyperparameter optimisation may finish before exhausting the set of bootstraps.
optimisation_determine_vimp: (optional) Logical value that indicates whether variable importance is determined separately for each of the bootstraps created during the optimisation process (TRUE) or the applicable results from the feature selection step are used (FALSE).

Determining variable importance increases the initial computational overhead. However, it prevents positive biases for the out-of-bag data due to overlap of these data with the development data set used for the feature selection step. In this case, any hyperparameters of the variable importance method are not determined separately for each bootstrap, but those obtained during the feature selection step are used instead. If multiple such hyperparameter sets could be applicable, the set that is used is randomly selected for each bootstrap.

This parameter only affects hyperparameter optimisation of learners. The default is TRUE.
smbo_random_initialisation: (optional) String indicating the initialisation method for the hyperparameter space. Can be one of fixed_subsample (default), fixed, or random. fixed and fixed_subsample first create hyperparameter sets from a range of default values set by familiar. fixed_subsample then randomly draws up to smbo_n_random_sets sets from the grid. random does not rely upon a fixed grid, and randomly draws up to smbo_n_random_sets hyperparameter sets from the hyperparameter space.
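As a rough illustration of fixed_subsample, the sketch below draws up to smbo_n_random_sets sets from a hypothetical default grid; the grid and the parameter names in it are invented for this example and are not familiar's actual defaults.

# Hypothetical default grid; familiar constructs its own grid internally.
grid <- expand.grid(
  n_trees = c(100, 200, 400),
  learning_rate = c(0.01, 0.1, 0.3)
)
smbo_n_random_sets <- 100
# Draw up to smbo_n_random_sets hyperparameter sets from the grid.
n_draw <- min(smbo_n_random_sets, nrow(grid))
subsample <- grid[sample.int(nrow(grid), n_draw), , drop = FALSE]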
smbo_n_random_sets: (optional) Number of random or subsampled hyperparameter sets drawn during the initialisation process. Default: 100. Cannot be smaller than 10. The parameter is not used when smbo_random_initialisation is fixed, as the entire pre-defined grid will be explored.
max_smbo_iterations: (optional) Maximum number of intensify iterations of the SMBO algorithm. During an intensify iteration a run-off occurs between the current best hyperparameter combination and either the 10 challenger combinations with the highest expected improvement or a set of 20 random combinations.

Run-off with random combinations is used to force exploration of the hyperparameter space, and is performed every second intensify iteration, or if there is no expected improvement for any challenger combination.

If a combination of hyperparameters leads to better performance on the same data than the incumbent best set of hyperparameters, it replaces the incumbent set at the end of the intensify iteration.

The default number of intensify iterations is 20. Iterations may be stopped early if the incumbent set of hyperparameters remains the same for smbo_stop_convergent_iterations iterations, or if performance improvement is minimal. This behaviour is suppressed during the first 4 iterations to enable the algorithm to explore the hyperparameter space.
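The run-off logic can be sketched as follows; the parity convention for the random run-off and the tie handling are assumptions made for illustration, not familiar's internal code.

# Select challengers for one intensify iteration: the 10 sets with the
# highest expected improvement, or 20 random sets every second iteration or
# when no set shows expected improvement.
select_challengers <- function(iteration, sets, expected_improvement) {
  force_random <- iteration %% 2 == 0 || all(expected_improvement <= 0)
  if (force_random) {
    n <- min(20L, nrow(sets))
    sets[sample.int(nrow(sets), n), , drop = FALSE]
  } else {
    n <- min(10L, nrow(sets))
    sets[order(expected_improvement, decreasing = TRUE)[seq_len(n)], , drop = FALSE]
  }
}

sets <- expand.grid(n_trees = c(100, 200), max_depth = c(2, 4))
select_challengers(1, sets, expected_improvement = c(0.10, 0.00, 0.20, 0.05))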
smbo_stop_convergent_iterations: (optional) The number of subsequent convergent SMBO iterations required to stop hyperparameter optimisation early. An iteration is convergent if the best parameter set has not changed, or if the optimisation score over the 4 most recent iterations has not changed beyond the tolerance level in smbo_stop_tolerance.

The default value is 3.
smbo_stop_tolerance: (optional) Tolerance for early stopping due to convergent optimisation score.

The default value depends on the square root of the number of samples (at the series level), and is 0.01 for 100 samples. This value is computed as 0.1 / sqrt(n_samples). The tolerance is floored at 0.0001, which is reached for 1M or more samples.
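The default rule can be written out as follows; the function name is illustrative.

# Default tolerance: 0.1 / sqrt(n_samples), floored at 0.0001.
default_smbo_stop_tolerance <- function(n_samples) {
  max(0.1 / sqrt(n_samples), 0.0001)
}

default_smbo_stop_tolerance(100)  # 0.01
default_smbo_stop_tolerance(1e6)  # 1e-04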
smbo_time_limit: (optional) Time limit (in minutes) for the optimisation process. Optimisation is stopped after this limit is exceeded. Time taken to determine variable importance for the optimisation process (see the optimisation_determine_vimp parameter) does not count.

The default is NULL, indicating that there is no time limit for the optimisation process. The time limit cannot be less than 1 minute.
smbo_initial_bootstraps: (optional) The number of bootstraps taken from the set of optimisation_bootstraps as the bootstraps assessed initially.

The default value is 1. The value cannot be larger than optimisation_bootstraps.
smbo_step_bootstraps: (optional) The number of bootstraps taken from the set of optimisation_bootstraps as the bootstraps assessed during the steps of each intensify iteration.

The default value is 3. The value cannot be larger than optimisation_bootstraps.
smbo_intensify_steps: (optional) The number of steps in each SMBO intensify iteration. At each step, a new set of smbo_step_bootstraps bootstraps is drawn and used in the run-off between the incumbent best hyperparameter combination and its challengers.

The default value is 5. Higher numbers allow for a more detailed comparison, but this comes with added computational cost.
smbo_stochastic_reject_p_value: (optional) The p-value threshold used for the stochastic_reject exploration method.

The default value is 0.05.
optimisation_function: (optional) Type of optimisation function used to quantify the performance of a hyperparameter set. Model performance is assessed using the metric(s) specified by optimisation_metric on the in-bag (IB) and out-of-bag (OOB) samples of a bootstrap. These values are converted to objective scores with a standardised interval of [-1.0, 1.0]. Each pair of objective scores is subsequently used to compute an optimisation score. The optimisation scores across different bootstraps are then aggregated to a summary score. This summary score is used to rank hyperparameter sets, and to select the optimal set.

The combination of optimisation score and summary score is determined by the optimisation function indicated by this parameter:

* validation or max_validation (default): seeks to maximise the OOB score.

* balanced: seeks to balance the IB and OOB scores.

* stronger_balance: similar to balanced, but with a stronger penalty for differences between IB and OOB scores.

* validation_minus_sd: seeks to optimise the average OOB score minus its standard deviation.

* validation_25th_percentile: seeks to optimise the 25th percentile of OOB scores, and is conceptually similar to validation_minus_sd.

* model_estimate: seeks to maximise the OOB score estimate predicted by the hyperparameter learner (not available for random search).

* model_estimate_minus_sd: seeks to maximise the OOB score estimate minus its estimated standard deviation, as predicted by the hyperparameter learner (not available for random search).

* model_balanced_estimate: seeks to maximise the estimate of the balanced IB and OOB score. This is similar to the balanced score, and in fact uses a hyperparameter learner to predict said score (not available for random search).

* model_balanced_estimate_minus_sd: seeks to maximise the estimate of the balanced IB and OOB score, minus its estimated standard deviation. This is similar to the balanced score, but takes into account its estimated spread.

Additional details are provided in the Learning algorithms and hyperparameter optimisation vignette.
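As a hedged sketch, two of the summary scores above can be computed from a vector of OOB objective scores collected across bootstraps; the function names are illustrative, not familiar's internals.

# Mean OOB score minus its standard deviation.
validation_minus_sd <- function(oob_scores) {
  mean(oob_scores) - stats::sd(oob_scores)
}

# 25th percentile of OOB scores.
validation_25th_percentile <- function(oob_scores) {
  unname(stats::quantile(oob_scores, probs = 0.25))
}

oob_scores <- c(0.62, 0.70, 0.66, 0.58, 0.73)
validation_minus_sd(oob_scores)
validation_25th_percentile(oob_scores)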
optimisation_metric: (optional) One or more metrics used to compute performance scores. See the vignette on performance metrics for the available metrics.

If unset, the following metrics are used by default:

* auc_roc: for binomial and multinomial models.

* mse: mean squared error, for continuous models.

* msle: mean squared logarithmic error, for count models.

* concordance_index: for survival models.

Multiple optimisation metrics can be specified. Actual metric values are converted to an objective value by comparison with a baseline metric value derived from a trivial model, i.e. the majority class for binomial and multinomial outcomes, the median outcome for count and continuous outcomes, and a fixed risk or time for survival outcomes.
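A minimal sketch of such a conversion is shown below, assuming a linear rescaling in which the optimal metric value maps to 1.0 and the trivial-model baseline to 0.0, clamped to [-1.0, 1.0]; the exact mapping familiar uses may differ and is described in the performance metrics vignette. The sketch assumes a higher-is-better metric.

# Convert a metric value to an objective score relative to a baseline
# (illustrative assumption, not familiar's internal code).
metric_to_objective <- function(value, baseline, optimal) {
  score <- (value - baseline) / abs(optimal - baseline)
  max(min(score, 1.0), -1.0)
}

# Example: AUC-ROC of 0.75 against a trivial baseline of 0.5 (optimal 1.0).
metric_to_objective(0.75, baseline = 0.5, optimal = 1.0)  # 0.5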
acquisition_function: (optional) The acquisition function influences how new hyperparameter sets are selected. The algorithm uses the model learned by the learner indicated by hyperparameter_learner to search the hyperparameter space for hyperparameter sets that are either likely better than the best known set (exploitation) or where there is considerable uncertainty (exploration). The acquisition function quantifies this (Shahriari et al., 2016).

The following acquisition functions are available, and are described in more detail in the learner algorithms vignette:

* improvement_probability: the probability of improvement quantifies the probability that the expected optimisation score for a set is better than the best observed optimisation score.

* improvement_empirical_probability: similar to improvement_probability, but based directly on optimisation scores predicted by the individual decision trees.

* expected_improvement (default): computes the expected improvement.

* upper_confidence_bound: this acquisition function is based on the upper confidence bound of the distribution (Srinivas et al., 2012).

* bayes_upper_confidence_bound: this acquisition function is based on the upper confidence bound of the distribution (Kaufmann et al., 2012).
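For orientation, the standard probability-of-improvement acquisition for a maximisation problem looks as follows, assuming the hyperparameter learner yields a mean and standard deviation per candidate set; familiar's implementation may differ in detail.

# Probability that a candidate's score exceeds the best observed score,
# under a normal model with the given mean and standard deviation.
probability_of_improvement <- function(mu, sigma, best_observed) {
  stats::pnorm((mu - best_observed) / sigma)
}

probability_of_improvement(mu = 0.70, sigma = 0.05, best_observed = 0.68)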
exploration_method: (optional) Method used to steer exploration during the intensify steps that follow initialisation. As stated earlier, each SMBO intensify iteration compares suggested alternative parameter sets with the incumbent best set in a series of steps. The exploration method controls how the set of alternative parameter sets is pruned after each step in an iteration. Can be one of the following:

* single_shot (default): the set of alternative parameter sets is not pruned, and each intensify iteration contains only a single intensification step that uses only a single bootstrap. This is the fastest exploration method, but only superficially tests each parameter set.

* successive_halving: the set of alternative parameter sets is pruned by removing the worst performing half of the sets after each step (Jamieson and Talwalkar, 2016).

* stochastic_reject: the set of alternative parameter sets is pruned by comparing the performance of each parameter set with that of the incumbent best parameter set using a paired Wilcoxon test based on shared bootstraps. Parameter sets that perform significantly worse, at an alpha level indicated by smbo_stochastic_reject_p_value, are pruned. A minimal sketch follows this list.

* none: the set of alternative parameter sets is not pruned.
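The stochastic_reject rule can be sketched as follows; this is an illustration of the paired Wilcoxon pruning named above, not familiar's internal code.

# Drop challengers whose paired bootstrap scores are significantly worse
# than the incumbent's at the given alpha level.
prune_stochastic_reject <- function(incumbent_scores, challengers, alpha = 0.05) {
  keep <- vapply(
    challengers,
    function(scores) {
      p <- stats::wilcox.test(
        scores, incumbent_scores,
        paired = TRUE, alternative = "less", exact = FALSE
      )$p.value
      p >= alpha
    },
    logical(1L)
  )
  challengers[keep]
}

incumbent <- c(0.70, 0.68, 0.72, 0.69, 0.71)
challengers <- list(
  a = c(0.60, 0.58, 0.61, 0.57, 0.59),  # consistently worse
  b = c(0.71, 0.69, 0.70, 0.72, 0.68)   # comparable
)
prune_stochastic_reject(incumbent, challengers)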
hyperparameter_learner: (optional) Any point in the hyperparameter space has a single, scalar, optimisation score value that is a priori unknown. During the optimisation process, the algorithm samples from the hyperparameter space by selecting hyperparameter sets and computing the optimisation score value for one or more bootstraps. For each hyperparameter set the resulting values are distributed around the actual value. The learner indicated by hyperparameter_learner is then used to infer optimisation score estimates for unsampled parts of the hyperparameter space.

The following models are available:

* bayesian_additive_regression_trees or bart: uses Bayesian Additive Regression Trees (Sparapani et al., 2021) for inference. Unlike standard random forests, BART allows for estimating posterior distributions directly and can extrapolate.

* gaussian_process (default): creates a localised approximate Gaussian process for inference (Gramacy, 2016). This allows for better scaling than deterministic Gaussian processes.

* random_forest: creates a random forest for inference. Originally suggested by Hutter et al. (2011). A weakness of random forests is their lack of extrapolation beyond observed values, which limits their usefulness in exploiting promising areas of hyperparameter space.

* random or random_search: forgoes the use of models to steer optimisation. Instead, a random search is performed.
parallel_hyperparameter_optimisation: (optional) Enable parallel processing for hyperparameter optimisation. Defaults to TRUE. When set to FALSE, this will disable the use of parallel processing while performing optimisation, regardless of the settings of the parallel parameter. The parameter moreover specifies whether parallelisation takes place within the optimisation algorithm (inner, default), or in an outer loop (outer) over learners, data subsamples, etc. parallel_hyperparameter_optimisation is ignored if parallel=FALSE.
...: Unused arguments.
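For orientation, the following hedged sketch collects the documented defaults into a named list, as these settings might be forwarded to this parser by familiar's entry points; how they are supplied (configuration file or function arguments) depends on the entry point used.

hyperparameter_settings <- list(
  optimisation_bootstraps = 50,
  optimisation_determine_vimp = TRUE,
  smbo_random_initialisation = "fixed_subsample",
  smbo_n_random_sets = 100,
  max_smbo_iterations = 20,
  smbo_stop_convergent_iterations = 3,
  smbo_initial_bootstraps = 1,
  smbo_step_bootstraps = 3,
  smbo_intensify_steps = 5,
  smbo_stochastic_reject_p_value = 0.05,
  optimisation_function = "max_validation",
  acquisition_function = "expected_improvement",
  exploration_method = "single_shot",
  hyperparameter_learner = "gaussian_process",
  parallel_hyperparameter_optimisation = TRUE
)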
Hutter, F., Hoos, H. H. & Leyton-Brown, K. Sequential model-based optimization for general algorithm configuration. in Learning and Intelligent Optimization (ed. Coello, C. A. C.) 6683, 507–523 (Springer Berlin Heidelberg, 2011).

Shahriari, B., Swersky, K., Wang, Z., Adams, R. P. & de Freitas, N. Taking the Human Out of the Loop: A Review of Bayesian Optimization. Proc. IEEE 104, 148–175 (2016).

Srinivas, N., Krause, A., Kakade, S. M. & Seeger, M. W. Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting. IEEE Trans. Inf. Theory 58, 3250–3265 (2012).

Kaufmann, E., Cappé, O. & Garivier, A. On Bayesian upper confidence bounds for bandit problems. in Artificial Intelligence and Statistics 592–600 (2012).

Jamieson, K. & Talwalkar, A. Non-stochastic Best Arm Identification and Hyperparameter Optimization. in Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (eds. Gretton, A. & Robert, C. C.) vol. 51 240–248 (PMLR, 2016).

Gramacy, R. B. laGP: Large-Scale Spatial Modeling via Local Approximate Gaussian Processes in R. Journal of Statistical Software 72, 1–46 (2016).

Sparapani, R., Spanbauer, C. & McCulloch, R. Nonparametric Machine Learning and Efficient Computation with Bayesian Additive Regression Trees: The BART R Package. Journal of Statistical Software 97, 1–66 (2021).