Internal function for parsing settings related to hyperparameter optimisation
.parse_hyperparameter_optimisation_settings(
config = NULL,
parallel,
outcome_type,
optimisation_bootstraps = waiver(),
optimisation_determine_vimp = waiver(),
smbo_random_initialisation = waiver(),
smbo_n_random_sets = waiver(),
max_smbo_iterations = waiver(),
smbo_stop_convergent_iterations = waiver(),
smbo_stop_tolerance = waiver(),
smbo_time_limit = waiver(),
smbo_initial_bootstraps = waiver(),
smbo_step_bootstraps = waiver(),
smbo_intensify_steps = waiver(),
smbo_stochastic_reject_p_value = waiver(),
optimisation_function = waiver(),
optimisation_metric = waiver(),
acquisition_function = waiver(),
exploration_method = waiver(),
hyperparameter_learner = waiver(),
parallel_hyperparameter_optimisation = waiver(),
...
)
List of parameters related to model hyperparameter optimisation.
config: A list of settings, e.g. from an xml file.
parallel: Logical value that indicates whether familiar uses parallelisation. If FALSE, it overrides parallel_hyperparameter_optimisation.
outcome_type: Type of outcome found in the data set.
optimisation_bootstraps: (optional) Number of bootstraps that should be generated from the development data set. During the optimisation procedure one or more of these bootstraps (indicated by smbo_step_bootstraps) are used for model development using different combinations of hyperparameters. The effect of the hyperparameters is then assessed by comparing in-bag and out-of-bag model performance.

The default number of bootstraps is 50. Hyperparameter optimisation may finish before exhausting the set of bootstraps.
optimisation_determine_vimp: (optional) Logical value that indicates whether variable importance is determined separately for each of the bootstraps created during the optimisation process (TRUE) or the applicable results from the feature selection step are used (FALSE).

Determining variable importance increases the initial computational overhead. However, it prevents positive biases for the out-of-bag data due to overlap of these data with the development data set used for the feature selection step. In this case, any hyperparameters of the variable importance method are not determined separately for each bootstrap, but those obtained during the feature selection step are used instead. If multiple such hyperparameter sets could be applicable, the set that is used is randomly selected for each bootstrap.

This parameter only affects hyperparameter optimisation of learners. The default is TRUE.
smbo_random_initialisation: (optional) String indicating the initialisation method for the hyperparameter space. Can be one of fixed_subsample (default), fixed, or random. fixed and fixed_subsample first create hyperparameter sets from a range of default values set by familiar. fixed_subsample then randomly draws up to smbo_n_random_sets sets from the grid. random does not rely upon a fixed grid, and randomly draws up to smbo_n_random_sets hyperparameter sets from the hyperparameter space.
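As a rough illustration of fixed_subsample, the sketch below draws up to smbo_n_random_sets sets from a hypothetical default grid; the grid and the parameter names in it are invented for this example and are not familiar's actual defaults.

# Hypothetical default grid; familiar constructs its own grid internally.
grid <- expand.grid(
  n_trees = c(100, 200, 400),
  learning_rate = c(0.01, 0.1, 0.3)
)
smbo_n_random_sets <- 100
# Draw up to smbo_n_random_sets hyperparameter sets from the grid.
n_draw <- min(smbo_n_random_sets, nrow(grid))
subsample <- grid[sample.int(nrow(grid), n_draw), , drop = FALSE]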
smbo_n_random_sets: (optional) Number of random or subsampled hyperparameter sets drawn during the initialisation process. Default: 100. Cannot be smaller than 10. The parameter is not used when smbo_random_initialisation is fixed, as the entire pre-defined grid will be explored.
max_smbo_iterations: (optional) Maximum number of intensify iterations of the SMBO algorithm. During an intensify iteration a run-off occurs between the current best hyperparameter combination and either the 10 challenger combinations with the highest expected improvement or a set of 20 random combinations.

Run-off with random combinations is used to force exploration of the hyperparameter space, and is performed every second intensify iteration, or if there is no expected improvement for any challenger combination.

If a combination of hyperparameters leads to better performance on the same data than the incumbent best set of hyperparameters, it replaces the incumbent set at the end of the intensify iteration.

The default number of intensify iterations is 20. Iterations may be stopped early if the incumbent set of hyperparameters remains the same for smbo_stop_convergent_iterations iterations, or if performance improvement is minimal. This behaviour is suppressed during the first 4 iterations to enable the algorithm to explore the hyperparameter space.
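The run-off logic can be sketched as follows; the parity convention for the random run-off and the tie handling are assumptions made for illustration, not familiar's internal code.

# Select challengers for one intensify iteration: the 10 sets with the
# highest expected improvement, or 20 random sets every second iteration or
# when no set shows expected improvement.
select_challengers <- function(iteration, sets, expected_improvement) {
  force_random <- iteration %% 2 == 0 || all(expected_improvement <= 0)
  if (force_random) {
    n <- min(20L, nrow(sets))
    sets[sample.int(nrow(sets), n), , drop = FALSE]
  } else {
    n <- min(10L, nrow(sets))
    sets[order(expected_improvement, decreasing = TRUE)[seq_len(n)], , drop = FALSE]
  }
}

sets <- expand.grid(n_trees = c(100, 200), max_depth = c(2, 4))
select_challengers(1, sets, expected_improvement = c(0.10, 0.00, 0.20, 0.05))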
smbo_stop_convergent_iterations: (optional) The number of subsequent convergent SMBO iterations required to stop hyperparameter optimisation early. An iteration is convergent if the best parameter set has not changed, or if the optimisation score over the 4 most recent iterations has not changed beyond the tolerance level in smbo_stop_tolerance.

The default value is 3.
smbo_stop_tolerance: (optional) Tolerance for early stopping due to convergent optimisation score.

The default value depends on the square root of the number of samples (at the series level), and is 0.01 for 100 samples. This value is computed as 0.1 / sqrt(n_samples). The tolerance is floored at 0.0001, which is reached for 1M or more samples.
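The default rule can be written out as follows; the function name is illustrative.

# Default tolerance: 0.1 / sqrt(n_samples), floored at 0.0001.
default_smbo_stop_tolerance <- function(n_samples) {
  max(0.1 / sqrt(n_samples), 0.0001)
}

default_smbo_stop_tolerance(100)  # 0.01
default_smbo_stop_tolerance(1e6)  # 1e-04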
smbo_time_limit: (optional) Time limit (in minutes) for the optimisation process. Optimisation is stopped after this limit is exceeded. Time taken to determine variable importance for the optimisation process (see the optimisation_determine_vimp parameter) does not count.

The default is NULL, indicating that there is no time limit for the optimisation process. The time limit cannot be less than 1 minute.
smbo_initial_bootstraps: (optional) The number of bootstraps taken from the set of optimisation_bootstraps as the bootstraps assessed initially.

The default value is 1. The value cannot be larger than optimisation_bootstraps.
smbo_step_bootstraps: (optional) The number of bootstraps taken from the set of optimisation_bootstraps as the bootstraps assessed during the steps of each intensify iteration.

The default value is 3. The value cannot be larger than optimisation_bootstraps.
smbo_intensify_steps: (optional) The number of steps in each SMBO intensify iteration. At each step, a new set of smbo_step_bootstraps bootstraps is drawn and used in the run-off between the incumbent best hyperparameter combination and its challengers.

The default value is 5. Higher numbers allow for a more detailed comparison, but this comes with added computational cost.
smbo_stochastic_reject_p_value: (optional) The p-value threshold used for the stochastic_reject exploration method.

The default value is 0.05.
optimisation_function: (optional) Type of optimisation function used to quantify the performance of a hyperparameter set. Model performance is assessed using the metric(s) specified by optimisation_metric on the in-bag (IB) and out-of-bag (OOB) samples of a bootstrap. These values are converted to objective scores with a standardised interval of [-1.0, 1.0]. Each pair of objective scores is subsequently used to compute an optimisation score. The optimisation scores across different bootstraps are then aggregated to a summary score. This summary score is used to rank hyperparameter sets, and to select the optimal set.

The combination of optimisation score and summary score is determined by the optimisation function indicated by this parameter:

* validation or max_validation (default): seeks to maximise the OOB score.

* balanced: seeks to balance the IB and OOB scores.

* stronger_balance: similar to balanced, but with a stronger penalty for differences between IB and OOB scores.

* validation_minus_sd: seeks to optimise the average OOB score minus its standard deviation.

* validation_25th_percentile: seeks to optimise the 25th percentile of OOB scores, and is conceptually similar to validation_minus_sd.

* model_estimate: seeks to maximise the OOB score estimate predicted by the hyperparameter learner (not available for random search).

* model_estimate_minus_sd: seeks to maximise the OOB score estimate minus its estimated standard deviation, as predicted by the hyperparameter learner (not available for random search).

* model_balanced_estimate: seeks to maximise the estimate of the balanced IB and OOB score. This is similar to the balanced score, and in fact uses a hyperparameter learner to predict said score (not available for random search).

* model_balanced_estimate_minus_sd: seeks to maximise the estimate of the balanced IB and OOB score, minus its estimated standard deviation. This is similar to the balanced score, but takes into account its estimated spread.

Additional details are provided in the Learning algorithms and hyperparameter optimisation vignette.
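As a hedged sketch, two of the summary scores above can be computed from a vector of OOB objective scores collected across bootstraps; the function names are illustrative, not familiar's internals.

# Mean OOB score minus its standard deviation.
validation_minus_sd <- function(oob_scores) {
  mean(oob_scores) - stats::sd(oob_scores)
}

# 25th percentile of OOB scores.
validation_25th_percentile <- function(oob_scores) {
  unname(stats::quantile(oob_scores, probs = 0.25))
}

oob_scores <- c(0.62, 0.70, 0.66, 0.58, 0.73)
validation_minus_sd(oob_scores)
validation_25th_percentile(oob_scores)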
optimisation_metric: (optional) One or more metrics used to compute performance scores. See the vignette on performance metrics for the available metrics.

If unset, the following metrics are used by default:

* auc_roc: for binomial and multinomial models.

* mse: mean squared error, for continuous models.

* msle: mean squared logarithmic error, for count models.

* concordance_index: for survival models.

Multiple optimisation metrics can be specified. Actual metric values are converted to an objective value by comparison with a baseline metric value derived from a trivial model, i.e. the majority class for binomial and multinomial outcomes, the median outcome for count and continuous outcomes, and a fixed risk or time for survival outcomes.
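A minimal sketch of such a conversion is shown below, assuming a linear rescaling in which the optimal metric value maps to 1.0 and the trivial-model baseline to 0.0, clamped to [-1.0, 1.0]; the exact mapping familiar uses may differ and is described in the performance metrics vignette. The sketch assumes a higher-is-better metric.

# Convert a metric value to an objective score relative to a baseline
# (illustrative assumption, not familiar's internal code).
metric_to_objective <- function(value, baseline, optimal) {
  score <- (value - baseline) / abs(optimal - baseline)
  max(min(score, 1.0), -1.0)
}

# Example: AUC-ROC of 0.75 against a trivial baseline of 0.5 (optimal 1.0).
metric_to_objective(0.75, baseline = 0.5, optimal = 1.0)  # 0.5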
acquisition_function: (optional) The acquisition function influences how new hyperparameter sets are selected. The algorithm uses the model learned by the learner indicated by hyperparameter_learner to search the hyperparameter space for hyperparameter sets that are either likely better than the best known set (exploitation) or where there is considerable uncertainty (exploration). The acquisition function quantifies this (Shahriari et al., 2016).

The following acquisition functions are available, and are described in more detail in the learner algorithms vignette:

* improvement_probability: the probability of improvement quantifies the probability that the expected optimisation score for a set is better than the best observed optimisation score.

* improvement_empirical_probability: similar to improvement_probability, but based directly on optimisation scores predicted by the individual decision trees.

* expected_improvement (default): computes the expected improvement.

* upper_confidence_bound: this acquisition function is based on the upper confidence bound of the distribution (Srinivas et al., 2012).

* bayes_upper_confidence_bound: this acquisition function is based on the upper confidence bound of the distribution (Kaufmann et al., 2012).
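For orientation, the standard probability-of-improvement acquisition for a maximisation problem looks as follows, assuming the hyperparameter learner yields a mean and standard deviation per candidate set; familiar's implementation may differ in detail.

# Probability that a candidate's score exceeds the best observed score,
# under a normal model with the given mean and standard deviation.
probability_of_improvement <- function(mu, sigma, best_observed) {
  stats::pnorm((mu - best_observed) / sigma)
}

probability_of_improvement(mu = 0.70, sigma = 0.05, best_observed = 0.68)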
exploration_method: (optional) Method used to steer exploration during the intensify steps that follow initialisation. As stated earlier, each SMBO intensify iteration compares suggested alternative parameter sets with the incumbent best set in a series of steps. The exploration method controls how the set of alternative parameter sets is pruned after each step in an iteration. Can be one of the following:

* single_shot (default): the set of alternative parameter sets is not pruned, and each intensify iteration contains only a single intensification step that uses only a single bootstrap. This is the fastest exploration method, but only superficially tests each parameter set.

* successive_halving: the set of alternative parameter sets is pruned by removing the worst performing half of the sets after each step (Jamieson and Talwalkar, 2016).

* stochastic_reject: the set of alternative parameter sets is pruned by comparing the performance of each parameter set with that of the incumbent best parameter set using a paired Wilcoxon test based on shared bootstraps. Parameter sets that perform significantly worse, at an alpha level indicated by smbo_stochastic_reject_p_value, are pruned. A minimal sketch follows this list.

* none: the set of alternative parameter sets is not pruned.
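The stochastic_reject rule can be sketched as follows; this is an illustration of the paired Wilcoxon pruning named above, not familiar's internal code.

# Drop challengers whose paired bootstrap scores are significantly worse
# than the incumbent's at the given alpha level.
prune_stochastic_reject <- function(incumbent_scores, challengers, alpha = 0.05) {
  keep <- vapply(
    challengers,
    function(scores) {
      p <- stats::wilcox.test(
        scores, incumbent_scores,
        paired = TRUE, alternative = "less", exact = FALSE
      )$p.value
      p >= alpha
    },
    logical(1L)
  )
  challengers[keep]
}

incumbent <- c(0.70, 0.68, 0.72, 0.69, 0.71)
challengers <- list(
  a = c(0.60, 0.58, 0.61, 0.57, 0.59),  # consistently worse
  b = c(0.71, 0.69, 0.70, 0.72, 0.68)   # comparable
)
prune_stochastic_reject(incumbent, challengers)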
hyperparameter_learner: (optional) Any point in the hyperparameter space has a single, scalar, optimisation score value that is a priori unknown. During the optimisation process, the algorithm samples from the hyperparameter space by selecting hyperparameter sets and computing the optimisation score value for one or more bootstraps. For each hyperparameter set the resulting values are distributed around the actual value. The learner indicated by hyperparameter_learner is then used to infer optimisation score estimates for unsampled parts of the hyperparameter space.

The following models are available:

* bayesian_additive_regression_trees or bart: uses Bayesian Additive Regression Trees (Sparapani et al., 2021) for inference. Unlike standard random forests, BART allows for estimating posterior distributions directly and can extrapolate.

* gaussian_process (default): creates a localised approximate Gaussian process for inference (Gramacy, 2016). This allows for better scaling than deterministic Gaussian processes.

* random_forest: creates a random forest for inference. Originally suggested by Hutter et al. (2011). A weakness of random forests is their lack of extrapolation beyond observed values, which limits their usefulness in exploiting promising areas of hyperparameter space.

* random or random_search: forgoes the use of models to steer optimisation. Instead, a random search is performed.
parallel_hyperparameter_optimisation: (optional) Enable parallel processing for hyperparameter optimisation. Defaults to TRUE. When set to FALSE, this will disable the use of parallel processing while performing optimisation, regardless of the settings of the parallel parameter. The parameter moreover specifies whether parallelisation takes place within the optimisation algorithm (inner, default), or in an outer loop (outer) over learners, data subsamples, etc. parallel_hyperparameter_optimisation is ignored if parallel=FALSE.
...: Unused arguments.
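For orientation, the following hedged sketch collects the documented defaults into a named list, as these settings might be forwarded to this parser by familiar's entry points; how they are supplied (configuration file or function arguments) depends on the entry point used.

hyperparameter_settings <- list(
  optimisation_bootstraps = 50,
  optimisation_determine_vimp = TRUE,
  smbo_random_initialisation = "fixed_subsample",
  smbo_n_random_sets = 100,
  max_smbo_iterations = 20,
  smbo_stop_convergent_iterations = 3,
  smbo_initial_bootstraps = 1,
  smbo_step_bootstraps = 3,
  smbo_intensify_steps = 5,
  smbo_stochastic_reject_p_value = 0.05,
  optimisation_function = "max_validation",
  acquisition_function = "expected_improvement",
  exploration_method = "single_shot",
  hyperparameter_learner = "gaussian_process",
  parallel_hyperparameter_optimisation = TRUE
)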
Hutter, F., Hoos, H. H. & Leyton-Brown, K. Sequential model-based optimization for general algorithm configuration. in Learning and Intelligent Optimization (ed. Coello, C. A. C.) 6683, 507–523 (Springer Berlin Heidelberg, 2011).

Shahriari, B., Swersky, K., Wang, Z., Adams, R. P. & de Freitas, N. Taking the Human Out of the Loop: A Review of Bayesian Optimization. Proc. IEEE 104, 148–175 (2016).

Srinivas, N., Krause, A., Kakade, S. M. & Seeger, M. W. Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting. IEEE Trans. Inf. Theory 58, 3250–3265 (2012).

Kaufmann, E., Cappé, O. & Garivier, A. On Bayesian upper confidence bounds for bandit problems. in Artificial Intelligence and Statistics 592–600 (2012).

Jamieson, K. & Talwalkar, A. Non-stochastic Best Arm Identification and Hyperparameter Optimization. in Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (eds. Gretton, A. & Robert, C. C.) vol. 51 240–248 (PMLR, 2016).

Gramacy, R. B. laGP: Large-Scale Spatial Modeling via Local Approximate Gaussian Processes in R. Journal of Statistical Software 72, 1–46 (2016).

Sparapani, R., Spanbauer, C. & McCulloch, R. Nonparametric Machine Learning and Efficient Computation with Bayesian Additive Regression Trees: The BART R Package. Journal of Statistical Software 97, 1–66 (2021).