Description

The default training, estimation, and prediction functions are .ExtraOpt_trainer, .ExtraOpt_estimate, and .ExtraOpt_prob, respectively. For plotting, check .ExtraOpt_plot for an example.

Usage

ExtraOpt(f_train = .ExtraOpt_trainer, ..., f_est = .ExtraOpt_estimate,
f_prob = .ExtraOpt_prob, preInit = NULL, Ninit = 50L, Nmax = 200,
Nimprove = 10, elites = 0.9, max_elites = 150, tested_elites = 5,
elites_converge = 10, CEmax = 200, CEiter = 20, CEelite = 0.1,
CEimprove = 3, CEexploration_cont = 2, CEexploration_disc = c(2, 5),
CEexploration_decay = 0.98, maximize = TRUE, best = NULL,
cMean = NULL, cSD = NULL, cOrdinal = NULL, cMin = NULL, cMax = NULL,
cThr = 0.001, dProb = NULL, dThr = 0.999, priorsC = NULL,
priorsD = NULL, errorCode = -9999, autoExpVar = FALSE,
autoExpFile = NULL, verbose = 1, plot = NULL, debug = FALSE)

Arguments

f_train

The training function. Arguments passed to ExtraOpt in ... are provided to f_train. Defaults to .ExtraOpt_trainer, which is a sample xgboost trainer.

f_est

The estimation function. It returns Model as the model to use for f_prob, and Error as the loss of the estimator model. Defaults to .ExtraOpt_estimate, which is a sample xgboost variable estimator.

f_prob

The prediction function. It takes a model and a prior vector as inputs, and returns the predicted loss from f_est. Defaults to .ExtraOpt_prob, which is a sample xgboost estimator prediction.
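
As an illustration of this interface, here is a minimal sketch of a custom estimator/predictor pair. The exact signatures and the "Loss" column name are assumptions inferred from the descriptions above, not the package's documented API; check .ExtraOpt_estimate and .ExtraOpt_prob for the authoritative interfaces.

# Hypothetical f_est / f_prob pair (signatures are assumptions).
my_est <- function(priors) {
  d <- as.data.frame(priors)                 # priors with their observed losses
  model <- lm(Loss ~ ., data = d)            # any loss estimator works here
  list(Model = model,                        # model handed to f_prob
       Error = mean(abs(residuals(model))))  # loss of the estimator itself
}
my_prob <- function(model, prior) {
  # Predict the loss of a single prior vector with the estimator model.
  predict(model, newdata = as.data.frame(as.list(prior)))
}
# ExtraOpt(..., f_est = my_est, f_prob = my_prob)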

preInit

Defaults to NULL.

Ninit

The number of random initializations. A good rule of thumb is Ninit = 100, even if it does not guarantee a best result. Defaults to 50L.

Nmax

The maximum number of optimization tries. Defaults to 200.

Nimprove

Defaults to 10.

elites

The proportion of sampled priors kept as elites. The larger the elites proportion, the lower the risk of getting stuck at a local optimum; however, a very low elite amount would get quickly stuck at a local optimum and potentially overfit. After the initialization, a minimum of 5 sampled elites is mandatory: for instance, if Ninit = 100, then elites >= 0.05. It should not be higher than 1. If the sampling results in a decimal-valued numeric, it will take the largest value; if the sampling results in a numeric lower than 5, it will shrink back to 5. Defaults to 0.90.
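
Concretely, the rule above implies a simple elite count. The following one-liner is a sketch of that rule as described (rounding up, with a hard floor of 5), not package code:

# Implied elite count: round up ("take the largest value"), never below 5.
Ninit    <- 100
elites   <- 0.90
n_elites <- max(5, ceiling(elites * Ninit))  # 90 here; elites = 0.05 would hit the floor of 5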

max_elites

The maximum number of elites kept. It should not exceed 5000, as it would severely slow down the next prior optimization. When elites have the same loss, the elite which was computed the earliest takes precedence over all other identical-loss elites (even if their parameters are different). Defaults to 150.

tested_elites

The number of elites tested per optimization batch. Use 1 for small steps but fast convergence speed, supposing the initialization was good enough. Defaults to 5.

elites_converge

The number of elites which must satisfy the convergence requirements defined by cThr and dThr. The larger the elites_converge, the tighter the convergence requirements. It cannot be higher than the number of tested_elites. Defaults to 10.

CEmax

The size of the Cross-Entropy sample population. Defaults to 200.

CEiter

The number of Cross-Entropy iterations. Defaults to 20.

CEelite

The proportion of the Cross-Entropy population kept as elites. CEmax * CEelite defines the Cross-Entropy elite population, which preferably should be equal to 10 * number of variables for stable parameter updates. Defaults to 0.1.
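
For instance, under the sizing rule just stated (a sketch, not package code):

# Choosing CEelite so that CEmax * CEelite ~= 10 * number of variables.
n_vars  <- 5                              # e.g. 3 continuous + 2 discrete variables
CEmax   <- 200
CEelite <- min(1, (10 * n_vars) / CEmax)  # 50 / 200 = 0.25 elite proportion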

CEimprove

Defaults to 3.

CEexploration_cont

The exploration noise applied to continuous variables. Defaults to 2.

CEexploration_disc

The exploration noise applied to discrete variables. Using 0 nullifies the effect of noise, thus forcing a full convergence mode instead of exploring the data. Defaults to c(2, 5).

CEexploration_decay

The per-batch decay of the exploration noise: at the N-th batch, the noise is decayed by a factor of exp((N - 1) * (1 - CEexploration_decay)). Must be between 0 (near instant decay) and 1 (no decay). Defaults to 0.98.
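
Under that reading (an assumption about the garbled formula above, with the decay factor dividing the noise), the noise scale evolves as follows:

# Noise scale per batch if the factor exp((N - 1) * (1 - decay)) divides the noise.
decay   <- 0.98
batches <- 1:10
round(exp(-(batches - 1) * (1 - decay)), 3)
#> 1.000 0.980 0.961 0.942 0.923 0.905 0.887 0.869 0.852 0.835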

maximize

Whether to maximize (rather than minimize) the loss returned by f_train. Defaults to TRUE.

best

Defaults to NULL.

cMean

The means of the continuous variables provided to f_train. Defaults to NULL.

cSD

The standard deviations of the continuous variables provided to f_train. Defaults to NULL.

cOrdinal

Whether each continuous variable is ordinal (integer-valued). Defaults to NULL.

cMin

The minimum values of the continuous variables. Defaults to NULL.

cMax

The maximum values of the continuous variables. Defaults to NULL.

cThr

The convergence threshold of the continuous variables: once the maximum standard deviation falls below cThr, the continuous variables are supposed to have converged. Once converged, the algorithm will have only one try to generate a higher threshold while optimizing; if it fails, convergence interrupts the optimization. This also applies to the cross-entropy internal optimization. Defaults to 0.001, which means the continuous variables will be supposed converged once there is no maximum standard deviation above 0.001.

dProb

The prior probabilities of the discrete variables, as a list of vectors where the i-th element of a vector is the probability of the (i-1)-th level to appear. Defaults to NULL.

dThr

The convergence threshold of the discrete variables: once the probabilities reach dThr, the discrete variables are supposed to have converged. Once converged, the algorithm will have only one try to generate a higher threshold while optimizing; if it fails, convergence interrupts the optimization. This also applies to the cross-entropy internal optimization, but as 1 - dThr. Defaults to 0.999, which means the discrete variables will be supposed converged once all discrete variables have reached a probability of 0.999.

priorsC

The prior matrix of the continuous variables, allowing to resume from previous results. When not provided, cMean and cSD are mandatory to be filled. Defaults to NULL.

priorsD

The prior matrix of the discrete variables, allowing to resume from previous results. When not provided, dProb is mandatory to be filled. Defaults to NULL.

errorCode

The loss returned by f_train should be the errorCode value when no features are selected for training a supervised model. The error codes are removed from the priors. Defaults to -9999.
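
A hypothetical f_train fragment honoring this contract (the feature-selection vector representation is an assumption):

# Hypothetical trainer skeleton: return errorCode when the prior selects
# no features at all; otherwise return the real training loss.
my_train <- function(selected, ..., errorCode = -9999) {
  if (sum(selected) == 0) return(errorCode)  # illegal prior: no features kept
  loss <- runif(1)  # placeholder for the actual supervised training loss
  loss
}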

autoExpVar

Whether to automatically export the priorsC and priorsD matrices to the global environment when ExtraOpt, f_train, f_est, or f_prob errors without a possible recovery. You would then be able to feed the priors and re-run without having to run the algorithm again from scratch. Defaults to FALSE. The saved variable in the global environment is called "temporary_Laurae".

autoExpFile

The file to which the priorsC and priorsD matrices are automatically exported when ExtraOpt, f_train, f_est, or f_prob errors without a possible recovery. You would then be able to feed the priors and re-run without having to run the algorithm again from scratch. Defaults to NULL.
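
A hypothetical recovery flow built on this behavior; only the name "temporary_Laurae" is documented, so its structure (holding the priorsC / priorsD matrices) is an assumption:

# Resume a crashed run by feeding back the auto-exported priors.
if (exists("temporary_Laurae")) {
  saved <- get("temporary_Laurae")  # assumed to hold priorsC / priorsD
  # ExtraOpt(..., priorsC = saved$priorsC, priorsD = saved$priorsD)
}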

verbose

Whether ExtraOpt should become chatty and report a lot. A value of 0 defines silent, while 1 chats a little bit (and 2 chats a lot); 3 is so chatty it will flood severely. Defaults to 1.
."priors"
, which as a matrix with as first column the Loss
, followed then by continuous variables, and ends with discrete variables. Continuous variables start with "C"
while discrete variables start with "D"
in the column names. Defaults to NULL
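
A minimal sketch of a custom plotting function, assuming only the column layout just described:

# Plot the loss per iteration; the first column of priors is the Loss.
my_plot <- function(priors) {
  plot(priors[, 1], type = "l", xlab = "Iteration", ylab = "Loss",
       main = "ExtraOpt optimization path")
}
# ExtraOpt(..., plot = my_plot)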

debug

Defaults to FALSE.

Value

A list with: best for the best value found, variables for the variable values (split into a continuous list and a discrete list), priors for the list of iterations and their values, elite_priors for the last elites used, new_priors for the last iterations issued from the elites, iterations for the number of iterations, and thresh_stats for the threshold statistics over batches.

Examples

## Not run: ------------------------------------
# Example of params:
# - 50 random initializations
# - 200 maximum tries
# - 3 continuous variables in [0, 10]
# --- with 2 continuous and 1 ordinal
# --- with respective means (2, 4, 6)
# --- and standard deviation (1, 2, 3)
# - and 2 discrete features
# - with respective prior probabilities {(0.8, 0.2), (0.7, 0.1, 0.2)}
# - and loss error code (illegal priors) of -9999
#
# ExtraOpt(Ninit = 50,
# nthreads = 1,
# eta = 0.1,
# early_stop = 10,
# X_train,
# X_test,
# Y_train,
# Y_test,
# Nmax = 200,
# cMean = c(2, 4, 6),
# cSD = c(1, 2, 3),
# cOrdinal = c(FALSE, FALSE, TRUE),
# cMin = c(0, 0, 0),
# cMax = c(10, 10, 10),
# dProb = list(v1 = c(0.8, 0.2), v2 = c(0.7, 0.1, 0.2)),
# priorsC = NULL,
# priorsD = NULL,
# autoExpVar = FALSE,
# errorCode = -9999)
## ---------------------------------------------
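
Accessing the documented return fields could then look like this (a sketch; result is a hypothetical variable):

# result <- ExtraOpt(...)  # as above
# result$best              # best value found
# result$variables         # continuous and discrete variable values
# result$iterations        # number of iterations performed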