
Functions to set up optimisers (which find parameters that maximise the joint density of a model) and change their tuning parameters, for use in opt(). For details of the algorithms and how to tune them, see the SciPy optimiser docs or the TensorFlow optimiser docs.
Usage

nelder_mead()
powell()
cg()
bfgs()
newton_cg()
l_bfgs_b(maxcor = 10, maxls = 20)
tnc(max_cg_it = -1, stepmx = 0, rescale = -1)
cobyla(rhobeg = 1)
slsqp()
gradient_descent(learning_rate = 0.01)
adadelta(learning_rate = 0.001, rho = 1, epsilon = 1e-08)
adagrad(learning_rate = 0.8, initial_accumulator_value = 0.1)
adagrad_da(
learning_rate = 0.8,
global_step = 1L,
initial_gradient_squared_accumulator_value = 0.1,
l1_regularization_strength = 0,
l2_regularization_strength = 0
)
momentum(learning_rate = 0.001, momentum = 0.9, use_nesterov = TRUE)
adam(learning_rate = 0.1, beta1 = 0.9, beta2 = 0.999, epsilon = 1e-08)
ftrl(
learning_rate = 1,
learning_rate_power = -0.5,
initial_accumulator_value = 0.1,
l1_regularization_strength = 0,
l2_regularization_strength = 0
)
proximal_gradient_descent(
learning_rate = 0.01,
l1_regularization_strength = 0,
l2_regularization_strength = 0
)
proximal_adagrad(
learning_rate = 1,
initial_accumulator_value = 0.1,
l1_regularization_strength = 0,
l2_regularization_strength = 0
)
rms_prop(learning_rate = 0.1, decay = 0.9, momentum = 0, epsilon = 1e-10)
Value

An optimiser object that can be passed to opt().
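To illustrate, here is a minimal sketch (with a toy model invented purely for demonstration) that builds optimiser objects with non-default tuning parameters and passes them to opt():

library(greta)

# a toy model: estimate the mean of some simulated data
x <- rnorm(50)
mu <- variable()
distribution(x) <- normal(mu, 1)
m <- model(mu)

# constructors return optimiser objects; tuning parameters can be
# overridden when the object is created
careful_adam <- adam(learning_rate = 0.001)
plain_gd <- gradient_descent(learning_rate = 0.05)

# pass them to opt() via its 'optimiser' argument
adam_res <- opt(m, optimiser = careful_adam)
gd_res <- opt(m, optimiser = plain_gd)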
Arguments

maxcor: maximum number of 'variable metric corrections' used to define the approximation to the Hessian matrix
maxls: maximum number of line search steps per iteration
max_cg_it: maximum number of Hessian * vector evaluations per iteration
stepmx: maximum step for the line search
rescale: log10 scaling factor used to trigger rescaling of the objective
rhobeg: reasonable initial changes to the variables
learning_rate: the size of steps (in parameter space) towards the optimal value
rho: the decay rate
epsilon: a small constant used to condition gradient updates
initial_accumulator_value: initial value of the 'accumulator' used to tune the algorithm
global_step: the current training step number
initial_gradient_squared_accumulator_value: initial value of the accumulators used to tune the algorithm
l1_regularization_strength: L1 regularisation coefficient (must be 0 or greater)
l2_regularization_strength: L2 regularisation coefficient (must be 0 or greater)
momentum: the momentum of the algorithm
use_nesterov: whether to use Nesterov momentum
beta1: exponential decay rate for the 1st moment estimates
beta2: exponential decay rate for the 2nd moment estimates
learning_rate_power: power on the learning rate (must be 0 or less)
decay: discounting factor for the gradient
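As a further hedged sketch, the regularisation strengths above default to 0 (no penalty) and must be non-negative; the numeric values below are arbitrary and chosen only for illustration:

# ftrl() and proximal_adagrad() take L1/L2 regularisation strengths,
# both of which must be 0 or greater (0 disables the penalty)
regularised <- proximal_adagrad(
  learning_rate = 0.5,
  l1_regularization_strength = 0.01,
  l2_regularization_strength = 0.01
)
# the object is then used like any other optimiser,
# e.g. opt(m, optimiser = regularised) for a previously defined model m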
Details

The optimisers powell(), cg(), newton_cg(), l_bfgs_b(), tnc(), cobyla(), and slsqp() are deprecated. They will be removed in greta 0.4.0, since they will no longer be available in TensorFlow 2.0, on which that version of greta will depend.

The cobyla() optimiser does not provide information about the number of iterations nor convergence, so these elements of the output are set to NA.
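For example, with cobyla() those parts of the result come back as NA. The element names used below (iterations, convergence) are assumed from the wording above and from the par element shown in the examples, and the sketch assumes a greta model m has already been defined (e.g. as in the examples below):

# cobyla() is deprecated, but if used, the iteration and convergence
# information in the result is unavailable
res <- opt(m, optimiser = cobyla())
res$iterations   # expected to be NA for cobyla()
res$convergence  # expected to be NA for cobyla()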
Examples

library(greta)

# use optimisation to find the mean and sd of some data
x <- rnorm(100, -2, 1.2)
mu <- variable()
sd <- variable(lower = 0)
distribution(x) <- normal(mu, sd)
m <- model(mu, sd)

# configure optimisers & their parameters via the 'optimiser' argument to opt()
opt_res <- opt(m, optimiser = bfgs())

# compare results with the analytic solution
opt_res$par
c(mean(x), sd(x))