Carry out statistical inference on greta models by MCMC or likelihood/posterior optimisation.
mcmc(model, method = c("hmc"), n_samples = 1000, thin = 1, warmup = 100,
  verbose = TRUE, pb_update = 10, control = list(),
  initial_values = NULL)

stashed_samples()

opt(model, method = c("adagrad"), max_iterations = 100, tolerance = 1e-06,
  control = list(), initial_values = NULL)
model: a greta_model object

method: the method used to sample or optimise values. Currently only one method is available for each procedure: hmc and adagrad.

n_samples: the number of MCMC samples to draw (after any warm-up, but before thinning)

thin: the MCMC thinning rate; every thin-th sample is retained, the rest are discarded

warmup: the number of samples to spend warming up the MCMC sampler. During this phase the sampler moves towards the highest-density area and tunes sampler hyperparameters.

verbose: whether to print progress information to the console

pb_update: how regularly to update the progress bar (in iterations)

control: an optional named list of hyperparameters and options to control the behaviour of the sampler or optimiser. See Details.

initial_values: an optional named vector of initial values for the free parameters in the model. These will be used as the starting point for sampling/optimisation.

max_iterations: the maximum number of iterations before giving up

tolerance: the numerical tolerance for the solution; the optimiser stops when the (absolute) difference in the joint density between successive iterations drops below this level
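For example, a call using several of these arguments might look like the sketch below (m is assumed to be a compiled greta_model, as in the Examples):

# draw 2000 samples after 500 warmup iterations, keeping every 10th
# sample, so 200 draws are retained
draws <- mcmc(m, n_samples = 2000, warmup = 500, thin = 10)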
mcmc() and stashed_samples() return an mcmc.list object that can be analysed using functions from the coda package. This will contain MCMC samples of the greta arrays used to create model.

opt() returns a list containing the following named elements:

par: the best set of parameters found
value: the log joint density of the model at the parameters par
iterations: the number of iterations taken by the optimiser
convergence: an integer code; 0 indicates successful completion, 1 indicates the iteration limit max_iterations was reached
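As a short sketch of downstream use, the draws returned by mcmc() can be passed straight to coda functions (assuming draws from the Examples below):

library(coda)
summary(draws)        # posterior means, SDs and quantiles
effectiveSize(draws)  # effective sample size per parameter
plot(draws)           # trace and density plots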
If the sampler is aborted before finishing, the samples collected so far can be retrieved with stashed_samples(). Only samples from the sampling phase will be returned.
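For example, if a long run is interrupted part-way through:

# recover the post-warmup samples collected before the interruption
draws_so_far <- stashed_samples()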
For mcmc(), if verbose = TRUE the progress bar shows the number of iterations so far and the expected time to complete the phase of model fitting (warmup or sampling). Updating the progress bar regularly slows down sampling, by as much as 9 seconds per 1000 updates. So if you want the sampler to run faster, you can increase pb_update to leave more iterations between updates of the progress bar, or turn the progress bar off altogether by setting verbose = FALSE.
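For example:

# update the progress bar only every 100 iterations
draws <- mcmc(m, pb_update = 100)

# or suppress progress output entirely
draws <- mcmc(m, verbose = FALSE)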
Occasionally, a proposed set of parameters can cause numerical instability (i.e. the log density or its gradient is NA, Inf or -Inf), normally because the log joint density is so low that it can't be represented as a floating point number. When this happens, the progress bar will also display the proportion of samples so far that were 'bad' (numerically unstable) and therefore rejected.
If you're getting a lot of numerical instability, you might want to manually define starting values to move the sampler into a more reasonable part of the parameter space. Alternatively, you could redefine the model (via model()) to use double precision, though this will slow down sampling.
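A minimal sketch of the double-precision remedy, assuming the mu and sigma greta arrays from the Examples below and model()'s precision argument:

# rebuild the model in double precision, then sample as before
m_dbl <- model(mu, sigma, precision = "double")
draws <- mcmc(m_dbl)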
Currently, the only implemented MCMC procedure is static Hamiltonian Monte Carlo (method = "hmc"). During the warmup iterations, the leapfrog stepsize hyperparameter epsilon is tuned to maximise sampler efficiency. The control argument can be used to specify the initial value for epsilon, along with two other hyperparameters: Lmin and Lmax, positive integers (with Lmax > Lmin) giving the lower and upper limits on the number of leapfrog steps per iteration (from which the number is selected uniformly at random).

The default control options for HMC are:

control = list(Lmin = 10, Lmax = 20, epsilon = 0.005)
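For example, to use a smaller stepsize and longer leapfrog trajectories than the defaults:

# sketch: trade speed for smaller, more careful leapfrog steps
draws <- mcmc(m, control = list(Lmin = 20, Lmax = 40, epsilon = 0.001))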
Currently, the only implemented optimisation algorithm is Adagrad (method = "adagrad"). The control argument can be used to specify the optimiser hyperparameters: learning_rate (default 0.8), initial_accumulator_value (default 0.1) and use_locking (default TRUE). These are passed directly to TensorFlow's optimisers; see the TensorFlow docs for more information.
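For example, a more conservative optimiser run might look like:

o <- opt(m, control = list(learning_rate = 0.1), max_iterations = 500)
o$par          # the best parameter values found
o$convergence  # 0 if the optimiser converged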
# NOT RUN {
# define a simple model
mu = variable()
sigma = lognormal(1, 0.1)
x = rnorm(10)
distribution(x) = normal(mu, sigma)
m <- model(mu, sigma)
# carry out mcmc on the model
draws <- mcmc(m,
              n_samples = 100,
              warmup = 10)
# }
# NOT RUN {
# find the MAP estimate
opt_res <- opt(m)
# }