For mcmc(), if verbose = TRUE, the progress bar shows the number of
iterations so far and the expected time to complete the phase of model
fitting (warmup or sampling). Occasionally, a proposed set of parameters
can cause numerical instability (i.e. the log density or its gradient is
NA, Inf or -Inf), usually because the log joint density is so low that it
can't be represented as a floating point number. When this happens, the
progress bar will also display the proportion of proposals so far that
were 'bad' (numerically unstable) and therefore rejected. Some numerical
instability during the warmup phase is normal, but 'bad' samples during
the sampling phase can lead to bias in your posterior sample. If you have
only a few bad samples (<10%), you can usually resolve this with a longer
warmup period or by manually defining starting values to move the sampler
into a more reasonable part of the parameter space. If you have more bad
samples than that, your model may be misspecified. You can often diagnose
this by using calculate() to evaluate the values of greta arrays, given
fixed values of the model parameters, and checking that the results are
what you expect.
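For example, a quick check before re-running mcmc() might look like the
following sketch, in which the parameter names (mu, sigma), the greta
array of interest (mean_y) and the model object (m) are all assumed
placeholders:

    # evaluate a greta array at fixed, plausible parameter values
    calculate(mean_y, values = list(mu = 0, sigma = 1))

    # or nudge the sampler into a sensible region with explicit
    # initial values, one initials object per chain
    inits <- replicate(4, initials(mu = 0, sigma = 0.1), simplify = FALSE)
    draws <- mcmc(m, chains = 4, initial_values = inits)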
greta runs multiple chains simultaneously with a single sampler,
vectorising all operations across the chains. For example, a scalar
addition in your model is computed as an elementwise vector addition
(with vectors of length chains), a vector addition is computed as a
matrix addition, and so on. TensorFlow is able to parallelise these
operations, and this approach reduces computational overheads, so this is
the most efficient way of computing on multiple chains.
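As a conceptual illustration (plain R, not greta internals): with 4
chains, a 'scalar' parameter is carried as a length-4 vector, one element
per chain, so a scalar addition becomes an elementwise vector addition:

    chains <- 4
    a <- rnorm(chains)  # one value of the 'scalar' parameter per chain
    b <- rnorm(chains)
    a + b               # computed for all chains in a single operation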
Multiple mcmc samplers (each of which can simultaneously run multiple
chains) can also be run in parallel by setting the execution plan with
the future package. Only plan(multisession) futures or plan(cluster)
futures that don't use fork clusters are allowed, since forked processes
conflict with TensorFlow's parallelism. Explicitly parallelising chains
on a local machine with plan(multisession) will probably be slower than
running multiple chains simultaneously in a single sampler (with
plan(sequential), the default) because of the overhead required to start
new sessions. However, plan(cluster) can be used to run chains on a
cluster of machines on a local or remote network. See future::cluster()
for details, and the future.batchtools package to set up plans on
clusters with job schedulers.
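A minimal sketch of this, assuming a pre-built greta model m:

    library(future)

    # run separate samplers in parallel R sessions
    plan(multisession)
    draws <- mcmc(m, chains = 4)

    # revert to the default: one sampler running all chains simultaneously
    plan(sequential)
    draws <- mcmc(m, chains = 4)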
If n_cores = NULL and mcmc samplers are being run sequentially, each
sampler will be allowed to use all CPU cores (possibly to compute
multiple chains sequentially). If samplers are being run in parallel with
the future package, n_cores will be set so that
n_cores * future::nbrOfWorkers() is less than the number of CPU cores.
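To override this, n_cores can be set explicitly, e.g. (with m an assumed
model):

    # cap each sampler at two CPU cores
    draws <- mcmc(m, chains = 4, n_cores = 2)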
After carrying out mcmc on all of the model parameters, mcmc() calculates
the values of (i.e. traces) the parameters of interest for each of these
samples, similarly to calculate(). Multiple posterior samples can be
traced simultaneously, though this can require large amounts of memory
for large models. As in calculate(), the argument trace_batch_size can be
modified to trade off speed against memory usage.
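For example (with m an assumed model), lowering trace_batch_size reduces
peak memory use at some cost in speed:

    draws <- mcmc(m, trace_batch_size = 10)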
If the sampler is aborted before finishing (and future parallelism isn't
being used), the samples collected so far can be retrieved with
stashed_samples(). Only samples from the sampling phase will be returned.
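For example, after interrupting a run of mcmc() on an assumed model m:

    draws <- mcmc(m)            # interrupted part-way through sampling
    draws <- stashed_samples()  # recover the sampling-phase draws so far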
Samples returned by mcmc() and stashed_samples() can be added to with
extra_samples(). This continues the chain from the last value of the
previous chain and uses the same sampler and model as was used to
generate the previous samples. It is not possible to change the sampler
or extend the warmup period.
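For example, with draws an assumed result of mcmc():

    # continue the same chains, sampler and model for another 1000 samples
    more_draws <- extra_samples(draws, n_samples = 1000)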
Because opt() acts on a list of greta arrays with possibly varying
dimension, the par and hessian objects returned by opt() are named lists,
rather than a vector (par) and a matrix (hessian), as returned by
stats::optim(). Because greta arrays may not be vectors, the Hessians may
not be matrices, but could be higher-dimensional arrays. To return a
Hessian matrix covering multiple model parameters, you can construct your
model so that all of those parameters are in a vector, then split the
vector up to define the model. The parameter vector can then be passed to
model(). See the example below.
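A minimal sketch of this pattern, using assumed toy data:

    library(greta)

    x <- matrix(rnorm(20), ncol = 2)
    y <- rnorm(10)

    coefs <- variable(dim = 3)   # all parameters in a single vector ...
    intercept <- coefs[1]        # ... then split up to define the model
    slopes <- coefs[2:3]
    distribution(y) <- normal(intercept + x %*% slopes, 1)

    m <- model(coefs)
    o <- opt(m, hessian = TRUE)
    o$par$coefs      # estimates for all three coefficients
    o$hessian$coefs  # a single Hessian covering all three coefficients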