autorun.jags: Run or extend a user-specified Bayesian MCMC model in JAGS with automatically calculated run-length and convergence diagnostics

Description

Runs or extends a user-specified JAGS model from within R, returning an object of class 'runjags' (see runjags-class). Before the object is returned, the model is automatically assessed for convergence and adequate sample size.

Usage

autorun.jags(
  model,
  monitor = NA,
  data = NA,
  n.chains = NA,
  inits = NA,
  startburnin = 4000,
  startsample = 10000,
  adapt = 1000,
  datalist = NA,
  initlist = NA,
  jags = runjags.getOption("jagspath"),
  silent.jags = runjags.getOption("silent.jags"),
  modules = runjags.getOption("modules"),
  factories = runjags.getOption("factories"),
  summarise = TRUE,
  mutate = NA,
  thin = 1,
  thin.sample = FALSE,
  raftery.options = list(),
  crash.retry = 1,
  interactive = FALSE,
  max.time = Inf,
  tempdir = runjags.getOption("tempdir"),
  jags.refresh = 0.1,
  batch.jags = silent.jags,
  method = runjags.getOption("method"),
  method.options = list(),
  ...
)
autoextend.jags(
  runjags.object,
  add.monitor = character(0),
  drop.monitor = character(0),
  drop.chain = numeric(0),
  combine = length(c(add.monitor, drop.monitor, drop.chain)) == 0,
  startburnin = 0,
  startsample = 10000,
  adapt = 1000,
  jags = NA,
  silent.jags = NA,
  summarise = TRUE,
  thin = NA,
  thin.sample = FALSE,
  raftery.options = list(),
  crash.retry = 1,
  interactive = FALSE,
  max.time = Inf,
  tempdir = runjags.getOption("tempdir"),
  jags.refresh = NA,
  batch.jags = NA,
  method = NA,
  method.options = NA,
  ...
)

Value

An object of class 'runjags' (see runjags-class for available methods).

Arguments

model: either a relative or absolute path to a text file (including the file extension) containing a model in the JAGS language and possibly monitored variable names, data and/or initial values, or a character string of the same. No default. See read.jagsfile for more details.
monitor: a character vector of the names of variables to monitor. No default. The special node names 'deviance', 'pd', 'popt', 'dic', 'ped' and 'full.pd' are used to monitor the deviance, mean pD, mean pOpt, DIC, PED and full distribution of sum(pD) respectively. Note that these monitored nodes (with the exception of 'deviance') require multiple chains within the same simulation, and won't appear as variables in the summary statistics or plots (but see extract for a way of extracting these from the returned object).
data: a named list, data frame, environment, character string in the R dump format (see dump.format), or a function (with no arguments) returning one of these types. If the model text contains inline #data# comments, then this argument specifies the list, data frame or environment in which to search first for these variables (the global environment is always searched last). If the model text does not contain #data# comments, then the full list or data frame (but not environment) is included as data. If the data argument is a character string, then any #data# comments in the model are ignored (with a warning). The default specifies the parent environment of the function call.
n.chains: the number of chains to use with the simulation. More chains will improve the sensitivity of the convergence diagnostic, but will cause the simulation to run more slowly (although this may be improved by using a method such as 'parallel', 'rjparallel' or 'snow'). The minimum (and default) number of chains is 2.
inits: either a character vector with length equal to the number of chains the model will be run using, or a list of named lists representing names and corresponding values of inits for each chain, or a function with either 1 argument representing the chain or no arguments. If a vector, each element of the vector must be a character string in the R dump format representing the initial values for that chain, or NA. If not all initialising variables are specified, the unspecified variables are taken deterministically from the mean or mode of the prior distribution by JAGS. Values left as NA result in all initial values for that chain being taken from the prior distribution. The special variables '.RNG.seed', '.RNG.name', and '.RNG.state' are allowed for explicit control over random number generators in JAGS. If a function is provided, the data is available inside the function as a named list 'data' - this may be useful for setting initial values that depend on the data. Default NA.
startburnin: the number of burn-in iterations (NOT including the adaptive iterations) to use for the initial pilot run of the chains. Default 4000 for autorun.jags and 0 for autoextend.jags.
startsample: the total number of samples (including the chains supplied in runjags.object for autoextend.jags) on which to assess convergence, with a minimum of 4000. If the runjags.object already contains this number of samples then convergence will be assessed on this object, otherwise the required number of additional samples will be obtained before combining the chains with the old chains. More samples will give a better chance of allowing the chain to converge, but will take longer to achieve. Default 10000 iterations.
adapt: the number of adaptive iterations to use at the start of each simulation. For the rjags method this adaptation is only performed once and the model remains compiled, unless the repeatable.methods option is activated in runjags.options. For all other methods adaptation is done every time the simulation is extended. Default 1000 iterations.
datalist: deprecated argument.
initlist: deprecated argument.
jags: the system call or path for activating JAGS. Default uses the option given in runjags.options.
silent.jags: option to suppress output of the JAGS simulations. Default uses the option given in runjags.options.
modules: a character vector of external modules to be loaded into JAGS, either as the module name on its own or as the module name and status separated by a space, for example 'glm on'.
factories: a character vector of factory modules to be loaded into JAGS. Factories should be provided in the format '<facname> <factype> <status>' (where status is optional), for example: factories='mix::TemperedMix sampler on'. You must also ensure that any required modules are also specified (in this case 'mix').
summarise: should summary statistics be automatically calculated for the output chains? Default TRUE (but see also ?runjags.options -> force.summary).
mutate: either a function or a list with first element a function and remaining elements arguments to this function. This can be used to add new variables to the posterior chains that are derived from the directly monitored variables in JAGS. This allows the variables to be summarised or extracted as part of the MCMC objects as if they had been calculated in JAGS, but without the computational or storage overheads associated with calculating them in JAGS directly. The plot, summary and as.mcmc methods for runjags objects will automatically extract the mutated variables along with the directly monitored variables. For an application to pairwise comparisons of different levels within fixed effects see contrasts.mcmc.
thin: the thinning interval to be used in JAGS. Increasing the thinning interval may reduce autocorrelation, and therefore reduce the number of samples required, but will increase the time required to run the simulation. Using this option thinning is performed directly in JAGS, rather than on an existing MCMC object as with thin.sample. Default 1.
thin.sample: option to thin the final MCMC chain(s) before calculating summary statistics and returning the chains. Thinning very long chains reduces the size of the returned object. If TRUE, the chains are thinned to as close to a minimum of startsample iterations as possible. A positive integer can also be specified as the desired chain length after thinning; the chains will be thinned to as close to this minimum value as possible. Default FALSE (as given in the usage), in which case no thinning is performed. This option does NOT carry out thinning in JAGS, therefore R must have enough available memory to hold the chains BEFORE thinning. To avoid this problem use the 'thin' option instead.
raftery.options: a named list which is passed as additional arguments to raftery.diag, or the logical FALSE to deactivate automatic run length calculation. Default none (default arguments to raftery.diag are used).
crash.retry: the number of times to re-attempt a simulation if the model returns an error. Default 1 retry (simulation will be aborted after the second crash).
interactive: option to allow the simulation to be interactive. If TRUE, the user is asked whether the simulation should be extended when run length and convergence calculations are performed and the extended simulation would take more than 1 minute; the function waits for a response before extending the simulations. If FALSE, the simulation is run until the chains have converged or until the next extension would extend the simulation beyond 'max.time'. Default FALSE.
max.time: the maximum time for which the function is allowed to extend the chains to improve convergence, as a character string including units or as an integer in which case units are taken as seconds. Ignored if interactive=TRUE. If the function thinks that the next simulation extension to improve convergence will result in a total time of greater than max.time, the extension is aborted. The time per iteration is estimated from the first simulation. Acceptable units include 'seconds', 'minutes', 'hours', 'days', 'weeks', or the first letter(s) of each.
tempdir: option to use the temporary directory as specified by the system rather than creating files in the working directory. Any files created in the temporary directory are removed when the function exits for any reason. Default TRUE.
jags.refresh: the refresh interval (in seconds) for monitoring JAGS output using the 'interruptible' and 'parallel' methods (see the 'method' argument). Longer refresh intervals use slightly less processor time, but cause simulation updates to be shown on the screen less frequently. Increasing the refresh interval to 10 or 30 seconds may be worthwhile for simulations taking several days to run. Note that this has no effect on the processor use of the simulations themselves. Default 0.1 seconds.
batch.jags: option to call JAGS in batch mode, rather than using input redirection. On JAGS >= 3.0.0, this suppresses output of the status, which may be useful in some situations. Default TRUE if silent.jags is TRUE, or FALSE otherwise.
method: the method with which to call JAGS; usually a character string specifying one of 'rjags', 'simple', 'interruptible', 'parallel', 'rjparallel', or 'snow'. The 'rjags' and 'rjparallel' methods run JAGS using the rjags package, whereas the other options do not require the rjags package and call JAGS as an external executable. The advantage of the 'rjags' method is that the model does not need to be recompiled between successive calls to extend.jags; all other methods require recompilation (and adaptation if necessary) every time the model is extended. Note that the 'rjparallel' and 'snow' methods may leave behind zombie JAGS processes if the user interrupts the R session used to start the simulations; for this reason the 'parallel' method is recommended for interactive use with parallel chains. On Windows, the 'parallel' and 'interruptible' methods require XP Professional, Vista or later; they are available on any Unix-alike. For more information refer to the userguide vignette.
method.options: a deprecated argument currently permitted for backwards compatibility, but this will be removed from a future version of runjags. Pass these arguments directly to autorun.jags or autoextend.jags.
...: summary parameters to be passed to add.summary, and/or additional options to control some methods including n.sims for parallel methods, cl for rjparallel and snow methods, remote.jags for snow, and by and progress.bar for the rjags method.
runjags.object: the model to be extended - the output of a run.jags (or autorun.jags, extend.jags, etc.) function, with class 'runjags'. No default.
add.monitor: a character vector of variables to add to the monitored variable list. All previously monitored variables are automatically included - although see the 'drop.monitor' argument. Default no additional monitors.
drop.monitor: a character vector of previously monitored variables to remove from the monitored variable list for the extended model. Default none.
drop.chain: a numeric vector of chains to remove from the extended model. Default none.
combine: a logical flag indicating if results from the new JAGS run should be combined with the previous chains. Default TRUE if not adding or removing variables or chains, and FALSE otherwise.
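The data and inits arguments accept several forms. The sketch below builds the data for the straight-line model in the Examples section both as a named list and as a function with no arguments, and builds initial values both as a one-argument function of the chain number and as a list of named lists, one per chain. The object names (data_list, data_fun, inits_fun, inits_list) are illustrative only. The autorun.jags call is wrapped in if (FALSE) because it needs JAGS, and it assumes a model string without the #data# and #inits# comments.

```r
# Simulated data for the straight-line model used in the Examples section
set.seed(1)
N <- 100
X <- 1:N
Y <- rnorm(N, 2 * X + 10, 1)

# 'data' as a named list: the full list is passed to JAGS when the model
# text contains no #data# comments
data_list <- list(N = N, X = X, Y = Y)

# 'data' as a function with no arguments returning one of the accepted types
data_fun <- function() list(N = N, X = X, Y = Y)

# 'inits' as a function of the chain number; the .RNG.* entries give each
# chain its own reproducible random number stream (seed choice illustrative)
inits_fun <- function(chain) {
  list(
    m = c(-10, 10)[chain],
    c = c(-10, 10)[chain],
    precision = c(0.1, 10)[chain],
    .RNG.name = "base::Mersenne-Twister",
    .RNG.seed = chain
  )
}

# Equivalent 'inits' as a list of named lists, one per chain
inits_list <- lapply(1:2, inits_fun)

if (FALSE) {
  # Needs JAGS; 'model' is assumed to have no #data#/#inits# comments
  results <- autorun.jags(model, monitor = c("m", "c", "precision"),
                          data = data_list, inits = inits_fun, n.chains = 2)
}
```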
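A mutate function derives new variables from the monitored ones in R rather than in JAGS. The sketch below is a hypothetical mutate function for the straight-line model in the Examples section: it returns the observation standard deviation and the expected y at a chosen X. It assumes, by analogy with contrasts.mcmc, that the function is called once per chain with a matrix-like object whose columns are the monitored variables, and that it returns a matrix of new variables with one row per iteration; check the contrasts.mcmc help page for the exact contract before relying on it.

```r
# Hypothetical mutate function. ASSUMPTION: called once per chain with a
# matrix-like object whose columns are the monitored variables (m, c,
# precision); returns a matrix of derived variables, one row per iteration.
derive_quantities <- function(x, at.x = 50) {
  x <- as.matrix(x)
  cbind(
    sd = 1 / sqrt(x[, "precision"]),    # observation standard deviation
    pred = x[, "m"] * at.x + x[, "c"]   # expected y at X = at.x
  )
}

# Stand-in for three iterations of one chain of posterior samples
chain <- cbind(m = c(2, 2.1, 1.9),
               c = c(10, 9.5, 10.5),
               precision = c(1, 4, 0.25))
derived <- derive_quantities(chain, at.x = 50)

if (FALSE) {
  # A list supplies the function first and its extra arguments after it
  # (ASSUMPTION: named elements are passed on as named arguments)
  results <- autorun.jags(model, monitor = c("m", "c", "precision"),
                          n.chains = 2,
                          mutate = list(derive_quantities, at.x = 50))
}
```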
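For thin.sample, thinning "to as close to a minimum of" a target length amounts to choosing the largest whole-number thinning interval that still leaves at least the target number of iterations. The helper below is a hypothetical base R illustration of that arithmetic, not the function runjags uses internally.

```r
# Hypothetical helper (not part of runjags): keep every k-th iteration,
# with k the largest integer interval that leaves at least 'target' values
thin_to_length <- function(x, target) {
  interval <- max(1L, floor(length(x) / target))
  x[seq(1, length(x), by = interval)]
}

long_chain <- rnorm(52000)
short_chain <- thin_to_length(long_chain, target = 10000)
length(short_chain)   # 10400: interval 5; interval 6 would leave fewer than 10000
```

Because the full chain must be held in memory before this kind of thinning happens, the thin argument (thinning inside JAGS) remains the better choice for very long runs.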

Details

The autorun.jags function reads, compiles, and updates a JAGS model based on a model representation (plus data, monitors and initial values) input by the user. The autoextend.jags function takes an existing runjags-class object and extends the simulation as required. Chain convergence over the first run of the simulation is assessed using Gelman and Rubin's convergence diagnostic. If necessary, the simulation is extended to improve chain convergence (up to a user-specified maximum time limit), before the required sample size of the Markov chain is calculated using Raftery and Lewis's diagnostic. The simulation is then extended to the required sample size, which depends on the autocorrelation and the number of chains. Note that automated convergence diagnostics are not perfect and should not be considered a replacement for manually assessing convergence and Monte Carlo error using the results returned. For more complex models, using run.jags directly with manual assessment of the necessary run length may be preferable.
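Gelman and Rubin's diagnostic compares the variance between chains with the variance within them; values close to 1 suggest that the chains are sampling the same distribution. The sketch below computes the basic potential scale reduction factor (PSRF) in base R for pairs of simulated AR(1) chains. It is an illustration of the diagnostic, not runjags's own code: it omits the degrees-of-freedom correction applied by coda's gelman.diag, and the 1.05 threshold mentioned in the comments is a rule-of-thumb assumption. The Raftery and Lewis run-length step is available as coda's raftery.diag and is not re-implemented here.

```r
# Basic Gelman-Rubin PSRF for one variable.
# 'chains' is a matrix with one column per chain and one row per iteration.
psrf <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))       # mean within-chain variance
  B <- n * var(colMeans(chains))         # between-chain variance
  var_hat <- (n - 1) / n * W + B / n     # pooled variance estimate
  sqrt(var_hat / W)
}

set.seed(42)
# Autocorrelated chain around a given mean, mimicking MCMC output
ar1 <- function(n, mean) mean + as.numeric(arima.sim(list(ar = 0.5), n = n))

# Two chains sampling the same distribution: PSRF close to 1
mixed <- cbind(ar1(10000, 0), ar1(10000, 0))
# Two chains stuck around different values: PSRF well above 1
stuck <- cbind(ar1(10000, 0), ar1(10000, 5))

psrf(mixed)   # below the assumed 1.05 threshold
psrf(stuck)   # far above it, so more iterations would be needed
```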

For autoextend.jags, any arguments with a default of NA are taken from the runjags object passed to the function.
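Because NA arguments are inherited from the runjags object, an extension usually needs only the object and the settings that change. The sketch below assumes that 'results' is a runjags object returned by an earlier autorun.jags call; new_monitors is an illustrative name. The calls are wrapped in if (FALSE) because they need JAGS.

```r
new_monitors <- "deviance"

if (FALSE) {
  # jags, silent.jags, thin, method, etc. are left at NA and are
  # therefore taken from 'results'
  more <- autoextend.jags(results, startsample = 20000, max.time = "10m")

  # Adding a monitor means 'combine' defaults to FALSE, so the new
  # samples are not combined with the previous chains
  with.dev <- autoextend.jags(results, add.monitor = new_monitors)
}
```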

References

Matthew J. Denwood (2016). runjags: An R Package Providing Interface Utilities, Model Templates, Parallel Computing Methods and Additional Distributions for MCMC Models in JAGS. Journal of Statistical Software, 71(9), 1-25. doi:10.18637/jss.v071.i09

Examples


# Run a model to calculate the intercept and slope of the expression
# y = m x + c, assuming normal observation errors for y:

# Simulate the data
N <- 100
X <- 1:N
Y <- rnorm(N, 2*X + 10, 1)

# Model in the JAGS format
model <- "model {
for(i in 1 : N){
	Y[i] ~ dnorm(true.y[i], precision)
	true.y[i] <- m * X[i] + c
}
m ~ dunif(-1000,1000)
c ~ dunif(-1000,1000)
precision ~ dexp(1)

#data# N, X, Y
#inits# m, c, precision
}"

# Initial values to be used:
m <- list(-10, 10)
c <- list(-10, 10)
precision <- list(0.1, 10)
if (FALSE) {
# Run the model using rjags, allowing at most 5 minutes of automatic extension:
results <- autorun.jags(model=model, max.time="5m",
monitor=c("m", "c", "precision"), n.chains=2,
method="rjags")

# Analyse standard plots of the results to assess convergence:
plot(results)

# Summary of the monitored variables:
results

# For more details about possible methods see:
vignette('userguide', package='runjags')
}
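The same example can be run with each chain in a separate JAGS process and with non-default run-length settings. In the sketch below, the 30-minute limit and the raftery.options value (r = 0.0025, a tighter accuracy than the r = 0.005 default of raftery.diag) are illustrative choices, and the call relies on model, m, c and precision as defined above; raftery_settings is an illustrative name.

```r
raftery_settings <- list(r = 0.0025)

if (FALSE) {
  # One JAGS process per chain; a tighter Raftery-Lewis accuracy
  # generally means a longer required run
  results.par <- autorun.jags(model = model,
                              monitor = c("m", "c", "precision"),
                              n.chains = 2, method = "parallel",
                              max.time = "30m",
                              raftery.options = raftery_settings)

  # Extend further; with no monitors or chains added or dropped,
  # 'combine' defaults to TRUE and the new samples join the old chains
  results.more <- autoextend.jags(results.par, startsample = 20000)
}
```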
