run.jags.study: Drop-k and simulated dataset studies using JAGS

Description

These functions fit a user-specified JAGS model to multiple datasets with automatic control of run length and convergence, over a distributed computing cluster such as that provided by snow. The results for monitored variables are compared to the target values provided, and a summary of the model performance is returned. This may be used to facilitate model validation using simulated data, or to assess model fit using a 'drop-k' type cross validation study where one or more datapoints are removed in turn and the model's ability to predict those datapoints is assessed.

Usage

drop.k(runjags.object, dropvars, k = 1, simulations = NA, ...)
drop.k.jags(runjags.object, dropvars, k = 1, simulations = NA, ...)
drop.k.JAGS(runjags.object, dropvars, k = 1, simulations = NA, ...)
run.jags.study(
  simulations,
  model,
  datafunction,
  targets = list(),
  confidence = 0.95,
  record.chains = FALSE,
  max.time = "15m",
  silent.jags = TRUE,
  parallel.method = parLapply,
  n.cores = NA,
  export.cluster = character(0),
  inits = list(),
  ...
)

Value

An object of class runjagsstudy-class, containing a summary of the performance of the model with regard to the target variables specified. If record.chains=TRUE, an element named 'runjags' containing a list of all the runjags objects returned will also be present. Any error messages given by individual simulations will be contained in the 'errors' element of the returned list.
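
As a conceptual illustration of the kind of performance summary described above, the following sketch computes the proportion of interval estimates that contain a known target value. It does not call runjags and is not the package's internal code: the posterior draws are simulated directly from a known normal posterior as a stand-in for MCMC output, equal-tailed quantile intervals are used for simplicity (an assumption), and all object names are hypothetical.

```r
# Conceptual sketch only: coverage of interval estimates across simulated datasets.
set.seed(1)
target     <- 5      # known "true" value used to simulate the data
confidence <- 0.95   # nominal interval probability
n_sims     <- 200    # number of simulated datasets
probs <- c((1 - confidence) / 2, 1 - (1 - confidence) / 2)

covered <- vapply(seq_len(n_sims), function(i) {
  y <- rnorm(20, mean = target, sd = 1)
  # Stand-in for MCMC output: with known sd = 1 and a flat prior, the
  # posterior for the mean is Normal(mean(y), 1 / sqrt(n)).
  draws <- rnorm(4000, mean = mean(y), sd = 1 / sqrt(length(y)))
  ci <- quantile(draws, probs)
  ci[[1]] <= target && target <= ci[[2]]
}, logical(1))

# Proportion of intervals containing the target value
coverage <- mean(covered)
```

For a well-specified model, the resulting proportion should be close to the nominal confidence level.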

Arguments

runjags.object: an object of class runjags-class (a fitted JAGS model) on which to perform the drop-k analysis.
dropvars: the variable(s) to be eliminated from the data so that the ability of the model to predict these datapoints can be assessed. The variable can be specified as a vector, or as a single character string for which partial matching will be done. Array indices can be used, but must be specified as a complete range, e.g. variable[2:5,2] is permitted, but variable[,2] is not because the first index is empty.
k: the number of datapoints to be dropped from each individual simulation. The default of 1 is a drop-1 study (also called a leave-one-out cross validation study).
simulations: the number of datasets to run the model on. For drop.k the default is to use the number of unique datapoints, resulting in a drop-1 study. If the specified number of simulations is different to the number of unique datapoints, the datapoints are dropped randomly between simulations.
...: optional arguments to be passed to autorun.jags, or to the parallel method function (such as 'cl').
model: the JAGS model to use, in the same format as would be specified to run.jags.
datafunction: a function that will be used to specify the data. This must take either zero arguments, or one argument representing the simulation number, and return either a named list or character vector in the R dump format containing the data specific to that simulation. It is possible to specify any data that does not change for each simulation using a #data# <variable> tag in the model code.
targets: a named list of variables (which can include vectors/arrays) with values to which the model outputs are compared (if stochastic). The target variable names are also automatically included as monitored variables.
confidence: a probability (or vector of probabilities) to use when calculating the proportion of credible intervals containing the true target value. Default 95% CI.
record.chains: option to return the full runjags objects returned from each simulation as a list item named 'runjags'.
max.time: the maximum time for which each individual simulation is allowed to run by the underlying autorun.jags function. Acceptable units include 'seconds', 'minutes', 'hours', 'days', 'weeks', or the first letter(s) of each. Default is 15 minutes.
silent.jags: option to suppress all JAGS output, even for simulations run locally. If set to FALSE, there is no guarantee that the output will be displayed in sequential order between the parallel simulations. Default TRUE.
parallel.method: a function that will be used to call the repeated simulations. This must take the first two arguments 'X' and 'FUN' as for lapply, with other optional arguments passed through from the parent function call. Default uses parLapply, but lapply or mclapply could also be used.
n.cores: the maximum number of cores to use for parallel simulations. Default value uses detectCores, or a minimum of 2. Ignored if cl is supplied, or if parallel.method does not take a cl argument.
export.cluster: a character vector naming objects to be retrieved from the parent frame of the function call and made available to the cluster nodes. This may be useful if the initial values specified for the model are required to be extracted from the working environment, however it may be preferable to specify a function for inits instead.
inits: as for run.jags, except that it is not permitted to be an environment. It is recommended to specify a function that returns appropriate initial values (which may depend on the data visible when the function is evaluated).
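
A minimal sketch of how these arguments might fit together for a simulated dataset study is given below. The model, the make_data function and the target values are hypothetical and chosen only for illustration; n.chains is assumed to be passed through ... to autorun.jags. The call itself is wrapped in if (FALSE), following the convention used in the Examples section, because it requires a working JAGS installation.

```r
# Hypothetical model: normal data with unknown mean and precision
model <- "model {
  for (i in 1:N) {
    Y[i] ~ dnorm(mu, tau)
  }
  mu ~ dnorm(0, 0.001)
  tau ~ dgamma(0.001, 0.001)
}"

# Hypothetical data function: takes the simulation number and returns
# a named list of data for that simulation (true mu = 5, true tau = 1)
make_data <- function(i) {
  N <- 20
  list(N = N, Y = rnorm(N, mean = 5, sd = 1))
}

if (FALSE) {
  library(runjags)
  study <- run.jags.study(
    simulations = 10,
    model = model,
    datafunction = make_data,
    targets = list(mu = 5, tau = 1),  # values used to simulate the data
    confidence = 0.95,
    max.time = "5m",
    parallel.method = lapply,         # run sequentially rather than on a cluster
    n.chains = 2                      # assumed to be passed on to autorun.jags
  )
  print(study)
}
```

Because the values used to simulate the data are defined inside make_data, nothing needs to be exported to cluster nodes if a parallel method is used instead of lapply.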

Details

The drop.k function is a wrapper to run.jags.study for the common application of drop-k cross validation studies on fitted JAGS models. The run.jags.study function is more flexible, and can be used for validating the performance of a model against simulated data with known parameters. For the latter, a user-specified function to generate suitable datasets to analyse is required.
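
The drop-k workflow described above might look roughly like the following sketch. The model and data are hypothetical, and the run.jags call is only one assumed way of obtaining a fitted model; the JAGS-dependent steps are wrapped in if (FALSE) because they require a working JAGS installation.

```r
# Hypothetical data and model for illustration only
set.seed(42)
dat <- list(N = 20, Y = rnorm(20, mean = 5, sd = 1))

model <- "model {
  for (i in 1:N) {
    Y[i] ~ dnorm(mu, tau)
  }
  mu ~ dnorm(0, 0.001)
  tau ~ dgamma(0.001, 0.001)
}"

if (FALSE) {
  library(runjags)
  # Fit the model once to the full dataset
  fit <- run.jags(model, monitor = c("mu", "tau"), data = dat, n.chains = 2)

  # Drop-1 (leave-one-out) study: each element of Y is removed in turn
  # and the model's ability to predict it is assessed
  loo <- drop.k(fit, dropvars = "Y", k = 1)
  print(loo)
}
```

With the default simulations = NA, the number of simulations is taken from the number of unique datapoints in dropvars, which would be 20 in this sketch.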

Examples

# For examples of usage see the following vignette:
if (FALSE) {
vignette('userguide', package='runjags')
}