Markov chain Monte Carlo (MCMC) for continuous random vectors using parallel or serial tempering, the latter also called umbrella sampling and simulated tempering. The chain simulates \(k\) different distributions on the same state space.

In parallel tempering, all the distributions are simulated in each iteration, and the state is a \(k \times p\) matrix, each row of which is the state for one of the distributions. In serial tempering, only one of the distributions (a randomly chosen one) is simulated in each iteration, and the state of the Markov chain is a pair \((i, x)\), where \(i\) is an integer between 1 and \(k\) and \(x\) is a vector of length \(p\). This pair is represented as a single real vector c(i, x). The index \(i\) says which distribution \(x\) is a simulation for.
temper(obj, initial, neighbors, nbatch, blen = 1, nspac = 1, scale = 1,
       outfun, debug = FALSE, parallel = FALSE, ...)

# S3 method for function
temper(obj, initial, neighbors, nbatch, blen = 1, nspac = 1, scale = 1,
       outfun, debug = FALSE, parallel = FALSE, ...)

# S3 method for tempering
temper(obj, initial, neighbors, nbatch, blen = 1, nspac = 1, scale = 1,
       outfun, debug = FALSE, parallel = FALSE, ...)
The returned value is an object of class "mcmc", subclass "tempering", which is a list containing at least the following components:

- the batch means of the continuous part of the state: if outfun is missing, an nbatch by k by p array; otherwise, an nbatch by m matrix, where m is the length of the result of outfun.
- (returned for serial tempering only) an nbatch by k matrix giving batch means for the multivariate Bernoulli random vector that is all zeros except for a 1 in the i-th place when the current state is \((i, x)\).
- acceptx: the fraction of Metropolis within-component proposals accepted, a vector of length k giving the acceptance rate for each component.
- accepti: the fraction of Metropolis jump/swap proposals accepted, a k by k matrix giving the acceptance rate for each allowed jump or swap; NA for elements such that the corresponding element of neighbors is FALSE.
- the value of the argument initial.
- the final state of the Markov chain.
- the value of .Random.seed before the run.
- the value of .Random.seed after the run.
- the running time of the Markov chain from system.time().
- the function used to calculate the log unnormalized density, either obj or obj$lud from a previous run.
- the argument nbatch or obj$nbatch.
- the argument blen or obj$blen.
- the argument nspac or obj$nspac.
- the argument outfun or obj$outfun.

A description of additional output when debug = TRUE can be found in the vignette debug, which is shown by vignette("debug", "mcmc").
obj: either an R function or an object of class "tempering" from a previous run. If a function, it should evaluate the log unnormalized density \(\log h(i, x)\) of the desired equilibrium distribution of the Markov chain for serial tempering (the same function is used for both serial and parallel tempering; see Details below for further explanation). If an object of class "tempering", the log unnormalized density function is obj$lud, and missing arguments of temper are obtained from the corresponding elements of obj. The first argument of the log unnormalized density function is an R vector c(i, x), where i says which distribution x is supposed to be a simulation for. Other arguments are arbitrary and taken from the ... arguments of temper. The log unnormalized density function should return -Inf in regions of the state space having probability zero.
initial: for serial tempering, a real vector c(i, x) as described above; for parallel tempering, a real \(k \times p\) matrix as described above. In either case, the initial state of the Markov chain. Ignored if obj has class "tempering".
neighbors: a logical symmetric matrix of dimension k by k. Elements that are TRUE indicate jumps or swaps attempted by the Markov chain. Ignored if obj has class "tempering".

nbatch: the number of batches.

blen: the length of batches.

nspac: the spacing of iterations that contribute to batches.
scale: controls the proposal step size for real elements of the state vector: for serial tempering, proposing a new value for the \(x\) part of the state \((i, x)\); for parallel tempering, proposing a new value for the \(x_i\) part of the state \((x_1, \ldots, x_k)\). In either case, the proposal is a real vector of length \(p\). If scale is a scalar or vector, the proposal is x + scale * z, where x is the part (\(x\) or \(x_i\)) of the state the proposal may replace. If a matrix, the proposal is x + scale %*% z. If a list, its length must be k, and each element must be a scalar, vector, or matrix operating as described above; the \(i\)-th element of the list is used to update \(x\) when the state is \((i, x)\) (serial tempering) or to update \(x_i\) (parallel tempering).
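As an illustration (a sketch of the arithmetic, not package internals), the three non-list forms of scale produce proposals like this, where z is a standard normal random vector of length \(p\) and the names prop1, prop2, prop3 are illustrative:

```r
set.seed(42)
p <- 3
x <- c(0.2, 0.4, 0.6)   # the part of the state being updated
z <- rnorm(p)           # standard normal innovation

prop1 <- x + 0.25 * z                # scalar scale: isotropic step
prop2 <- x + c(0.1, 0.2, 0.3) * z    # vector scale: elementwise step sizes
S <- diag(c(0.1, 0.2, 0.3))          # matrix scale: correlated steps allowed
prop3 <- x + as.vector(S %*% z)      # proposal is x + scale %*% z
```

A list scale supplies one such object per component, which allows, for example, larger steps in the flatter components of the tempering mixture.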
outfun: controls the output. If a function, the batch means of outfun(state, ...) are returned; the argument state is like the argument initial of this function. If missing, the batch means of the real part of the state vector or matrix are returned, and for serial tempering the batch means of a multivariate Bernoulli vector indicating the current component are also returned.
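For example, a hypothetical outfun for serial tempering that returns the continuous part of the state and its squares, so that the batch means estimate first and second moments:

```r
# Hypothetical outfun for serial tempering: state is c(i, x).
# Returning c(x, x^2) makes the batch means estimate E(x) and E(x^2).
outfun <- function(state, ...) {
    x <- state[-1]   # drop the component index i
    c(x, x^2)
}
```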
debug: if TRUE, extra output useful for testing.

parallel: if TRUE, does parallel tempering; if FALSE, does serial tempering. Ignored if obj has class "tempering".

...: additional arguments for obj or outfun.
If outfun is missing, then the log unnormalized density function can be defined without a ... argument and that works fine. One can define it starting ludfun <- function(state) and that works, or ludfun <- function(state, foo, bar), where foo and bar are supplied as additional arguments to temper, and that works too.

If outfun is a function, then both it and the log unnormalized density function can be defined without ... arguments if they have exactly the same argument lists, and that works fine. Otherwise it doesn't work: define these functions by

ludfun <- function(state, foo)
outfun <- function(state, bar)

and you get an error about unused arguments. Instead define them by

ludfun <- function(state, foo, ...)
outfun <- function(state, bar, ...)

and supply foo and bar as additional arguments to temper, and that works fine.

In short, the log unnormalized density function and outfun need to have ... in their argument lists to be safe. Sometimes it works when ... is left out and sometimes it doesn't.

Of course, one can avoid this whole issue by always defining the log unnormalized density function and outfun to have only one argument state and using global variables (objects in the R global environment) to specify any other information these functions need. That too follows the R way, but some people consider it bad programming practice. A third option is to define either or both of these functions using a function factory. This is demonstrated in the vignette for this package named demo, which is shown by vignette("demo", "mcmc").
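A minimal sketch of the function-factory approach, using the same unnormalized density as the example below (the names make.ludfun and its arguments are illustrative, not package API):

```r
# Function factory: d, witch.which, and the pseudo-prior are captured
# in the closure, so the returned function needs only `state`.
make.ludfun <- function(d, witch.which, log.pseudo.prior) {
    ncomp <- length(witch.which)
    stopifnot(length(log.pseudo.prior) == ncomp)
    function(state) {
        icomp <- state[1]
        theta <- state[-1]
        if (any(theta > 1.0)) return(-Inf)
        bnd <- witch.which[icomp]
        lpp <- log.pseudo.prior[icomp]
        if (any(theta > bnd)) return(lpp)
        - d * log(bnd) + lpp
    }
}
ludfun <- make.ludfun(9, c(0.1, 0.3, 0.5, 0.7, 1.0), rep(0, 5))
```

Because everything the density needs is baked into the closure, no ... argument or global variable is required.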
This function follows the philosophy of MCMC explained in the introductory chapter of the Handbook of Markov Chain Monte Carlo (Geyer, 2011a) and in the chapter of that book on tempering and related subjects (Geyer, 2011b). See also the section on philosophy in the help for metrop.
The scale argument must be adjusted so that the acceptance rate for within-component proposals (component acceptx of the result returned by this function) is neither too low nor too high to get reasonable performance. The log unnormalized density function must be chosen so that the acceptance rate for jump/swap proposals (component accepti of the result returned by this function) is likewise neither too low nor too high. The former is a vector and the latter a matrix, and all these rates must be adjusted to be reasonable. The rates in accepti are influenced by the number of components of the tempering mixture distribution, by what those components are (how far apart they are in some unspecified metric on probability distributions), and by the chosen normalizing constants for those distributions.

For examples of tuning tempering, see Geyer (2011b) and the vignette of this package shown by vignette("bfst", "mcmc"). The help for R function metrop also gives advice on tuning its sampler, which is relevant for tuning the acceptx rates. See also Geyer (1991) and Geyer and Thompson (1995) for the general theory of tuning parallel and serial tempering.
Serial tempering simulates a mixture of distributions of a continuous random vector. The number of components of the mixture is k, and the dimension of the random vector is p. Denote the state \((i, x)\), where \(i\) is a positive integer between 1 and \(k\), and let \(h(i, x)\) denote the unnormalized joint density of the equilibrium distribution. The logarithm of this function is what obj or obj$lud calculates. The mixture distribution is the marginal for \(x\) derived from the equilibrium distribution \(h(i, x)\), that is,
$$h(x) = \sum_{i = 1}^k h(i, x)$$

Parallel tempering simulates a product of distributions of a continuous random vector. Denote the state \((x_1, \ldots, x_k)\); then the unnormalized joint density of the equilibrium distribution is
$$h(x_1, \ldots, x_k) = \prod_{i = 1}^k h(i, x_i)$$
The update mechanism of the Markov chain combines two kinds of elementary updates: jump/swap updates (jump for serial tempering, swap for parallel tempering) and within-component updates. In each iteration of the Markov chain one of these elementary updates is done: with probability 1/2 a jump/swap update, and with probability 1/2 a within-component update.

Within-component updates are the same for both serial and parallel tempering. They are "random-walk" Metropolis updates with multivariate normal proposal, the proposal distribution being determined by the argument scale. In serial tempering, the \(x\) part of the current state \((i, x)\) is updated preserving \(h(i, x)\). In parallel tempering, an index \(i\) is chosen at random and the part of the state \(x_i\) representing that component is updated, again preserving \(h(i, x_i)\).
Jump updates choose uniformly at random a neighbor of the current component: if \(i\) indexes the current component, then a \(j\) such that neighbors[i, j] == TRUE is chosen uniformly at random. A Metropolis-Hastings update is then done for changing the current state from \((i, x)\) to \((j, x)\).
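A sketch of the resulting Metropolis-Hastings log odds ratio for a jump from \((i, x)\) to \((j, x)\) (an assumption about the standard calculation, not the package's internal code): because \(j\) is drawn uniformly from the neighbors of \(i\), the Hastings correction involves the neighbor counts.

```r
# log MH odds for proposing (j, x) from (i, x); the update would
# accept with probability min(1, exp(log.odds)).
log.odds.jump <- function(lud, i, j, x, neighbors, ...) {
    n <- rowSums(neighbors)   # number of neighbors of each component
    lud(c(j, x), ...) - lud(c(i, x), ...) + log(n[i]) - log(n[j])
}
```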
Swap updates choose a component uniformly at random and a neighbor of that component uniformly at random: first an index \(i\) is chosen uniformly at random between 1 and \(k\), then an index \(j\) such that neighbors[i, j] == TRUE is chosen uniformly at random. A Metropolis-Hastings update is then done for swapping the states of the two components: interchanging \(x_i\) and \(x_j\) while preserving \(h(x_1, \ldots, x_k)\).
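A corresponding sketch (again an assumption, not package internals) of the log odds ratio for swapping \(x_i\) and \(x_j\) in parallel tempering: the pair proposal is symmetric in \(\{i, j\}\), so only the four density evaluations enter.

```r
# `state` is the k by p matrix (x_1, ..., x_k); the update would
# accept the swap with probability min(1, exp(log.odds)).
log.odds.swap <- function(lud, i, j, state, ...) {
    lud(c(i, state[j, ]), ...) + lud(c(j, state[i, ]), ...) -
        lud(c(i, state[i, ]), ...) - lud(c(j, state[j, ]), ...)
}
```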
The initial state must satisfy lud(initial, ...) > -Inf for serial tempering, or must satisfy lud(initial[i, ], ...) > -Inf for each i for parallel tempering, where lud is either obj or obj$lud. That is, the initial state must have positive probability.
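A quick sanity check along these lines can be run before calling temper (a sketch: the helper name is illustrative, and it follows the c(i, x) first-argument convention described above):

```r
# Returns TRUE if every relevant evaluation of the log unnormalized
# density at the initial state is finite (positive probability).
check.initial <- function(lud, initial, parallel = FALSE, ...) {
    if (parallel) {
        k <- nrow(initial)
        all(vapply(seq_len(k),
                   function(i) lud(c(i, initial[i, ]), ...) > -Inf,
                   logical(1)))
    } else {
        lud(initial, ...) > -Inf
    }
}
```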
Geyer, C. J. (1991) Markov chain Monte Carlo maximum likelihood. Computing Science and Statistics: Proc. 23rd Symp. Interface, 156--163. http://hdl.handle.net/11299/58440.
Geyer, C. J. (2011a) Introduction to MCMC. In Handbook of Markov Chain Monte Carlo. Edited by S. P. Brooks, A. E. Gelman, G. L. Jones, and X. L. Meng. Chapman & Hall/CRC, Boca Raton, FL, pp. 3--48.
Geyer, C. J. (2011b) Importance Sampling, Simulated Tempering, and Umbrella Sampling. In Handbook of Markov Chain Monte Carlo. Edited by S. P. Brooks, A. E. Gelman, G. L. Jones, and X. L. Meng. Chapman & Hall/CRC, Boca Raton, FL, pp. 295--312.
Geyer, C. J. and Thompson, E. A. (1995) Annealing Markov chain Monte Carlo with applications to ancestral inference. Journal of the American Statistical Association 90 909--920.
d <- 9
witch.which <- c(0.1, 0.3, 0.5, 0.7, 1.0)
ncomp <- length(witch.which)
neighbors <- matrix(FALSE, ncomp, ncomp)
neighbors[row(neighbors) == col(neighbors) + 1] <- TRUE
neighbors[row(neighbors) == col(neighbors) - 1] <- TRUE
ludfun <- function(state, log.pseudo.prior = rep(0, ncomp)) {
    stopifnot(is.numeric(state))
    stopifnot(length(state) == d + 1)
    icomp <- state[1]
    stopifnot(icomp == as.integer(icomp))
    stopifnot(1 <= icomp && icomp <= ncomp)
    stopifnot(is.numeric(log.pseudo.prior))
    stopifnot(length(log.pseudo.prior) == ncomp)
    theta <- state[-1]
    if (any(theta > 1.0)) return(-Inf)
    bnd <- witch.which[icomp]
    lpp <- log.pseudo.prior[icomp]
    if (any(theta > bnd)) return(lpp)
    return(- d * log(bnd) + lpp)
}
# parallel tempering
thetas <- matrix(0.5, ncomp, d)
out <- temper(ludfun, initial = thetas, neighbors = neighbors, nbatch = 20,
blen = 10, nspac = 5, scale = 0.56789, parallel = TRUE, debug = TRUE)
# serial tempering
theta.initial <- c(1, rep(0.5, d))
# log pseudo prior found by trial and error
qux <- c(0, 9.179, 13.73, 16.71, 20.56)
out <- temper(ludfun, initial = theta.initial, neighbors = neighbors,
nbatch = 50, blen = 30, nspac = 2, scale = 0.56789,
parallel = FALSE, debug = FALSE, log.pseudo.prior = qux)