
geoRglm (version 0.9-16)

binom.krige.bayes: Bayesian Posterior Simulation and Prediction for the Binomial Spatial model

Description

This function performs posterior simulation (by MCMC) and spatial prediction in the binomial-logit spatial model.

Usage

binom.krige.bayes(geodata, coords = geodata$coords, data = geodata$data, 
         units.m = "default", locations = "no", borders,
         model, prior, mcmc.input, output)

Arguments

geodata

a list containing elements coords and data as described next. Typically an object of the class "geodata" - a geoR data set. If not provided the arguments coords and data must be given instead. The list may also contain an argument units.m as described below.

coords

an \(n \times 2\) matrix, each row containing Euclidean coordinates of the n data locations. By default it takes the element coords of the argument geodata.

data

a vector with data values. By default it takes the element data of the argument geodata.

units.m

\(n\)-dimensional vector giving the number of trials for the data. By default (units.m = "default"), it takes geodata$units.m if that exists, and otherwise a vector of 1's.

locations

an \(N \times 2\) matrix or data frame, or a list with the 2-D coordinates of the \(N\) prediction locations.

borders

optional. If a two-column matrix defining a polygon is provided, the prediction is performed only at locations inside this polygon.

model

a list defining the components of the model. It can take an output from model.glm.control or a list with elements as for the arguments in model.glm.control. See documentation for model.glm.control. All arguments have default values.

prior

specification of priors for the model parameters. It can take an output from prior.glm.control or a list with elements as for the arguments in prior.glm.control. See documentation for prior.glm.control. ATTENTION: When phi.prior = "fixed" then phi must be specified, and when phi.prior is not "fixed" then phi.discrete must be specified. All other parameters have default values.

mcmc.input

input parameters for the MCMC algorithm. It can take an output from mcmc.control or a list with elements as for the arguments in mcmc.control. See documentation for mcmc.control. ATTENTION: the argument S.scale must be specified, and the argument phi.start must be specified when prior$phi is not "fixed"; all the others have default values.

output

parameters for controlling the output. It can take an output from output.glm.control or a list with elements as for the arguments in output.glm.control. See documentation for output.glm.control.

Value

A list with the following components:

posterior

A list with results for the posterior distribution of the model parameters and the random effects at the data locations. The components are:

  • beta: summary of the posterior distribution for the parameter \(\beta\).

  • sigmasq: summary of the posterior distribution for the parameter \(\sigma^2\).

  • phi: summary of the posterior distribution of the parameter \(\phi\).

  • simulations: sample from the posterior distribution of \(\exp(S)/(1+\exp(S))\) at the data locations. Returned only if keep.mcmc.sim = TRUE.

  • acc.rate: the acceptance rates.

predictive

A list with results for the predictive distribution at the prediction locations (if provided). The components are:

  • simulations: a numerical matrix. Each column contains a simulation from the predictive distribution. Returned only if sim.predict = TRUE.

  • median: a vector with the estimated median at the prediction locations.

  • uncertainty: a vector with the estimated uncertainty at the prediction locations, defined as the length of the \(95\%\) prediction interval divided by 4.

  • quantile: a matrix or vector with quantile estimators.

  • probability: a matrix or vector with probabilities below a threshold. Returned only if the argument threshold is used.

model

model components used as defined by model.glm.control.

prior

priors used for the model parameters.

mcmc.input

input parameters used for the MCMC algorithm.

.Random.seed

the system random seed before running the function, which allows reproduction of results: if .Random.seed is reset to this value and the function is run again, it will produce exactly the same results.

call

the function call.

Details

binom.krige.bayes is a function for Bayesian geostatistical inference in the binomial-logit spatial model.

The Bayesian algorithm uses a discretised version of the prior distribution for the parameter \(\phi\). This means that the prior for \(\phi\) is always a proper prior.

For simulating from the posterior distribution of \(S\) given \(y\), we use a Langevin-Hastings type algorithm. This is a Metropolis-Hastings algorithm in which the proposal distribution uses gradient information from the posterior. The algorithm is described below. For brevity, we present the MCMC algorithm only for the case where \(\beta\) follows a uniform prior.

When \(\beta\) follows a uniform prior and the prior for \(\sigma^2\) is a scaled inverse-\(\chi^2\) distribution, the marginalised improper density of \(S\) is $$ f_I(s) \propto |D^T V^{-1}D|^{-1/2}|V|^{-1/2}\{n_{\sigma}S^2_{\sigma} + s^T (V^{-1}-V^{-1}D(D^T V^{-1}D)^{-1}D^T V^{-1})s \}^{-(n-p+n_{\sigma})/2}, $$ where \(V\) is the correlation matrix of \(S\). The uniform prior for \(\sigma^2\) corresponds to \(S^2_{\sigma}=0\) and \(n_{\sigma}=-2\), and the reciprocal prior for \(\sigma^2\) corresponds to \(S^2_{\sigma}=0\) and \(n_{\sigma}=0\).
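
As a sanity check, the marginalised log-density above can be evaluated directly in base R for a toy configuration. The matrices V and D, the vector s, and the name log.fI below are illustrative, not part of geoRglm:

```r
# Toy log marginal density log f_I(s), up to an additive constant, for a
# uniform prior on sigma^2 (S2.sigma = 0, n.sigma = -2).
coords <- c(0, 0.5, 1.2)
V <- exp(-abs(outer(coords, coords, "-")))   # exponential correlation matrix
D <- matrix(1, nrow = 3, ncol = 1)           # constant-mean design, p = 1
n <- 3; p <- 1
S2.sigma <- 0; n.sigma <- -2                 # uniform prior on sigma^2

log.fI <- function(s) {
  Vi <- solve(V)
  # projection-adjusted precision: V^{-1} - V^{-1} D (D' V^{-1} D)^{-1} D' V^{-1}
  A  <- Vi - Vi %*% D %*% solve(t(D) %*% Vi %*% D) %*% t(D) %*% Vi
  # determinant() returns the log-modulus by default
  -0.5 * determinant(t(D) %*% Vi %*% D)$modulus -
    0.5 * determinant(V)$modulus -
    ((n - p + n.sigma) / 2) * log(n.sigma * S2.sigma + drop(t(s) %*% A %*% s))
}
log.fI(c(-0.3, 0.1, 0.4))
```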

We use the reparametrisation \(S = Q\Gamma\), where \(Q\) is the Cholesky factorisation of \(V\) so that \(V=QQ^T\). Posterior simulations of \(S\) are obtained by transforming MCMC simulations from the conditional distribution of \(\Gamma\) given \(Y=y\).
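
In base R this reparametrisation can be sketched as follows; note that chol() returns the upper-triangular factor, so the lower-triangular \(Q\) with \(V=QQ^T\) is its transpose (the toy correlation matrix is illustrative):

```r
# Toy 3x3 correlation matrix V (exponential correlation, phi = 1)
coords <- c(0, 0.5, 1.2)
V <- exp(-abs(outer(coords, coords, "-")))

# R's chol() gives the upper-triangular R with V = t(R) %*% R,
# so Q = t(chol(V)) is the lower-triangular factor with V = Q %*% t(Q)
Q <- t(chol(V))

set.seed(1)
gamma <- rnorm(3)      # a draw of Gamma
S <- Q %*% gamma       # the corresponding S = Q Gamma
```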

The log posterior density of \(\Gamma\) given \(Y=y\) is $$ \log f(\gamma|y) = const(y) - \frac{1}{2} \gamma^T (I_n - V^{-1/2}D(D^T V^{-1}D)^{-1}D^T V^{-1/2})\gamma +\sum_{i=1}^n y_i s_i - \log(1+\exp(s_i)), $$ where \((s_1,\ldots,s_n)^T= Q \gamma\).

For the Langevin-Hastings update we use the gradient of the log target density, $$ \nabla(\gamma )^{trunc}= - (I_n - Q^{-1}D(D^T V^{-1}D)^{-1}D^T(Q^{-1})^T)\gamma + Q^T\left\{y_i-\exp(s_i)/(1+\exp(s_i))\right\}_{i=1}^n . $$

The proposal \(\gamma'\) follows a multivariate normal distribution with mean vector \(\xi(\gamma)=\gamma+(h/2)\nabla(\gamma)^{trunc}\) and covariance matrix \(hI\), where \(h>0\) is a user-specified ``proposal variance'' (called S.scale; see mcmc.control).
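
The update above is the standard Metropolis-adjusted Langevin (MALA) step. A generic base-R sketch, with a toy standard-normal target standing in for the posterior of \(\Gamma\) and h playing the role of S.scale (all names are illustrative), looks like this:

```r
# One Langevin-Hastings (MALA) update for a target with log density log.f
# and gradient grad.  Toy target: standard multivariate normal.
log.f <- function(g) -0.5 * sum(g^2)
grad  <- function(g) -g

mala.step <- function(gamma, h) {
  mean.prop <- gamma + (h / 2) * grad(gamma)          # xi(gamma)
  prop <- mean.prop + sqrt(h) * rnorm(length(gamma))  # proposal ~ N(xi, h I)
  # Metropolis-Hastings log-ratio with the asymmetric Langevin proposal
  log.alpha <- log.f(prop) - log.f(gamma) -
    sum((gamma - (prop + (h / 2) * grad(prop)))^2) / (2 * h) +
    sum((prop - mean.prop)^2) / (2 * h)
  if (log(runif(1)) < log.alpha) prop else gamma
}

set.seed(1)
g <- rep(0, 5)
for (i in 1:100) g <- mala.step(g, h = 0.5)
```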

When phi.prior is not "fixed", we update the parameter \(\phi\) by a random walk Metropolis step. Here mcmc.input$phi.scale (see mcmc.control) is the proposal variance, which needs to be sufficiently large so that the algorithm can move easily between the discrete values in prior$phi.discrete (see prior.glm.control).
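
A toy sketch of such a random-walk step over a discrete grid; the target log.post and the grid-jump proposal are illustrative simplifications, not geoRglm's internal update:

```r
# Random-walk Metropolis on a discrete grid of phi values.
# phi.discrete mimics prior$phi.discrete; phi.scale mimics mcmc.input$phi.scale.
phi.discrete <- seq(0.2, 5, 0.2)
log.post <- function(phi) dnorm(phi, mean = 2, sd = 1, log = TRUE)  # toy target
h.grid <- phi.discrete[2] - phi.discrete[1]

phi.step <- function(idx, phi.scale) {
  # symmetric random-walk proposal, in units of grid steps
  jump <- round(rnorm(1, sd = sqrt(phi.scale) / h.grid))
  prop <- idx + jump
  if (prop < 1 || prop > length(phi.discrete)) return(idx)  # off grid: reject
  if (log(runif(1)) < log.post(phi.discrete[prop]) - log.post(phi.discrete[idx]))
    prop else idx
}

set.seed(1)
idx <- which.min(abs(phi.discrete - 4.5))   # start near phi.start = 4.5
for (i in 1:200) idx <- phi.step(idx, phi.scale = 0.2)
phi.discrete[idx]
```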

CONTROL FUNCTIONS

The function call includes auxiliary control functions which allow the user to specify and/or change the specification of 1) model components (using model.glm.control), 2) prior distributions (using prior.glm.control), 3) options for the MCMC algorithm (using mcmc.control), and 4) options for the output (using output.glm.control). Default values are available in most cases. The arguments of the control functions are described in their respective help files.

In the prediction part of the function we want to predict \(\exp(S^*)/(1+\exp(S^*))\) at locations of interest. For the prediction part of the algorithm, we use the median of the predictive distribution as the predictor and 1/4 of the length of the 95 percent predictive interval as a measure of the prediction uncertainty. Below we describe the procedure for calculating these quantities.
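
Given a matrix of predictive simulations, these two summaries are straightforward to compute in base R; the simulated values below are only a stand-in for the predictive$simulations component:

```r
# Rows = prediction locations, columns = simulations from the predictive
# distribution (toy values standing in for predictive$simulations).
set.seed(1)
sims <- matrix(runif(4 * 1000), nrow = 4)

pred.median <- apply(sims, 1, median)                 # the predictor
q <- apply(sims, 1, quantile, probs = c(0.025, 0.975))
uncertainty <- (q[2, ] - q[1, ]) / 4                  # 95% interval length / 4
```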

First we perform a Bayesian Gaussian prediction with the given priors on \(\beta\) and \(\sigma^2\) on each of the simulated \(S\)-``datasets'' from the posterior distribution (and in case \(\phi\) is not fixed, for each sampled \(\phi\) value). This Gaussian prediction is done by an internal function which is an extension of krige.bayes allowing for more than one ``data set''.

For calculating the probability below a threshold for the predictive distribution given the data, we first calculate this probability for each of the simulated \(S\)-``datasets''. This is done using the fact that the predictive distribution for each of the simulated \(S\)-``datasets'' is a multivariate \(t\)-distribution. Afterwards the probability below a threshold is calculated by taking the empirical mean of these conditional probabilities.

Now the median and the 2.5 percent and 97.5 percent quantiles can be calculated by an iterative procedure: first a guess of the value is made, then the guess is checked by calculating the probability of being less than this value. If the guess is too low, it is adjusted upwards, and vice versa.
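
The iterative search is essentially a bisection on the predictive cdf. A minimal sketch, with pnorm standing in for the empirical mean of the conditional multivariate-\(t\) cdfs (find.quantile is an illustrative name, not a geoRglm function):

```r
# Find x with cdf(x) = p by bisection: adjust the guess upwards when it is
# too low, downwards otherwise.
find.quantile <- function(cdf, p, lower = -10, upper = 10, tol = 1e-8) {
  while (upper - lower > tol) {
    mid <- (lower + upper) / 2
    if (cdf(mid) < p) lower <- mid else upper <- mid   # guess too low: move up
  }
  (lower + upper) / 2
}
find.quantile(pnorm, 0.975)   # close to qnorm(0.975)
```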

See Also

binom.krige for prediction with fixed parameters in the binomial-logit normal model, pois.krige.bayes for Bayesian prediction in the Poisson normal model, krige.bayes for Bayesian prediction in the Gaussian spatial model.

Examples

data(b50)
if(!exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)) set.seed(1234)
mcmc.10 <- mcmc.control(S.scale = 0.09, n.iter = 1000, phi.scale = 0.2, 
              phi.start = 4.5)
prior.10 <- prior.glm.control(phi.discrete = seq(0.2, 5, 0.2))
test.10 <- binom.krige.bayes(b50,
              locations = t(cbind(c(2.5, 3.5), c(-1, 3.5), c(2.5, 1.5), c(-1, 1.5))),
              prior = prior.10, mcmc.input = mcmc.10)
image(test.10)
persp(test.10)
