spat.corr.diagnostic: Diagnostics for residual spatial correlation

Description

This function performs two variogram-based tests for residual spatial correlation in real-valued and count (Binomial and Poisson) data.

Usage

spat.corr.diagnostic(
  formula,
  units.m = NULL,
  coords,
  data,
  likelihood,
  ID.coords = NULL,
  n.sim = 200,
  nAGQ = 1,
  uvec = NULL,
  plot.results = TRUE,
  lse.variogram = FALSE,
  kappa = 0.5,
  which.test = "both"
)

Arguments

formula

an object of class formula (or one that can be coerced to that class): a symbolic description of the model to be fitted.

units.m

vector of binomial denominators, or offset if the Poisson model is used.

coords

an object of class formula indicating the geographic coordinates.

data

an object of class "data.frame" containing the data.

likelihood

a character string that can be set to "Gaussian", "Binomial" or "Poisson".

ID.coords

vector of ID values for the unique set of spatial coordinates obtained from create.ID.coords. These must be provided if, for example, spatial random effects are defined at household level but some of the covariates are at individual level. Warning: the household coordinates must all be distinct; otherwise, see jitterDupCoords. Default is NULL.

n.sim

number of simulations used to perform the selected test(s) for spatial correlation.

nAGQ

integer scalar (passed to glmer) - the number of points per axis for evaluating the adaptive Gauss-Hermite approximation to the log-likelihood. Defaults to 1, corresponding to the Laplace approximation. Values greater than 1 produce greater accuracy in the evaluation of the log-likelihood at the expense of speed. A value of zero uses a faster but less exact form of parameter estimation for GLMMs by optimizing the random effects and the fixed-effects coefficients in the penalized iteratively reweighted least squares step.

uvec

a vector with values used to define the variogram binning. If uvec=NULL, then uvec is set to seq(MIN_DIST,(MAX_DIST-MIN_DIST)/2,length=15), where MIN_DIST and MAX_DIST are the minimum and maximum observed distances.
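The default binning can be reproduced by hand, for instance to inspect the bin edges before choosing a custom uvec. The following is a minimal base R sketch; the simulated coords matrix and the names min_dist and max_dist are illustrative assumptions, not objects created by the function.

```r
set.seed(1)

# Hypothetical coordinates: 50 distinct locations in the unit square.
coords <- cbind(runif(50), runif(50))

# All pairwise Euclidean distances between locations.
d <- dist(coords)
min_dist <- min(d)
max_dist <- max(d)

# Bin edges following the documented default for uvec.
uvec <- seq(min_dist, (max_dist - min_dist) / 2, length = 15)

# 15 edges define length(uvec) - 1 distance bins.
n_bins <- length(uvec) - 1
```

Pairs of locations further apart than the last value of uvec do not fall in any bin under this scheme.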

plot.results

if plot.results=TRUE, a plot is returned showing the results for the selected test(s) for spatial correlation. By default plot.results=TRUE.

lse.variogram

if lse.variogram=TRUE, a weighted least squares fit of a Matern function (with fixed kappa) to the empirical variogram is performed. If plot.results=TRUE and lse.variogram=TRUE, the weighted least squares fit is displayed as a dashed line in the returned plot.

kappa

smoothness parameter of the Matern function for the Gaussian process to approximate. The default is kappa=0.5.

which.test

a character specifying which test for residual spatial correlation is to be performed: "variogram", "test statistic" or "both". The default is which.test="both". See 'Details'.

Value

An object of class "PrevMap.diagnostic" which is a list containing the following components:

obs.variogram: a vector of length length(uvec)-1 containing the values of the variogram for each of the distance bins defined through uvec.

distance.bins: a vector of length length(uvec)-1 containing the average distance within each of the distance bins defined through uvec.

n.bins: a vector of length length(uvec)-1 containing the number of pairs of data-points falling within each distance bin.

lower.lim: (available only if which.test="both" or which.test="variogram") a vector of length length(uvec)-1 containing the lower limits of the 95% confidence intervals generated under the assumption of absence of spatial correlation at each of the distance bins defined through uvec.

upper.lim: (available only if which.test="both" or which.test="variogram") a vector of length length(uvec)-1 containing the upper limits of the 95% confidence intervals generated under the assumption of absence of spatial correlation at each of the distance bins defined through uvec.

mode.rand.effects: the predictive mode of the random effects from the fitted non-spatial generalized linear mixed model.

p.value: (available only if which.test="both" or which.test="test statistic") p-value of the test for residual spatial correlation.

lse.variogram: (available only if lse.variogram=TRUE) a vector of length length(uvec)-1 containing the values of the Matern variogram estimated via a weighted least squares fit.

Details

The function first fits a generalized linear mixed model for outcomes $Y_i$ which, conditionally on i.i.d. random effects $Z_i$, are mutually independent and follow a GLM with linear predictor $$\eta_i=d_i'\beta+Z_i$$ where $d_i$ is a vector of covariates specified through formula. The $Z_i$ are assumed to be zero-mean Gaussian variables with variance $\sigma^2$.
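A non-spatial model of this kind can be fitted directly with lme4, which helps when checking the quantities the diagnostic is built on. The sketch below assumes a Poisson outcome with one observation per location; the simulated data frame df, the grouping factor id and the object names are hypothetical, and the internal call made by spat.corr.diagnostic may differ.

```r
set.seed(1)

# Hypothetical data: one Poisson count per location, one covariate x.
n <- 100
df <- data.frame(id = factor(seq_len(n)), x = rnorm(n))
df$y <- rpois(n, lambda = exp(0.5 + 0.3 * df$x + rnorm(n, sd = 0.5)))

z_hat <- NULL
if (requireNamespace("lme4", quietly = TRUE)) {
  # i.i.d. Gaussian random intercept per location; nAGQ = 1 is the Laplace approximation.
  fit <- lme4::glmer(y ~ x + (1 | id), data = df, family = poisson, nAGQ = 1)

  # Conditional modes of the random effects, one per location.
  z_hat <- lme4::ranef(fit)$id[["(Intercept)"]]

  # Estimated variance of the random effects.
  sigma2_hat <- as.numeric(lme4::VarCorr(fit)$id[1, 1])
}
```

The vector z_hat plays the role of the predictive modes returned in mode.rand.effects.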

Variogram-based graphical diagnostic

This graphical diagnostic is performed by setting which.test="both" or which.test="variogram". The output consists of 95% confidence intervals (see lower.lim and upper.lim below), generated under the assumption of spatial independence through the following steps:

1. Fit a generalized linear mixed model as indicated by the equation above.

2. Obtain the mode, say $\hat{Z}_i$, of the $Z_i$ conditioned on the data $Y_i$.

3. Compute the empirical variogram using $\hat{Z}_i$.

4. Permute the locations specified in coords n.sim times, while holding the $\hat{Z}_i$ fixed.

5. For each of the permuted data-sets compute the empirical variogram based on the $\hat{Z}_i$.

6. From the n.sim variograms obtained in the previous step, compute the 95% confidence interval at each distance bin.

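If the observed variogram (obs.variogram below), based on the un-permuted $\hat{Z}_i$, falls within the 95% confidence intervals, we conclude that there is no evidence of residual spatial correlation; if, instead, it partly falls outside the 95% confidence intervals, we conclude that the data show evidence of residual spatial correlation.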
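The permutation steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's internal code: z_hat is a stand-in for the random-effect modes, coords is simulated, emp_variogram is a hypothetical helper, and the limits are taken as the 2.5% and 97.5% quantiles of the permuted variograms.

```r
set.seed(1)

# Hypothetical inputs: n locations and stand-in random-effect modes.
n <- 60
coords <- cbind(runif(n), runif(n))
z_hat <- rnorm(n)
n_sim <- 200

# Bin edges following the documented default for uvec.
d <- as.vector(dist(coords))
uvec <- seq(min(d), (max(d) - min(d)) / 2, length = 15)

# Empirical variogram: mean of 0.5 * (z_i - z_j)^2 over the pairs in each bin.
emp_variogram <- function(z, xy, breaks) {
  dd <- as.vector(dist(xy))
  sq <- 0.5 * as.vector(dist(z))^2
  bin <- cut(dd, breaks = breaks, include.lowest = TRUE)
  as.vector(tapply(sq, bin, mean))
}

# Step 3: observed variogram on the un-permuted locations.
obs <- emp_variogram(z_hat, coords, uvec)

# Steps 4-5: permute the locations n_sim times, holding z_hat fixed.
sim <- replicate(n_sim, emp_variogram(z_hat, coords[sample(n), ], uvec))

# Step 6: pointwise 95% limits for each distance bin.
lower <- apply(sim, 1, quantile, probs = 0.025, na.rm = TRUE)
upper <- apply(sim, 1, quantile, probs = 0.975, na.rm = TRUE)

# Bins where the observed variogram leaves the envelope.
outside <- obs < lower | obs > upper
```

Because z_hat is simulated here without spatial structure, most bins are expected to fall inside the envelope; pointwise limits can still be exceeded by chance in a few bins.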

Test for spatial independence

This diagnostic test is performed if which.test="both" or which.test="test statistic". Let $\hat{v}(B)$ denote the empirical variogram based on $\hat{Z}_i$ for the distance bin $B$. The test statistic used for testing residual spatial correlation is $$T = \sum_{B} N(B) \{\hat{v}(B)-\hat{\sigma}^2\}$$ where $N(B)$ is the number of pairs of data-points falling within the distance bin $B$ (n.bins below) and $\hat{\sigma}^2$ is the estimate of $\sigma^2$.

To obtain the distribution of the test statistic $T$ under the null hypothesis of spatial independence, we use the simulated empirical variograms obtained in step 5 of the procedure described in "Variogram-based graphical diagnostic". The p-value for the test of spatial independence is then computed as the proportion of simulated values of $T$ under the null hypothesis that are larger than the value of $T$ based on the original (un-permuted) $\hat{Z}_i$.
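The test statistic and its permutation p-value can be sketched in base R along the same lines. As before, this is an illustration only: z_hat and coords are simulated, t_stat is a hypothetical helper, and the sample variance of z_hat is used as a stand-in for the model-based estimate $\hat{\sigma}^2$.

```r
set.seed(2)

# Hypothetical inputs: n locations and stand-in random-effect modes.
n <- 60
coords <- cbind(runif(n), runif(n))
z_hat <- rnorm(n)
n_sim <- 200

# Assumption: sample variance used in place of the GLMM estimate of sigma^2.
sigma2_hat <- var(z_hat)

# Bin edges following the documented default for uvec.
d <- as.vector(dist(coords))
uvec <- seq(min(d), (max(d) - min(d)) / 2, length = 15)

# T = sum over bins of N(B) * (v_hat(B) - sigma2_hat).
t_stat <- function(z, xy) {
  bin <- cut(as.vector(dist(xy)), breaks = uvec, include.lowest = TRUE)
  sq <- 0.5 * as.vector(dist(z))^2
  v <- as.vector(tapply(sq, bin, mean))  # empirical variogram per bin
  n_pairs <- as.vector(table(bin))       # N(B), number of pairs per bin
  sum(n_pairs * (v - sigma2_hat), na.rm = TRUE)
}

# Observed statistic and its permutation distribution under spatial independence.
T_obs <- t_stat(z_hat, coords)
T_sim <- replicate(n_sim, t_stat(z_hat, coords[sample(n), ]))

# Proportion of simulated statistics larger than the observed one.
p_value <- mean(T_sim > T_obs)
```

Permuting the locations leaves the set of pairwise distances unchanged, so $N(B)$ is the same in every simulation and only $\hat{v}(B)$ varies.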