This function performs two variogram-based tests for residual spatial correlation in real-valued and count (Binomial and Poisson) data.
spat.corr.diagnostic(
formula,
units.m = NULL,
coords,
data,
likelihood,
ID.coords = NULL,
n.sim = 200,
nAGQ = 1,
uvec = NULL,
plot.results = TRUE,
lse.variogram = FALSE,
kappa = 0.5,
which.test = "both"
)
formula: an object of class formula (or one that can be coerced to that class): a symbolic description of the model to be fitted.
units.m: vector of binomial denominators, or offset if the Poisson model is used.
coords: an object of class formula indicating the geographic coordinates.
data: an object of class "data.frame" containing the data.
likelihood: a character that can be set to "Gaussian", "Binomial" or "Poisson".
ID.coords: vector of ID values for the unique set of spatial coordinates obtained from create.ID.coords. These must be provided if, for example, spatial random effects are defined at household level but some of the covariates are at individual level. Warning: the household coordinates must all be distinct; otherwise see jitterDupCoords. Default is NULL.
n.sim: number of simulations used to perform the selected test(s) for spatial correlation.
nAGQ: integer scalar (passed to glmer), the number of points per axis for evaluating the adaptive Gauss-Hermite approximation to the log-likelihood. Defaults to 1, corresponding to the Laplace approximation. Values greater than 1 produce greater accuracy in the evaluation of the log-likelihood at the expense of speed. A value of zero uses a faster but less exact form of parameter estimation for GLMMs by optimizing the random effects and the fixed-effects coefficients in the penalized iteratively reweighted least squares step.
uvec: a vector with values used to define the variogram binning. If uvec=NULL, uvec is set to seq(MIN_DIST,(MAX_DIST-MIN_DIST)/2,length=15), where MIN_DIST and MAX_DIST are the minimum and maximum observed distances.
plot.results: if plot.results=TRUE, a plot is returned showing the results for the selected test(s) for spatial correlation. By default plot.results=TRUE.
lse.variogram: if lse.variogram=TRUE, a weighted least squares fit of a Matern function (with fixed kappa) to the empirical variogram is performed. If plot.results=TRUE and lse.variogram=TRUE, the weighted least squares fit is displayed as a dashed line in the returned plot.
kappa: smoothness parameter of the Matern function used in the weighted least squares fit (see lse.variogram). The default is kappa=0.5.
which.test: a character specifying which test for residual spatial correlation is to be performed: "variogram", "test statistic" or "both". The default is which.test="both". See 'Details'.
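The default binning rule for uvec can be reproduced in base R. The following is a minimal sketch, assuming simulated coordinates (the object `xy` is hypothetical and not part of the package) and Euclidean distances between locations; it is not the package's internal code.

```r
# Hypothetical coordinates, simulated for illustration only
set.seed(1)
xy <- cbind(runif(50), runif(50))

# All pairwise distances between locations (Euclidean is an assumption here)
d <- as.numeric(dist(xy))
min_dist <- min(d)
max_dist <- max(d)

# Bin boundaries following the rule stated for uvec = NULL
uvec <- seq(min_dist, (max_dist - min_dist) / 2, length.out = 15)

length(uvec)      # number of bin boundaries
length(uvec) - 1  # number of distance bins
```

Because 15 boundaries define 14 bins, the vectors returned by the function have length length(uvec)-1.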
An object of class "PrevMap.diagnostic" which is a list containing the following components:
obs.variogram: a vector of length length(uvec)-1 containing the values of the variogram for each of the distance bins defined through uvec.
distance.bins: a vector of length length(uvec)-1 containing the average distance within each of the distance bins defined through uvec.
n.bins: a vector of length length(uvec)-1 containing the number of pairs of data-points falling within each distance bin.
lower.lim: (available only if which.test="both" or which.test="variogram") a vector of length length(uvec)-1 containing the lower limits of the 95% confidence intervals generated under the assumption of absence of spatial correlation at each of the distance bins defined through uvec.
upper.lim: (available only if which.test="both" or which.test="variogram") a vector of length length(uvec)-1 containing the upper limits of the 95% confidence intervals generated under the assumption of absence of spatial correlation at each of the distance bins defined through uvec.
mode.rand.effects: the predictive mode of the random effects from the fitted non-spatial generalized linear mixed model.
p.value: (available only if which.test="both" or which.test="test statistic") p-value of the test for residual spatial correlation.
lse.variogram: (available only if lse.variogram=TRUE) a vector of length length(uvec)-1 containing the values of the estimated Matern variogram via a weighted least squares fit.
The function first fits a generalized linear mixed model for outcomes \(Y_i\) which, conditionally on i.i.d. random effects \(Z_i\), follow mutually independent GLMs with linear predictor
$$\eta_i=d_i'\beta+Z_i$$
where \(d_i\) is a vector of covariates specified through formula. The \(Z_i\) are assumed to be zero-mean Gaussian variables with variance \(\sigma^2\).
Variogram-based graphical diagnostic
This graphical diagnostic is performed by setting which.test="both" or which.test="variogram". The output consists of 95% confidence intervals (see lower.lim and upper.lim) that are generated under the assumption of spatial independence through the following steps.
1. Fit a generalized linear mixed model as indicated by the equation above.
2. Obtain the mode, say \(\hat{Z}_i\), of the \(Z_i\) conditioned on the data \(Y_i\).
3. Compute the empirical variogram using the \(\hat{Z}_i\).
4. Permute the locations specified in coords n.sim times while holding the \(\hat{Z}_i\) fixed.
5. For each of the permuted data-sets, compute the empirical variogram based on the \(\hat{Z}_i\).
6. From the n.sim variograms obtained in the previous step, compute the 95% confidence interval for each distance bin.
If the observed variogram (obs.variogram), based on the un-permuted \(\hat{Z}_i\), falls within the 95% confidence intervals, there is no evidence of residual spatial correlation; if, instead, it partly falls outside the 95% confidence intervals, this indicates residual spatial correlation.
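Steps 3 to 6 can be sketched in base R as follows. This is only an illustration under stated assumptions: the coordinates and the vector `z_hat` are simulated stand-ins for the locations and the random-effect modes, the helper `emp_variogram` is hypothetical and uses the classical estimator (mean of half squared differences per bin), and the limits are taken as the 2.5% and 97.5% quantiles across permutations. The package's internal implementation may differ.

```r
set.seed(123)
n <- 100
coords <- cbind(runif(n), runif(n))  # hypothetical locations
z_hat  <- rnorm(n)                   # stand-in for the random-effect modes

# Distance matrix and bin boundaries (default rule for uvec)
d_mat <- as.matrix(dist(coords))
d_all <- d_mat[upper.tri(d_mat)]
uvec  <- seq(min(d_all), (max(d_all) - min(d_all)) / 2, length.out = 15)

# Hypothetical helper: mean of 0.5 * (z_i - z_j)^2 within each distance bin
emp_variogram <- function(z, d_mat, uvec) {
  up   <- upper.tri(d_mat)
  half <- 0.5 * outer(z, z, "-")[up]^2
  bins <- cut(d_mat[up], breaks = uvec, include.lowest = TRUE)
  as.numeric(tapply(half, bins, mean))
}

# Step 3: observed variogram
obs_variogram <- emp_variogram(z_hat, d_mat, uvec)

# Steps 4 and 5: permute the locations, keep z_hat fixed, recompute
n_sim <- 200
sim_variograms <- replicate(n_sim, {
  perm <- sample(n)
  emp_variogram(z_hat, d_mat[perm, perm], uvec)
})  # matrix with one row per bin and one column per permutation

# Step 6: pointwise 95% limits (quantile choice is an assumption)
lower_lim <- apply(sim_variograms, 1, quantile, probs = 0.025, na.rm = TRUE)
upper_lim <- apply(sim_variograms, 1, quantile, probs = 0.975, na.rm = TRUE)

# TRUE for bins where the observed variogram leaves the envelope
outside <- obs_variogram < lower_lim | obs_variogram > upper_lim
```

Permuting the rows and columns of the distance matrix with the same index vector is equivalent to reassigning locations to the fixed \(\hat{Z}_i\), so the number of pairs per bin is unchanged across permutations.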
Test for spatial independence
This diagnostic test is performed if which.test="both" or which.test="test statistic". Let \(\hat{v}(B)\) denote the empirical variogram based on the \(\hat{Z}_i\) for the distance bin \(B\).
The test statistic used for testing residual spatial correlation is
$$T = \sum_{B} N(B) \{\hat{v}(B)-\hat{\sigma}^2\}$$
where \(N(B)\) is the number of pairs of data-points falling within the distance bin \(B\) (n.bins) and \(\hat{\sigma}^2\) is the estimate of \(\sigma^2\).
To obtain the distribution of the test statistic \(T\) under the null hypothesis of spatial independence, we use the simulated empirical variograms obtained in step 5 of the procedure described in "Variogram-based graphical diagnostic". The p-value for the test of spatial independence is then computed as the proportion of simulated values of \(T\) under the null hypothesis that are larger than the value of \(T\) based on the original (un-permuted) \(\hat{Z}_i\).
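The permutation p-value can be sketched in base R as below. The sketch follows the formula for \(T\) exactly as written on this page; the data are simulated, `z_hat` and `sigma2_hat` are stand-ins for the random-effect modes and the fitted variance estimate, and the helpers `binned` and `test_stat` are hypothetical names introduced for illustration. It should not be read as the package's internal code.

```r
set.seed(42)
n <- 80
coords <- cbind(runif(n), runif(n))  # hypothetical locations
z_hat  <- rnorm(n)                   # stand-in for the random-effect modes
sigma2_hat <- var(z_hat)             # stand-in for the estimate of sigma^2

d_mat <- as.matrix(dist(coords))
up    <- upper.tri(d_mat)
d_all <- d_mat[up]
uvec  <- seq(min(d_all), (max(d_all) - min(d_all)) / 2, length.out = 15)

# Hypothetical helper: binned variogram v(B) and pair counts N(B)
binned <- function(z, d_mat) {
  half <- 0.5 * outer(z, z, "-")[up]^2
  bins <- cut(d_mat[up], breaks = uvec, include.lowest = TRUE)
  list(v = as.numeric(tapply(half, bins, mean)),
       n = as.numeric(table(bins)))
}

# T as written above: sum over bins of N(B) * (v(B) - sigma2_hat)
test_stat <- function(b) sum(b$n * (b$v - sigma2_hat), na.rm = TRUE)

# Observed statistic from the un-permuted data
t_obs <- test_stat(binned(z_hat, d_mat))

# Null distribution: permute locations, keep z_hat fixed
t_sim <- replicate(200, {
  perm <- sample(n)
  test_stat(binned(z_hat, d_mat[perm, perm]))
})

# Proportion of simulated statistics larger than the observed one
p_value <- mean(t_sim > t_obs)
```

With 200 permutations the p-value is resolved in steps of 1/200; a larger number of permutations gives a finer resolution at the cost of computing time.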