PrevMap (version 1.5.4)

variog.diagnostic.lm: Variogram-based validation for linear geostatistical model fits

Description

This function performs model validation for a linear geostatistical model using Monte Carlo methods based on the variogram.

Usage

variog.diagnostic.lm(
  object,
  n.sim = 1000,
  uvec = NULL,
  plot.results = TRUE,
  range.fact = 1,
  which.test = "both",
  param.uncertainty = FALSE
)

Arguments

object

an object of class "PrevMap" obtained as an output from linear.model.MLE.

n.sim

integer indicating the number of simulations used for the variogram-based diagnostics. Default is n.sim=1000.

uvec

a vector with values used to define the variogram binning. If uvec=NULL, then uvec is set to seq(MIN_DIST,(MAX_DIST-MIN_DIST)/2,length=15), where MIN_DIST and MAX_DIST are the minimum and maximum distances between data locations.

plot.results

if plot.results=TRUE, a plot is returned showing the results for the selected test(s) for spatial correlation. By default plot.results=TRUE.

range.fact

a value between 0 and 1 used to disregard all distance bins provided through uvec that are larger than pr x range.fact, where pr is the practical range, defined as the distance at which the fitted spatial correlation is no less than 0.05. Default is range.fact=1.

which.test

a character specifying which test for residual spatial correlation is to be performed: "variogram", "test statistic" or "both". The default is which.test="both". See 'Details.'

param.uncertainty

a logical indicating whether uncertainty in the model parameters should be incorporated in the selected diagnostic tests. Default is param.uncertainty=FALSE. See 'Details.'
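The interplay between uvec and range.fact described above can be sketched in base R. This is only an illustration under stated assumptions: the coordinates are simulated, and the practical range pr and the value of range_fact are made-up numbers rather than quantities taken from a fitted "PrevMap" object.

```r
# Hypothetical sampling locations (not from any PrevMap data set).
set.seed(1)
coords <- cbind(runif(50), runif(50))

# Pairwise distances between the locations.
d <- as.numeric(dist(coords))
min_dist <- min(d)
max_dist <- max(d)

# Bin edges as described for uvec = NULL.
uvec <- seq(min_dist, (max_dist - min_dist) / 2, length.out = 15)

# Assumed practical range and range.fact, for illustration only.
pr <- 0.3
range_fact <- 0.8

# Disregard bin edges larger than pr x range.fact.
uvec_kept <- uvec[uvec <= pr * range_fact]
```

With range_fact = 1 the cut-off is the practical range itself; smaller values restrict the diagnostic to shorter distances.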

Value

An object of class "PrevMap.diagnostic" which is a list containing the following components:

obs.variogram: a vector of length length(uvec)-1 containing the values of the variogram for each of the distance bins defined through uvec.

distance.bins: a vector of length length(uvec)-1 containing the average distance within each of the distance bins defined through uvec.

n.bins: a vector of length length(uvec)-1 containing the number of pairs of data-points falling within each distance bin.

lower.lim: (available only if which.test="both" or which.test="variogram") a vector of length length(uvec)-1 containing the lower limits of the 95% intervals, generated under the assumption that the fitted model generated the data, at each of the distance bins defined through uvec.

upper.lim: (available only if which.test="both" or which.test="variogram") a vector of length length(uvec)-1 containing the upper limits of the 95% intervals, generated under the assumption that the fitted model generated the data, at each of the distance bins defined through uvec.

mode.rand.effects: the predictive mode of the random effects from the fitted non-spatial generalized linear mixed model.

p.value: (available only if which.test="both" or which.test="test statistic") p-value of the test for residual spatial correlation.

lse.variogram: (available only if lse.variogram=TRUE) a vector of length length(uvec)-1 containing the values of the Matern variogram estimated via a weighted least squares fit.

Details

The function takes as an input, through the argument object, a fitted linear geostatistical model for an outcome \(Y_i\), which is expressed as $$Y_i=d_i'\beta+S(x_i)+Z_i$$ where \(d_i\) is a vector of covariates which are specified through formula, \(S(x_i)\) is a spatial Gaussian process and the \(Z_i\) are assumed to be zero-mean Gaussian. The model validation is performed on the adopted stationary and isotropic Matern covariance function used for \(S(x_i)\). More specifically, the function allows the user to select either of the following validation procedures.
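As an illustration of the model above, the following base R sketch simulates one data-set with a Matern correlation function for \(S(x_i)\). All parameter values (beta, sigma2, phi, kappa, tau2), the covariate and the helper matern_cor are hypothetical and are not part of PrevMap.

```r
# Matern correlation with scale phi and smoothness kappa (hypothetical helper).
matern_cor <- function(u, phi, kappa) {
  r <- u / phi
  out <- r
  out[] <- 1                      # correlation is 1 at distance zero
  pos <- r > 0
  out[pos] <- 2^(1 - kappa) / gamma(kappa) * r[pos]^kappa * besselK(r[pos], kappa)
  out
}

set.seed(42)
n <- 100
coords <- cbind(runif(n), runif(n))   # locations x_i
U <- as.matrix(dist(coords))          # pairwise distances
D <- cbind(1, rnorm(n))               # d_i: intercept plus one covariate

# Assumed parameter values, for illustration only.
beta <- c(1, 0.5)
sigma2 <- 1; phi <- 0.2; kappa <- 0.5; tau2 <- 0.3

# S(x_i): zero-mean Gaussian process with Matern covariance.
Sigma_S <- sigma2 * matern_cor(U, phi, kappa)
S <- as.numeric(t(chol(Sigma_S)) %*% rnorm(n))

# Z_i: independent zero-mean Gaussian noise.
Z <- rnorm(n, sd = sqrt(tau2))

# Outcome Y_i = d_i' beta + S(x_i) + Z_i.
Y <- as.numeric(D %*% beta) + S + Z
```

For kappa = 0.5 the Matern correlation reduces to the exponential correlation exp(-u/phi), which gives a simple check on the helper.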

Variogram-based graphical validation

This graphical diagnostic is performed by setting which.test="both" or which.test="variogram". The output consists of 95% intervals (see lower.lim and upper.lim) that are generated under the assumption that the fitted model did generate the analysed data-set. This validation procedure proceeds through the following steps.

1. Obtain the mean, say \(\hat{Z}_i\), of the \(Z_i\) conditioned on the data \(Y_i\).

2. Compute the empirical variogram using \(\hat{Z}_i\)

3. Simulate n.sim data-sets under the fitted geostatistical model.

4. For each of the simulated data-sets, obtain \(\hat{Z}_i\) as in Step 1, then compute the empirical variogram based on the resulting \(\hat{Z}_i\).

5. From the n.sim variograms obtained in the previous step, compute the 95% interval at each distance bin.

If the observed variogram (obs.variogram), based on the \(\hat{Z}_i\) from Step 2, falls within the 95% intervals, there is no evidence against the fitted spatial correlation model; if, instead, it partly falls outside the 95% intervals, the fitted model does not adequately capture the spatial correlation in the data.
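The five steps can be sketched in base R as follows. This is a minimal illustration, not the code used inside variog.diagnostic.lm: it assumes an exponential correlation (Matern with kappa = 0.5), treats the hypothetical parameter values below as the fitted ones and holds them fixed across simulations, and uses a simulated data-set in place of real data. The helpers z_hat and emp_variog are made up for this example.

```r
set.seed(123)
n <- 60
coords <- cbind(runif(n), runif(n))
U <- as.matrix(dist(coords))
D <- cbind(1, runif(n))

# Assumed "fitted" parameter values (hypothetical, not estimated here).
beta <- c(0.5, 1); sigma2 <- 1; phi <- 0.15; tau2 <- 0.5
mu <- as.numeric(D %*% beta)
Sigma <- sigma2 * exp(-U / phi) + tau2 * diag(n)   # Var(Y)
L <- t(chol(Sigma))

# Step 1: mean of Z given Y, i.e. tau2 * Sigma^{-1} (y - mu).
z_hat <- function(y) tau2 * as.numeric(solve(Sigma, y - mu))

# Step 2: empirical variogram over the bins defined by uvec.
u_pairs <- U[lower.tri(U)]
uvec <- seq(min(u_pairs), (max(u_pairs) - min(u_pairs)) / 2, length.out = 11)
bin <- cut(u_pairs, breaks = uvec, include.lowest = TRUE)
emp_variog <- function(z) {
  sq <- 0.5 * outer(z, z, "-")^2
  as.numeric(tapply(sq[lower.tri(sq)], bin, mean))
}

# Stand-in for the observed data: one draw from the assumed model.
y_obs <- mu + as.numeric(L %*% rnorm(n))
obs_variogram <- emp_variog(z_hat(y_obs))

# Steps 3 and 4: simulate data-sets and recompute the variogram each time.
n_sim <- 200
sim_variog <- replicate(
  n_sim,
  emp_variog(z_hat(mu + as.numeric(L %*% rnorm(n))))
)

# Step 5: pointwise 95% limits for each distance bin (rows of sim_variog).
lower_lim <- apply(sim_variog, 1, quantile, probs = 0.025)
upper_lim <- apply(sim_variog, 1, quantile, probs = 0.975)

# Bins where the observed variogram lies inside the limits.
inside <- obs_variogram >= lower_lim & obs_variogram <= upper_lim
```

Here n_sim is kept small so the sketch runs quickly; the function's default of n.sim = 1000 gives more stable limits.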

Test for suitability of the adopted correlation function

This diagnostic test is performed if which.test="both" or which.test="test statistic". Let \(v_{E}(B)\) and \(v_{T}(B)\) denote the empirical and theoretical variograms based on \(\hat{Z}_i\) for the distance bin \(B\). The test statistic used for testing residual spatial correlation is $$T = \sum_{B} N(B) \{v_{E}(B)-v_{T}(B)\}$$ where \(N(B)\) is the number of pairs of data-points falling within the distance bin \(B\) (n.bins below).

To obtain the distribution of the test statistic \(T\) under the null hypothesis that the fitted model did generate the analysed data-set, we use the simulated empirical variograms as obtained in step 5 of the iterative procedure described in "Variogram-based graphical validation." The p-value for the test of suitability of the fitted spatial correlation function is then computed by taking the proportion of simulated values for \(T\) that are larger than the value of \(T\) based on the original \(\hat{Z}_i\) in Step 1.
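The Monte Carlo p-value can be sketched in base R as below. The statistic is coded exactly as the formula for \(T\) is written above; the bin counts, the theoretical variogram and the simulated variograms are all made-up numbers, and test_stat is a hypothetical helper rather than a PrevMap function.

```r
# T = sum over bins of N(B) * {v_E(B) - v_T(B)}, as written in the text.
test_stat <- function(n_bins, v_emp, v_theo) {
  sum(n_bins * (v_emp - v_theo))
}

set.seed(7)

# Hypothetical inputs for five distance bins.
n_bins <- c(20, 45, 60, 80, 95)                 # pairs per bin, N(B)
v_theo <- c(0.20, 0.35, 0.45, 0.50, 0.52)       # theoretical variogram v_T(B)
v_obs  <- v_theo + rnorm(5, sd = 0.05)          # observed variogram v_E(B)

# Hypothetical simulated empirical variograms: one column per simulation.
n_sim <- 500
sim_variog <- matrix(rep(v_theo, n_sim) + rnorm(5 * n_sim, sd = 0.05), nrow = 5)

# Observed statistic and its simulated null distribution.
T_obs <- test_stat(n_bins, v_obs, v_theo)
T_sim <- apply(sim_variog, 2, function(v) test_stat(n_bins, v, v_theo))

# p-value: proportion of simulated T values larger than the observed one.
p_value <- mean(T_sim > T_obs)
```

In practice the inputs would come from the simulated variograms of the graphical procedure and from the n.bins and obs.variogram components, so the same simulations serve both diagnostics.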