Learn R Programming

PrevMap (version 1.5.4)

lm.ps.MCML: Monte Carlo Maximum Likelihood estimation of the geostatistical linear model with preferentially sampled locations

Description

This function performs Monte Carlo maximum likelihood (MCML) estimation for a geostatistical linear model with preferentially sampled locations. For more details on the model, see below.

Usage

lm.ps.MCML(
  formula.response,
  formula.log.intensity = ~1,
  coords,
  which.is.preferential = NULL,
  data.response,
  data.intensity = NULL,
  par0,
  control.mcmc,
  kappa1,
  kappa2,
  mesh,
  grid.intensity,
  start.par = NULL,
  method = "nlminb",
  messages = TRUE,
  plot.correlogram = TRUE
)

Arguments

formula.response

an object of class formula (or one that can be coerced to that class): a symbolic description of the sub-model for the response variable.

formula.log.intensity

an object of class formula (or one that can be coerced to that class): a symbolic description of the log-Gaussian Cox process sub-model.

coords

an object of class formula indicating the spatial coordinates in the data.

which.is.preferential

a vector of 0 and 1, where 1 indicates a location in the data from a prefential sampling scheme and 0 from a non-preferential. This option is used to fit a model with a mix of preferentally and non-preferentiall sampled locations. For more, details on the model structure see the 'Details' section.

data.response

a data frame containing the variables in the sub-model of the response variable.

data.intensity

a data frame containing the variables in the log-Gaussian Coz process sub-model. This data frame must be provided only when explanatory variables are used in the log-Gaussian Cox process model. Each row in the data frame must correspond to a point in the grid provided through the argument 'grid.intensity'. Deafult is data.intensity=NULL, which corresponds to a model with only the intercept.

par0

an object of class 'coef.PrevMap.ps'. This argument is used to define the parameters of the importance sampling distribution used in the MCML algorithm. The input of this argument must be defined using the set.par.ps function.

control.mcmc

output from control.mcmc.MCML which defined the control parameters of the Monte Carlo Markv chain algorithm.

kappa1

fixed value for the shape parameter of the Matern covariance function of the spatial process of the sampling intensity (currently only kappa1=1 is implemented).

kappa2

fixed value for the shape parameter of the Matern covariance function of the spatial process of the response variable.

mesh

an object obtained as result of a call to the function inla.mesh.2d.

grid.intensity

a regular grid covering the geographical region of interest, used to approximate the density function of the log-Gaussian Cox process.

start.par

starting value of the optimization algorithm. This is an object of class 'coef.PrevMap.ps' and must be defined using the function set.par.ps. Default is start.cov.pars=NULL, so that the starting values are set automatically.

method

method of optimization. If method="BFGS" then the maxBFGS function is used; otherwise method="nlminb" to use the nlminb function. Default is method="BFGS".

messages

logical; if messages=TRUE then status messages are printed on the screen (or output device) while the function is running. Default is messages=TRUE.

plot.correlogram

logical; if plot.correlogram=TRUE the autocorrelation plot of the samples of the random effect is displayed after completion of conditional simulation. Default is plot.correlogram=TRUE.

Value

An object of class "PrevMap.ps". The function summary.PrevMap.ps is used to print a summary of the fitted model. The object is a list with the following components:

estimate: estimates of the model parameters; use the function coef.PrevMap.ps to obtain estimates of covariance parameters on the original scale.

covariance: covariance matrix of the MCML estimates.

log.lik: maximum value of the approximated log-likelihood.

y: observed values of the response variable. If which.is.preferential has been provided, then y is a list with components y$preferential, for the data with prefentially sampled locations, and y$non.preferential, for the remiaining.

D.response: matrix of covariates used to model the mean component of the response variable. If which.is.preferential has been provided, then D.response is a list with components D.response$preferential, for the data with prefentially sampled locations, and D.response$non.preferential, for the remiaining.

D.intensity: matrix of covariates used to model the mean component of log-intensity of the log-Gaussian Cox process.

grid.intensity: grid of locations used to approximate the intractable integral of the log-Gaussian Cox process model.

coords: matrix of the observed sampling locations. If which.is.preferential has been provided, then coords is a list with components y$preferential, for the data with prefentially sampled locations, and y$non.preferential, for the remiaining.

method: method of optimization used.

ID.coords: set of ID values defined through the argument ID.coords.

kappa.response: fixed value of the shape parameter of the Matern covariance function used to model the spatial process associated with the response variable.

mesh: the mesh used in the SPDE approximation.

samples: matrix of the random effects samples from the importance sampling distribution used to approximate the likelihood function.

call: the matched call.

Details

This function performs parameter estimation for a geostatistical linear model with preferentially sampled locations. Let \(S_{1}\) and \(S_{2}\) denote two independent, stationary and isotropic Gaussian processes. The overall model consists of two sub-models: the log-Gaussian Cox process model for the preferentially sampled locations, say \(X\); the model for the response variable, say \(Y\). The model assumes that $$[X, Y, S_1, S_2] = [S_1][S_2] [X | S_1] [Y | X, S_1, S_2],$$ where \([.]\) denotes 'the distribution of .'. Each of the two submodels has an associated linear predictor. Let \(\Lambda(x)\) denote the intensity of the Poisson process \(X\), continionally on \(S_1\). Then $$\log\{\Lambda(x)\} = d(x)'\alpha + S_1$$, where \(d(x)\) is vector of explanatory variables with regression coefficient \(\alpha\). This linear predictor is defined through the argument formula.log.intensity. The density of \([X | S_1]\) is given by $$\frac{\Lambda(x)}{\int_{A} \Lambda(u) du}$$, where \(A\) is the region of interest. The integral at the denominator is intractable and is then approximated using a quadrature procedure. The regular grid covering \(A\), used for the quadrature, must be provided through the argument grid.intensity. Conditionally on \(X\), \(S_1\) and \(S_2\), the response variable model is given by $$Y = d(x)'\beta + S_2 + \gamma S_1,$$ where \(\beta\) is another vector of regression coefficients and \(\gamma\) is the preferentiality parameter. If \(\gamma=0\) then we recover the standard geostatistical model. More details on the fitting procedure can be found in Diggle and Giorgi (2016).

When the data have a mix of preferentially and non-preferentially sampled locations. In some cases the set of locations may consist of a sub-set which is preferentially sampled, \(X\), and a standard non-prefential sample, \(X^*\). Let \(Y\) and \(Y^*\) denote the measurments at locations \(X\) and \(X^*\). In the current implementation, the model has the following form $$[X, X^*, Y, Y^*, S_1, S_2, S_2^*] = [S_1][S_2][S_2^*] [X | S_1] [Y | X, S_1, S_2] [X^*] [Y^*|X^*, S_2^*],$$ where \(S_2\) and \(S_2^*\) are two independent Gaussian process but with shared parameters, associated with \(Y\) and \(Y^*\), respectively. The linear predictor for \(Y\) is the same as above. The measurements \(Y^*\), instead, have linear predicotr $$Y^* = d(x)'\beta + S_2^*,$$ where \(\beta^*\) is vector of regression coefficients, different from \(\beta\). The linear predictor for \(Y\) and \(Y^*\) is specified though formula.response. For example, response ~ x | x + z defines a linear predictor for \(Y\) with one explanatory variable x and a linear predictor for \(Y^*\) with two explanatory variables x and z. An example on the application of this model is given in Diggle and Giorgi (2016).

References

Diggle, P.J., Giorgi, E. (2019). Model-based Geostatistics for Global Public Health. CRC/Chapman & Hall.

Giorgi, E., Diggle, P.J. (2017). PrevMap: an R package for prevalence mapping. Journal of Statistical Software. 78(8), 1-29. doi: 10.18637/jss.v078.i08

Diggle, P.J., Giorgi, E. (2017). Preferential sampling of exposures levels. In: Handbook of Environmental and Ecological Statistics. Chapman & Hall.

Diggle, P.J., Menezes, R. and Su, T.-L. (2010). Geostatistical analysis under preferential sampling (with Discussion). Applied Statistics, 59, 191-232.

Lindgren, F., Havard, R., Lindstrom, J. (2011). An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach (with discussion). Journal of the Royal Statistical Society, Series B, 73, 423--498.

Pati, D., Reich, B. J., and Dunson, D. B. (2011). Bayesian geostatistical modelling with informative sampling locations. Biometrika, 98, 35-48.