relrisk.ppp: Nonparametric Estimate of Spatially-Varying Relative Risk

Description

Given a multitype point pattern, this function estimates the spatially-varying probability of each type of point, or the ratios of such probabilities, using kernel smoothing. The default smoothing bandwidth is selected by cross-validation.

Usage

# S3 method for ppp
relrisk(X, sigma = NULL, ...,
           at = c("pixels", "points"),
           weights = NULL, varcov = NULL, 
           relative=FALSE,
           adjust=1, edge=TRUE, diggle=FALSE,
           se=FALSE, wtype=c("value", "multiplicity"),
           casecontrol=TRUE, control=1, case, fudge=0)

Value

If se=FALSE (the default), the format is described below. If se=TRUE, the result is a list of two entries,

estimate and SE, each having the format described below.

If X consists of only two types of points, and if casecontrol=TRUE, the result is a pixel image (if at="pixels") or a vector (if at="points"). The pixel values or vector values are the probabilities of a case if relative=FALSE, or the relative risk of a case (probability of a case divided by the probability of a control) if relative=TRUE.

If X consists of more than two types of points, or if casecontrol=FALSE, the result is:

(if at="pixels") a list of pixel images, with one image for each possible type of point. The result also belongs to the class "solist" so that it can be printed and plotted.
(if at="points") a matrix of probabilities, with rows corresponding to data points \(x_i\), and columns corresponding to types \(j\).

The pixel values or matrix entries are the probabilities of each type of point if relative=FALSE, or the relative risk of each type (probability of each type divided by the probability of a control) if relative=TRUE.

If relative=FALSE, the resulting values always lie between 0 and 1. If relative=TRUE, the results are either non-negative numbers, or the values Inf or NA.

Arguments

X: A multitype point pattern (object of class "ppp" which has factor valued marks).
sigma: Optional. The numeric value of the smoothing bandwidth (the standard deviation of isotropic Gaussian smoothing kernel). Alternatively sigma may be a function which can be used to select a different bandwidth for each type of point. See Details.
...: Arguments passed to bw.relrisk to select the bandwidth, or passed to density.ppp to control the pixel resolution.
at: Character string specifying whether to compute the probability values at a grid of pixel locations (at="pixels") or only at the points of X (at="points").
weights: Optional. Weights for the data points of X. A numeric vector, an expression, or a pixel image.
varcov: Optional. Variance-covariance matrix of anisotopic Gaussian smoothing kernel. Incompatible with sigma.
relative: Logical. If FALSE (the default) the algorithm computes the probabilities of each type of point. If TRUE, it computes the relative risk, the ratio of probabilities of each type relative to the probability of a control.
adjust: Optional. Adjustment factor for the bandwidth sigma.
edge: Logical value indicating whether to apply edge correction.
diggle: Logical. If TRUE, use the Jones-Diggle improved edge correction, which is more accurate but slower to compute than the default correction.
se: Logical value indicating whether to compute standard errors as well.
wtype: Character string (partially matched) specifying how the weights should be interpreted for the calculation of standard error. See Details.
casecontrol: Logical. Whether to treat a bivariate point pattern as consisting of cases and controls, and return only the probability or relative risk of a case. Ignored if there are more than 2 types of points. See Details.
control: Integer, or character string, identifying which mark value corresponds to a control.
case: Integer, or character string, identifying which mark value corresponds to a case (rather than a control) in a bivariate point pattern. This is an alternative to the argument control in a bivariate point pattern. Ignored if there are more than 2 types of points.
fudge: Optional. A single numeric value, or a numeric vector with one entry for each type of point. This value will be added to the estimates of point process intensity, before calculation of the relative risk.

Standard error

If se=TRUE, the standard error of the estimate will also be calculated. The calculation assumes a Poisson point process.

If weights are given, then the calculation of standard error depends on the interpretation of the weights. This is controlled by the argument wtype.

If wtype="value" (the default), the weights are interpreted as numerical values observed at the data locations. Roughly speaking, standard errors are proportional to the absolute values of the weights.
If wtype="multiplicity" the weights are interpreted as multiplicities so that a weight of 2 is equivalent to having a pair of duplicated points at the data location. Roughly speaking, standard errors are proportional to the square roots of the weights. Negative weights are not permitted.

The default rule is now wtype="value" but previous versions of relrisk.ppp (in spatstat.explore versions 3.1-0 and earlier) effectively used wtype="multiplicity".

Author

Adrian Baddeley Adrian.Baddeley@curtin.edu.au and Rolf Turner rolfturner@posteo.net.

Details

The command relrisk is generic and can be used to estimate relative risk in different ways.

This function relrisk.ppp is the method for point pattern datasets. It computes nonparametric estimates of relative risk by kernel smoothing (Bithell, 1990, 1991; Diggle, 2003; Baddeley, Rubak and Turner, 2015).

If X is a bivariate point pattern (a multitype point pattern consisting of two types of points) then by default, the points of the first type (the first level of marks(X)) are treated as controls or non-events, and points of the second type are treated as cases or events. Then by default this command computes the spatially-varying probability of a case, i.e. the probability \(p(u)\) that a point at spatial location \(u\) will be a case. If relative=TRUE, it computes the spatially-varying relative risk of a case relative to a control, \(r(u) = p(u)/(1- p(u))\).

If X is a multitype point pattern with \(m > 2\) types, or if X is a bivariate point pattern and casecontrol=FALSE, then by default this command computes, for each type \(j\), a nonparametric estimate of the spatially-varying probability of an event of type \(j\). This is the probability \(p_j(u)\) that a point at spatial location \(u\) will belong to type \(j\). If relative=TRUE, the command computes the relative risk of an event of type \(j\) relative to a control, \(r_j(u) = p_j(u)/p_k(u)\), where events of type \(k\) are treated as controls. The argument control determines which type \(k\) is treated as a control.

If at = "pixels" the calculation is performed for every spatial location \(u\) on a fine pixel grid, and the result is a pixel image representing the function \(p(u)\) or a list of pixel images representing the functions \(p_j(u)\) or \(r_j(u)\) for \(j = 1,\ldots,m\). An infinite value of relative risk (arising because the probability of a control is zero) will be returned as NA.

If at = "points" the calculation is performed only at the data points \(x_i\). By default the result is a vector of values \(p(x_i)\) giving the estimated probability of a case at each data point, or a matrix of values \(p_j(x_i)\) giving the estimated probability of each possible type \(j\) at each data point. If relative=TRUE then the relative risks \(r(x_i)\) or \(r_j(x_i)\) are returned. An infinite value of relative risk (arising because the probability of a control is zero) will be returned as Inf.

Estimation is performed by a simple Nadaraja-Watson type kernel smoother (Bithell, 1990, 1991; Diggle, 2003; Baddeley, Rubak and Turner, 2015, section 14.4). The smoothing bandwidth can be specified in any of the following ways:

sigma is a single numeric value, giving the standard deviation of the isotropic Gaussian kernel.
sigma is a numeric vector of length 2, giving the standard deviations in the \(x\) and \(y\) directions of a Gaussian kernel.
varcov is a 2 by 2 matrix giving the variance-covariance matrix of the Gaussian kernel.
sigma is a function which selects the bandwidth. Bandwidth selection will be applied separately to each type of point. An example of such a function is bw.diggle.
sigma and varcov are both missing or null. Then a common smoothing bandwidth sigma will be selected by cross-validation using bw.relrisk.
An infinite smoothing bandwidth, sigma=Inf, is permitted and yields a constant estimate of relative risk.

If se=TRUE then standard errors will also be computed, based on asymptotic theory, assuming a Poisson process.

The optional argument weights may provide numerical weights for the points of X. It should be a numeric vector of length equal to npoints(X).

The argument weights can also be an expression. It will be evaluated in the data frame as.data.frame(X) to obtain a vector of weights. The expression may involve the symbols x and y representing the Cartesian coordinates, and the symbol marks representing the mark values.

The argument weights can also be a pixel image (object of class "im"). numerical weights for the data points will be extracted from this image (by looking up the pixel values at the locations of the data points in X).

References

Baddeley, A., Rubak, E. and Turner, R. (2015) Spatial Point Patterns: Methodology and Applications with R. Chapman and Hall/CRC Press.

Bithell, J.F. (1990) An application of density estimation to geographical epidemiology. Statistics in Medicine 9, 691--701.

Bithell, J.F. (1991) Estimation of relative risk functions. Statistics in Medicine 10, 1745--1751.

Diggle, P.J. (2003) Statistical analysis of spatial point patterns, Second edition. Arnold.

Diggle, P.J., Zheng, P. and Durr, P. (2005) Non-parametric estimation of spatial segregation in a multivariate point process: bovine tuberculosis in Cornwall, UK. Applied Statistics 54, 645--658.

Examples

Run this code

   p.oak <- relrisk(urkiola, 20)
   if(interactive()) {
      plot(p.oak, main="proportion of oak")
      plot(eval.im(p.oak > 0.3), main="More than 30 percent oak")
      plot(split(lansing), main="Lansing Woods")
      p.lan <- relrisk(lansing, 0.05, se=TRUE)
      plot(p.lan$estimate, main="Lansing Woods species probability")
      plot(p.lan$SE, main="Lansing Woods standard error")
      wh <- im.apply(p.lan$estimate, which.max)
      types <- levels(marks(lansing))
      wh <- eval.im(types[wh])
      plot(wh, main="Most common species")
   }

Run the code above in your browser using DataLab