density.ppp: Kernel Smoothed Intensity of Point Pattern

Description

Compute a kernel smoothed intensity function from a point pattern.

Usage

# S3 method for ppp
density(x, sigma=NULL, ...,
        weights=NULL, edge=TRUE, varcov=NULL,
        at="pixels", leaveoneout=TRUE,
        adjust=1, diggle=FALSE,
        se=FALSE, wtype=c("value", "multiplicity"),
        kernel="gaussian",
        scalekernel=is.character(kernel), 
        positive=FALSE, verbose=TRUE)

Value

By default, the result is a pixel image (object of class "im"). Pixel values are estimated intensity values, expressed in “points per unit area”.

If at="points", the result is a numeric vector of length equal to the number of points in x. Values are estimated intensity values at the points of x.

In either case, the return value has attributes "sigma" and "varcov" which report the smoothing bandwidth that was used.

If weights is a matrix with more than one column, then the result is a list of images (if at="pixels") or a matrix of numerical values (if at="points").

If se=TRUE, the result is a list with two elements named estimate and SE, each of the format described above.
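
As a quick illustration of the two basic return formats, the following sketch uses the cells dataset (from spatstat.data) and an arbitrarily chosen bandwidth of 0.05. It is only meant to show the shape of the result.

```r
library(spatstat.explore)   # the cells dataset is supplied by spatstat.data

# default: at = "pixels" gives a pixel image
D <- density(cells, sigma = 0.05)
inherits(D, "im")
attr(D, "sigma")            # reports the bandwidth that was used

# at = "points" gives one intensity value per data point
v <- density(cells, sigma = 0.05, at = "points")
length(v) == npoints(cells)
```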

Arguments

x: Point pattern (object of class "ppp").
sigma: The smoothing bandwidth (the amount of smoothing). The standard deviation of the isotropic smoothing kernel. Either a numerical value, or a function that computes an appropriate value of sigma.
weights: Optional weights to be attached to the points. A numeric vector, numeric matrix, an expression, or a pixel image.
...: Additional arguments passed to pixellate.ppp and as.mask to determine the pixel resolution, or passed to sigma if it is a function.
edge: Logical value indicating whether to apply edge correction.
varcov: Variance-covariance matrix of anisotropic smoothing kernel. Incompatible with sigma.
at: String specifying whether to compute the intensity values at a grid of pixel locations (at="pixels") or only at the points of x (at="points").
leaveoneout: Logical value indicating whether to compute a leave-one-out estimator. Applicable only when at="points".
adjust: Optional. Adjustment factor for the smoothing parameter.
diggle: Logical. If TRUE, use the Jones-Diggle improved edge correction, which is more accurate but slower to compute than the default correction.
kernel: The smoothing kernel. A character string specifying the smoothing kernel (current options are "gaussian", "epanechnikov", "quartic" or "disc"), or a pixel image (object of class "im") containing values of the kernel, or a function(x,y) which yields values of the kernel.
scalekernel: Logical value. If scalekernel=TRUE, then the kernel will be rescaled to the bandwidth determined by sigma and varcov: this is the default behaviour when kernel is a character string. If scalekernel=FALSE, then sigma and varcov will be ignored: this is the default behaviour when kernel is a function or a pixel image.
se: Logical value indicating whether to compute standard errors as well.
wtype: Character string (partially matched) specifying how the weights should be interpreted for the calculation of standard error. See Details.
positive: Logical value indicating whether to force all density values to be positive numbers. Default is FALSE.
verbose: Logical value indicating whether to issue warnings about numerical problems and conditions.

Amount of smoothing

The amount of smoothing is determined by the arguments sigma, varcov and adjust.

If sigma is a single numerical value, this is taken as the standard deviation of the isotropic Gaussian kernel.
Alternatively sigma may be a function that computes an appropriate bandwidth from the data point pattern by calling sigma(x). To perform automatic bandwidth selection using cross-validation, it is recommended to use the functions bw.diggle, bw.CvL, bw.scott or bw.ppl.
The smoothing kernel may be made anisotropic by giving the variance-covariance matrix varcov. The arguments sigma and varcov are incompatible.
Alternatively sigma may be a vector of length 2 giving the standard deviations of the $x$ and $y$ coordinates, thus equivalent to varcov = diag(sigma^2).
If neither sigma nor varcov is specified, an isotropic Gaussian kernel will be used, with a default value of sigma calculated by a simple rule of thumb that depends only on the size of the window.
The argument adjust makes it easy for the user to change the bandwidth specified by any of the rules above. The value of sigma will be multiplied by the factor adjust. The matrix varcov will be multiplied by adjust^2. To double the smoothing bandwidth, set adjust=2.
An infinite bandwidth, sigma=Inf or adjust=Inf, is permitted, and yields an intensity estimate which is constant over the spatial domain.
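
The rules above can be checked with a small sketch. The bandwidth values 0.05, 0.07 and 0.10 are arbitrary choices for the cells dataset, and the comparisons rely on the equivalences stated above rather than on any additional guarantee.

```r
library(spatstat.explore)

# doubling the bandwidth with 'adjust' should match doubling sigma directly
D_adj <- density(cells, sigma = 0.05, adjust = 2)
D_dbl <- density(cells, sigma = 0.10)
all.equal(as.matrix(D_adj), as.matrix(D_dbl))

# a length-2 sigma should match the corresponding diagonal varcov
D_vec <- density(cells, sigma = c(0.05, 0.07))
D_cov <- density(cells, varcov = diag(c(0.05, 0.07)^2))
all.equal(as.matrix(D_vec), as.matrix(D_cov))

# sigma given as a function: it is called on the point pattern
D_fun <- density(cells, sigma = bw.diggle)
attr(D_fun, "sigma")
```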

Edge correction

If edge=TRUE, the intensity estimate is corrected for edge effect bias in one of two ways:

If diggle=FALSE (the default) the intensity estimate is corrected by dividing it by the convolution of the Gaussian kernel with the window of observation. This is the approach originally described in Diggle (1985). Thus the intensity value at a point $u$ is $$ \hat\lambda(u) = e(u) \sum_i k(x_i - u) w_i $$ where $k$ is the Gaussian smoothing kernel, $e(u)$ is an edge correction factor, and $w_i$ are the weights.
If diggle=TRUE then the code uses the improved edge correction described by Jones (1993) and Diggle (2010, equation 18.9). This has been shown to have better performance (Jones, 1993) but is slightly slower to compute. The intensity value at a point $u$ is $$ \hat\lambda(u) = \sum_i k(x_i - u) w_i e(x_i) $$ where again $k$ is the Gaussian smoothing kernel, $e(x_i)$ is an edge correction factor, and $w_i$ are the weights.

In both cases, the edge correction term $e(u)$ is the reciprocal of the kernel mass inside the window: $$ \frac{1}{e(u)} = \int_W k(v-u) \, {\rm d}v $$ where $W$ is the observation window.
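
One way to see the effect of the corrections is to integrate each estimate over the window. The sketch below assumes the cells dataset and an arbitrary bandwidth of 0.1, and uses integral (the method for pixel images in spatstat.geom). Without edge correction, kernel mass leaks outside the window, so the integral falls short of the number of points. With diggle=TRUE, each term $k(x_i - u) e(x_i)$ integrates to 1 over $W$ by the definition of $e$, so with unit weights the integral should be close to the number of points, up to discretisation error.

```r
library(spatstat.explore)

n <- npoints(cells)
D_none    <- density(cells, sigma = 0.1, edge = FALSE)
D_uniform <- density(cells, sigma = 0.1)                  # edge=TRUE, diggle=FALSE
D_diggle  <- density(cells, sigma = 0.1, diggle = TRUE)

# total estimated mass under each correction, next to the point count
c(n       = n,
  none    = integral(D_none),
  uniform = integral(D_uniform),
  diggle  = integral(D_diggle))
```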

Smoothing kernel

By default, smoothing is performed using a Gaussian kernel.

The choice of smoothing kernel is determined by the argument kernel. This should be a character string giving the name of a recognised two-dimensional kernel (current options are "gaussian", "epanechnikov", "quartic" or "disc"), or a pixel image (object of class "im") containing values of the kernel, or a function(x,y) which yields values of the kernel. The default is a Gaussian kernel.

If scalekernel=TRUE then the kernel values will be rescaled according to the arguments sigma, varcov and adjust as explained above, effectively treating kernel as the template kernel with standard deviation equal to 1. This is the default behaviour when kernel is a character string. If scalekernel=FALSE, the kernel values will not be altered, and the arguments sigma, varcov and adjust are ignored. This is the default behaviour when kernel is a pixel image or a function.
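
As a rough illustration, the named kernels can be compared at a common bandwidth. This sketch assumes the cells dataset and an arbitrary sigma of 0.05. Because each kernel is given as a character string, scalekernel defaults to TRUE and sigma is used as the standard deviation in every case.

```r
library(spatstat.explore)

kernels <- c("gaussian", "epanechnikov", "quartic", "disc")

# one intensity estimate per kernel, all with the same standard deviation
est <- lapply(kernels, function(k) density(cells, sigma = 0.05, kernel = k))
names(est) <- kernels

# range of pixel values for each kernel
sapply(est, range)
```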

Desired output

If at="pixels" (the default), intensity values are computed at every location $u$ in a fine grid, and are returned as a pixel image. The point pattern is first discretised using pixellate.ppp, then the intensity is computed using the Fast Fourier Transform. Accuracy depends on the pixel resolution and the discretisation rule. The pixel resolution is controlled by the arguments ... passed to as.mask (specify the number of pixels by dimyx or the pixel size by eps). The discretisation rule is controlled by the arguments ... passed to pixellate.ppp (the default rule is that each point is allocated to the nearest pixel centre; this can be modified using the arguments fractional and preserve).

If at="points", the intensity values are computed to high accuracy at the points of x only. Computation is performed by directly evaluating and summing the kernel contributions without discretising the data. The result is a numeric vector giving the density values. The intensity value at a point $x_i$ is (if diggle=FALSE) $$ \hat\lambda(x_i) = e(x_i) \sum_j k(x_j - x_i) w_j $$ or (if diggle=TRUE) $$ \hat\lambda(x_i) = \sum_j k(x_j - x_i) w_j e(x_j) $$ If leaveoneout=TRUE (the default), then the sum in the equation is taken over all $j$ not equal to $i$, so that the intensity value at a data point is the sum of kernel contributions from all other data points. If leaveoneout=FALSE then the sum is taken over all $j$, so that the intensity value at a data point includes a contribution from the same point.
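
The role of leaveoneout can be checked numerically. In this sketch (cells dataset, arbitrary bandwidth, unit weights) edge correction is switched off so that the only difference between the two estimates is the contribution of each point to itself, which for an isotropic Gaussian kernel is $k(0) = 1/(2\pi\sigma^2)$.

```r
library(spatstat.explore)

sig <- 0.05
loo  <- density(cells, sig, at = "points", edge = FALSE)   # leaveoneout=TRUE
full <- density(cells, sig, at = "points", edge = FALSE,
                leaveoneout = FALSE)

# self-contribution of a data point: the kernel evaluated at zero
k0 <- 1 / (2 * pi * sig^2)
all.equal(as.numeric(full - loo), rep(k0, npoints(cells)))
```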

Weights

If weights is a matrix with more than one column, then the calculation is effectively repeated for each column of weights. The result is a list of images (if at="pixels") or a matrix of numerical values (if at="points").

The argument weights can also be an expression. It will be evaluated in the data frame as.data.frame(x) to obtain a vector or matrix of weights. The expression may involve the symbols x and y representing the Cartesian coordinates, the symbol marks representing the mark values if there is only one column of marks, and the names of the columns of marks if there are several columns.

The argument weights can also be a pixel image (object of class "im"). Numerical weights for the data points will be extracted from this image (by looking up the pixel values at the locations of the data points in x).
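
The different forms of weights can be sketched as follows, using the cells dataset and an arbitrary bandwidth. The weight values (the $x$ coordinate, and constant columns of 1 and 2) are chosen purely for illustration.

```r
library(spatstat.explore)

n <- npoints(cells)

# an expression in the coordinates, and the equivalent numeric vector
D_expr <- density(cells, sigma = 0.05, weights = expression(x))
D_vec  <- density(cells, sigma = 0.05, weights = cells$x)
all.equal(as.matrix(D_expr), as.matrix(D_vec))

# a matrix with two columns of weights gives a list of two images
W <- cbind(one = rep(1, n), two = rep(2, n))
D_list <- density(cells, sigma = 0.05, weights = W)
length(D_list)
```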

Standard error

If se=TRUE, the standard error of the estimate will also be calculated. The calculation assumes a Poisson point process.

If weights are given, then the calculation of standard error depends on the interpretation of the weights. This is controlled by the argument wtype.

If wtype="value" (the default), the weights are interpreted as numerical values observed at the data locations. Roughly speaking, standard errors are proportional to the absolute values of the weights.
If wtype="multiplicity" the weights are interpreted as multiplicities so that a weight of 2 is equivalent to having a pair of duplicated points at the data location. Roughly speaking, standard errors are proportional to the square roots of the weights. Negative weights are not permitted.

The default rule is now wtype="value" but previous versions of density.ppp (in spatstat.explore versions 3.1-0 and earlier) effectively used wtype="multiplicity".
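
A minimal sketch of the output when se=TRUE, using the cells dataset without weights and an arbitrary bandwidth:

```r
library(spatstat.explore)

res <- density(cells, sigma = 0.05, se = TRUE)
names(res)                   # includes "estimate" and "SE"

# both components are pixel images on the same grid
inherits(res$estimate, "im")
inherits(res$SE, "im")
```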

The meaning of density.ppp

This function is often misunderstood.

The result of density.ppp is not a spatial smoothing of the marks or weights attached to the point pattern. To perform spatial interpolation of values that were observed at the points of a point pattern, use Smooth.ppp.

The result of density.ppp is not a probability density. It is an estimate of the intensity function of the point process that generated the point pattern data. Intensity is the expected number of random points per unit area. The units of intensity are “points per unit area”. Intensity is usually a function of spatial location, and it is this function which is estimated by density.ppp. The integral of the intensity function over a spatial region gives the expected number of points falling in this region.
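
The interpretation as an expected count can be illustrated by integrating the estimate. In this sketch the cells dataset, the bandwidth 0.1 and the subregion B (the square with side 0.5 at the lower left of the unit-square window) are arbitrary choices.

```r
library(spatstat.explore)

D <- density(cells, sigma = 0.1)

# integral over the whole window: an estimated number of points, not 1
integral(D)

# integral over a subregion, next to the observed count in that subregion
B <- square(0.5)
integral(D, domain = B)
npoints(cells[B])
```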

Inspecting an estimate of the intensity function is usually the first step in exploring a spatial point pattern dataset. For more explanation, see Baddeley, Rubak and Turner (2015) or Diggle (2003, 2010).

If you have two (or more) types of points, and you want a probability map or relative risk surface (the spatially-varying probability of a given type), use relrisk.

Technical issue: Negative Values

Negative and zero values of the density estimate are possible when at="pixels" because of numerical errors in finite-precision arithmetic.

By default, density.ppp does not try to repair such errors. This would take more computation time and is not always needed. (Also it would not be appropriate if weights include negative values.)

To ensure that the resulting density values are always positive, set positive=TRUE.
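
For example, with the cells dataset and a deliberately small (arbitrary) bandwidth, where the estimate is very close to zero over much of the window:

```r
library(spatstat.explore)

D_raw <- density(cells, sigma = 0.02)                   # may contain tiny negative or zero values
D_pos <- density(cells, sigma = 0.02, positive = TRUE)  # forced to be positive

min(D_raw)
min(D_pos) > 0
```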

Author

Adrian Baddeley Adrian.Baddeley@curtin.edu.au, Rolf Turner r.turner@auckland.ac.nz and Ege Rubak rubak@math.aau.dk

Details

This is a method for the generic function density.

It computes a fixed-bandwidth kernel estimate (Diggle, 1985) of the intensity function of the point process that generated the point pattern x.

The amount of smoothing is controlled by sigma if it is specified.

By default, smoothing is performed using a Gaussian kernel. The resulting density estimate is the convolution of the isotropic Gaussian kernel, of standard deviation sigma, with point masses at each of the data points in x.

Anisotropic kernels, and non-Gaussian kernels, are also supported. Each point has unit weight, unless the argument weights is given.

If edge=TRUE (the default), the intensity estimate is corrected for edge effect bias.

If at="pixels" (the default), the result is a pixel image giving the estimated intensity at each pixel in a grid. If at="points", the result is a numeric vector giving the estimated intensity at each of the original data points in x.

References

Baddeley, A., Rubak, E. and Turner, R. (2015) Spatial Point Patterns: Methodology and Applications with R. Chapman and Hall/CRC Press.

Diggle, P.J. (1985) A kernel method for smoothing point process data. Applied Statistics (Journal of the Royal Statistical Society, Series C) 34, 138--147.

Diggle, P.J. (2003) Statistical analysis of spatial point patterns, Second edition. Arnold.

Diggle, P.J. (2010) Nonparametric methods. Chapter 18, pp. 299--316 in A.E. Gelfand, P.J. Diggle, M. Fuentes and P. Guttorp (eds.) Handbook of Spatial Statistics, CRC Press, Boca Raton, FL.

Jones, M.C. (1993) Simple boundary corrections for kernel density estimation. Statistics and Computing 3, 135--146.

Examples

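  # density.ppp is in spatstat.explore; the cells and chorley datasets are in spatstat.data
  library(spatstat.explore)

  if(interactive()) {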
    opa <- par(mfrow=c(1,2))
    plot(density(cells, 0.05))
    plot(density(cells, 0.05, diggle=TRUE))
    par(opa)
    v <- diag(c(0.05, 0.07)^2)
    plot(density(cells, varcov=v))
  }
  # automatic bandwidth selection
  plot(density(cells, sigma=bw.diggle(cells)))
  # equivalent:
  plot(density(cells, bw.diggle))
  # evaluate intensity at points
  density(cells, 0.05, at="points")

  # non-Gaussian kernel
  plot(density(cells, sigma=0.4, kernel="epanechnikov"))

  if(interactive()) {
    # see effect of changing pixel resolution
    opa <- par(mfrow=c(1,2))
    plot(density(cells, sigma=0.4))
    plot(density(cells, sigma=0.4, eps=0.05))
    par(opa)
  }

  # relative risk calculation by hand (see relrisk.ppp)
  lung <- split(chorley)$lung
  larynx <- split(chorley)$larynx
  D <- density(lung, sigma=2)
  plot(density(larynx, sigma=2, weights=1/D))
