pppdist: Distance Between Two Point Patterns

Description

Given two point patterns, find the distance between them based on optimal point matching.

Usage

pppdist(X, Y, type = "spa", cutoff = 1, q = 1, matching = TRUE,
    ccode = TRUE, auction = TRUE, precision = NULL, approximation = 10,
    show.rprimal = FALSE, timelag = 0)

Value

Normally an object of class pppmatching that contains detailed information about the parameters used and the resulting distance. See pppmatching.object for details. If matching = FALSE, only the numerical value of the distance is returned.

Arguments

X,Y: Two point patterns (objects of class "ppp").
type: A character string giving the type of distance to be computed. One of "spa" (default), "ace" or "mat", indicating whether the algorithm should find the optimal matching based on “subpattern assignment”, “assignment only if cardinalities are equal” or “mass transfer”. See Details.
cutoff: The value \(> 0\) at which interpoint distances are cut off.
q: The order of the average that is applied to the interpoint distances. May be Inf, in which case the maximum of the interpoint distances is taken.
matching: Logical. Whether to return the optimal matching or only the associated distance.
ccode: Logical. If FALSE, R code is used which allows for higher precision, but is much slower.
auction: Logical. By default a version of Bertsekas' auction algorithm is used to compute an optimal point matching if type is either "spa" or "ace". If auction is FALSE (or type is "mat") a specialized primal-dual algorithm is used instead. This was the standard in earlier versions of spatstat, but is several orders of magnitudes slower.
precision: Index controlling accuracy of algorithm. The q-th powers of interpoint distances will be rounded to the nearest multiple of 10^(-precision). There is a sensible default which depends on ccode.
approximation: If q = Inf, compute distance based on the optimal matching for the corresponding distance of order approximation. Can be Inf, but this makes computations extremely slow.
show.rprimal: Logical. Whether to plot the progress of the primal-dual algorithm. If TRUE, slow primal-dual R code is used, regardless of the arguments ccode and auction.
timelag: Time lag, in seconds, between successive displays of the iterative solution of the restricted primal problem.

Author

Dominic Schuhmacher dominic.schuhmacher@mathematik.uni-goettingen.de, URL http://dominic.schuhmacher.name/.

Details

Computes the distance between point patterns X and Y based on finding the matching between them which minimizes the average of the distances between matched points (if q=1), the maximum distance between matched points (if q=Inf), and in general the q-th order average (i.e. the 1/qth power of the sum of the qth powers) of the distances between matched points. Distances between matched points are Euclidean distances cut off at the value of cutoff.

The parameter type controls the behaviour of the algorithm if the cardinalities of the point patterns are different. For the type "spa" (subpattern assignment) the subpattern of the point pattern with the larger cardinality \(n\) that is closest to the point pattern with the smaller cardinality \(m\) is determined; then the q-th order average is taken over \(n\) values: the \(m\) distances of matched points and \(n-m\) "penalty distances" of value cutoff for the unmatched points. For the type "ace" (assignment only if cardinalities equal) the matching is empty and the distance returned is equal to cutoff if the cardinalities differ. For the type "mat" (mass transfer) each point pattern is assumed to have total mass \(m\) (= the smaller cardinality) distributed evenly among its points; the algorithm finds then the "mass transfer plan" that minimizes the q-th order weighted average of the distances, where the weights are given by the transferred mass divided by \(m\). The result is a fractional matching (each match of two points has a weight in \((0,1]\)) with the minimized quantity as the associated distance.

The central problem to be solved is the assignment problem (for types "spa" and "ace") or the more general transport problem (for type "mat"). Both are well-known problems in discrete optimization, see e.g. Luenberger (2003).

For the assignment problem pppdist uses by default the forward/backward version of Bertsekas' auction algorithm with automated epsilon scaling; see Bertsekas (1992). The implemented version gives good overall performance and can handle point patterns with several thousand points.

For the transport problem a specialized primal-dual algorithm is employed; see Luenberger (2003), Section 5.9. The C implementation used by default can handle patterns with a few hundreds of points, but should not be used with thousands of points. By setting show.rprimal = TRUE, some insight in the working of the algorithm can be gained.

For a broader selection of optimal transport algorithms that are not restricted to spatial point patterns and allow for additional fine tuning, we recommend the R package transport.

For moderate and large values of q there can be numerical issues based on the fact that the q-th powers of distances are taken and some positive values enter the optimization algorithm as zeroes because they are too small in comparison with the larger values. In this case the number of zeroes introduced is given in a warning message, and it is possible then that the matching obtained is not optimal and the associated distance is only a strict upper bound of the true distance. As a general guideline (which can be very wrong in special situations) a small number of zeroes (up to about 50% of the smaller point pattern cardinality \(m\)) usually still results in the right matching, and the number can even be quite a bit higher and usually still provides a highly accurate upper bound for the distance. These numerical problems can be reduced by enforcing (much slower) R code via the argument ccode = FALSE.

For q = Inf there is no fast algorithm available, which is why approximation is normally used: for finding the optimal matching, q is set to the value of approximation. The resulting distance is still given as the maximum rather than the q-th order average in the corresponding distance computation. If approximation = Inf, approximation is suppressed and a very inefficient exhaustive search for the best matching is performed.

The value of precision should normally not be supplied by the user. If ccode = TRUE, this value is preset to the highest exponent of 10 that the C code still can handle (usually \(9\)). If ccode = FALSE, the value is preset according to q (usually \(15\) if q is small), which can sometimes be changed to obtain less severe warning messages.

References

Bertsekas, D.P. (1992). Auction algorithms for network flow problems: a tutorial introduction. Computational Optimization and Applications 1, 7-66.

Luenberger, D.G. (2003). Linear and nonlinear programming. Second edition. Kluwer.

Schuhmacher, D. (2014). transport: optimal transport in various forms. R package version 0.6-2 (or later)

Schuhmacher, D. and Xia, A. (2008). A new metric between distributions of point processes. Advances in Applied Probability 40, 651--672

Schuhmacher, D., Vo, B.-T. and Vo, B.-N. (2008). A consistent metric for performance evaluation of multi-object filters. IEEE Transactions on Signal Processing 56, 3447--3457.

Examples

Run this code

# equal cardinalities
set.seed(140627)
X <- runifrect(500)
Y <- runifrect(500)
m <- pppdist(X, Y)
m
if(interactive()) {
plot(m)}
  
# differing cardinalities
X <- runifrect(14)
Y <- runifrect(10)
m1 <- pppdist(X, Y, type="spa")
m2 <- pppdist(X, Y, type="ace")
m3 <- pppdist(X, Y, type="mat", auction=FALSE)
summary(m1)
summary(m2)
summary(m3)
if(interactive()) {
m1$matrix
m2$matrix
m3$matrix}

# q = Inf
X <- runifrect(10)
Y <- runifrect(10)
mx1 <- pppdist(X, Y, q=Inf, matching=FALSE)
mx2 <- pppdist(X, Y, q=Inf, matching=FALSE, ccode=FALSE, approximation=50)
mx3 <- pppdist(X, Y, q=Inf, matching=FALSE, approximation=Inf)
all.equal(mx1,mx2,mx3)
# sometimes TRUE
all.equal(mx2,mx3)
# very often TRUE

Run the code above in your browser using DataLab