deldir: Delaunay triangulation and Dirichlet tessellation

Description

This function computes the Delaunay triangulation (and hence the Dirichlet or Voronoi tessellation) of a planar point set according to the second (iterative) algorithm of Lee and Schacter --- see References. The triangulation is made to be with respect to the whole plane by “suspending” it from so-called ideal points (-Inf,-Inf), (Inf,-Inf) (Inf,Inf), and (-Inf,Inf). The triangulation is also enclosed in a finite rectangular window.

Usage

deldir(x, y, z=NULL, rw=NULL, eps=1e-09, sort=TRUE, plot=FALSE,
       round=TRUE,digits=6, id=NULL, ...)

Value

A list (of class deldir), invisible if plot=TRUE, with components:

delsgs

A data frame with 6 columns. The first 4 entries of each row are the coordinates of the points joined by an edge of a Delaunay triangle, in the order (x1,y1,x2,y2). The last two entries are the identifiers (defaulting to the indices, if id is not supplied) of the two points which are joined.

dirsgs

A data frame with 10 columns. The first 4 entries of each row are the coordinates of the endpoints of one the edges of a Dirichlet tile, in the order (x1,y1,x2,y2). The fifth and sixth entries, in the columns named ind1 and ind2, are the identifiers (defaulting to the indices, if id is not supplied) of the two points, in the set being triangulated, which are separated by that edge. The seventh and eighth entries, in the columns named bp1 and bp2 are logical values. The entry in column bp1 indicates whether the first endpoint of the corresponding edge of a Dirichlet tile is a boundary point (a point on the boundary of the rectangular window). Likewise for the entry in column bp2 and the second endpoint of the edge.

The ninth and tenth entries, in columns named thirdv1 and thirdv2 are the identifiers of the respective third vertices of the Delaunay triangles whose circumcentres constitute the corresponding endpoints of the edge under consideration. (The other two vertices of the triangle in question are identified by the entries of columns ind1 and ind2.)

The entries of columns thirdv1 and thirdv2 may (also) take the values $-1, -2, -3$, and $-4$. (These will be coerced to mode character if id was supplied.) This will be the case if the circumcentre in question lies outside of the rectangular window rw. In these circumstances the corresponding endpoint of the tile edge is the intersection of the line joining the two circumcentres with the boundary of rw, and the numeric value of the entry of column “thirdv1” (respectively “thirdv2”) indicates which side. The numbering follows the convention for numbering the sides of a plot region in R: 1 for the bottom side, 2 for the left hand side, 3 for the top side and 4 for the right hand side.

Note that the entry in column thirdv1 will be negative if and only if the corresponding entry in column bp1 is TRUE. Similarly for columns thirdv2 and bp2.

summary

a data frame with 9, 10 or 11 columns and n.data rows (see below). The rows correspond to the points in the set being triangulated. Note that the row names are the indices of the points in the original sequence of points being triangulated/tessellated. Usually these will be the sequence 1, 2, ..., n. However if there were duplicated points then the row name corresponding to a point is the first of the indices of the set of duplicated points in which the given point appears. The columns are:

x (the $x$-coordinate of the point)
y (the $y$-coordinate of the point)
id (the identifers of the points; present only if id was specified)
z (the auxiliary values or “tags”; present only if these were specified)
n.tri (the number of Delaunay triangles emanating from the point)
del.area (1/3 of the total area of all the Delaunay triangles emanating from the point)
del.wts (the corresponding entry of the del.area column divided by the sum of this column)
n.tside (the number of sides --- within the rectangular window --- of the Dirichlet tile surrounding the point)
nbpt (the number of points in which the Dirichlet tile intersects the boundary of the rectangular window)
dir.area (the area of the Dirichlet tile surrounding the point)
dir.wts (the corresponding entry of the dir.area column divided by the sum of this column).

Note that the factor of 1/3 associated with the del.area column arises because each triangle occurs three times --- once for each corner.

n.data

the number of points in the set which was triangulated, with any duplicate points eliminated. It is the same as the number of rows of summary.

del.area

the area of the convex hull of the set of points being triangulated, as formed by summing the del.area column of summary.

dir.area

the area of the rectangular window enclosing the points being triangulated, as formed by summing the dir.area column of summary.

rw

the specification of the corners of the rectangular window enclosing the data, in the order (xmin, xmax, ymin, ymax).

ind.orig

A vector of the indices of the points (x,y) in the set of coordinates initially supplied to deldir() before duplicate points (if any) were removed. These indices are used by triang.list().

Arguments

x,y

These arguments specify the coordinates of the point set being triangulated/tessellated. Argument x may be a numeric vector or it may be a data structure consisting of a matrix, a data frame, a generic list, or an object of class "ppp". (See package spatstat.) Argument y, if specified, is always a numeric vector. The "x" and "y" coordinates are extracted from arguments x and y according to (what can be, in some instances) a rather complicated protocol. See Notes on extracting the coordinates for details of this protocol.

z

Optional argument specifying “auxiliary” values or “tags” associated with the respective points. (See Notes on “tags”.) This argument may be a vector or factor whose entries constitute these tags, or it may be a text string naming such a vector or factor. If z, or the object named by z is a vector (rather than a factor) it may be of any mode (numeric, character, logical, etc.). See Notes on extracting z for how z is handled when argument x is a data structure (rather than a numeric vector).

rw

The coordinates of the corners of the rectangular window enclosing the triangulation, in the order (xmin, xmax, ymin, ymax). Any data points outside this window are discarded. If this argument is omitted, it defaults to values given by the range of the data, plus and minus 10 percent.

eps

A value of epsilon used in testing whether a quantity is zero, mainly in the context of whether points are collinear. If anomalous errors arise, it is possible that these may averted by adjusting the value of eps upward or downward.

sort

Logical argument; if TRUE (the default) the data are sorted into a sequence of “bins” prior to triangulation; this makes the algorithm slightly more efficient. Normally one would set sort equal to FALSE only if one wished to observe some of the fine detail of the way in which adding a point to a data set affected the triangulation, and therefore wished to make sure that the point in question was added last. Essentially this argument would get used only in a debugging process.

plot

Logical argument; if TRUE a plot is produced. The nature of the plot may be controlled by using the ... argument to pass appropriate arguments to plot.deldir(). Without “further instruction” a plot of the points being triangulated and of both the triangulation and the tessellation is produced;

round

Logical scalar. Should the data stored in the returned value be rounded to digits decimal digits? This is essentially for cosmetic purposes. This argument is a “new addition” to deldir(), as of version 0.1-26. Previously rounding was done willy-nilly. The change was undertaken when Kodi Arfer pointed out that the rounding might have unwanted effects upon “downstream” operations.

digits

The number of decimal places to which all numeric values in the returned list should be rounded. Defaults to 6. Ignored if round (see above) is set to FALSE.

id

Optional vector of the same length as x and y, providing identifiers for the points that are being triangulated/tessellated. This vector will be coerced to mode "character" internally. If id is not specified, it defaults to the sequence of indices of the points. Note that id must specify identifiers for all points, even though some of these points may be dropped from consideration if they do not lie inside the “rectangular window” rw. Note also that id must contain no duplicate values.

The id argument should not be confused with the z argument. The two are entirely different and serve entirely different purposes.

...

Auxiliary arguments add, wlines, labelPts, lex, col, lty, pch, xlim, and ylim (and possibly other plotting parameters) may be passed to plot.deldir() through ... if plot=TRUE.

Side Effects

If plot=TRUE a plot of the triangulation and/or tessellation is produced or added to an existing plot.

Notes on extracting the coordinates

The protocol for extracting the $x$ and $y$ coordinates from the arguments x and y is a bit complicated and confusing. It is designed to handle a number of different desiderata and to accommodate various feature-requests that users have made over the years. Basically the protocol is:

If x is a numeric vector and y is a numeric vector then x is used as the $x$-coordinates and y is used as the $y$-coordinates.
If x is a matrix, a data frame, or a generic list), and y is a numeric vector, then the $x$-coordinates are sought amongst the components of x and y is used as the $y$-coordinates.
If x is a matrix, a data frame, or a generic list and y is not specified or cannot be found, then both the $x$-coordinates and $y$-coordinates are sought amongst the components of x.
If x an object of class "ppp" then both the $x$-coordinates and $y$-coordinates are taken from the components of x. If y is specified, it is ignored (with a warning).
If x is a numeric vector and y is not specified or cannot be found, then an error is thrown.

A few more details:

If x is of class "ppp" then it will definitely have components named "x" and "y".
If x is a generic list, it must have a component named "x" (otherwise an error is thrown), and the $x$-coordinates are set equal to this component. If y is not specified or cannot be found, then a "y" component of x is sought. If such a component exists then the $y$-coordinates are set equal to this component. Otherwise an error is thrown).
If x is a matrix or a data frame, the protocol gets a bit more intricate.
- If x has a column named "x" then this column is taken to be the $x$-coordinates.
- Otherwise the $x$-coordinates are taken to be the first column of x that is not named "y" or znm (where znm is the name of the object providing the “tags”, if “tags” have been specified).
- If there is no such first column (e.g. if there are only two columns and these have names "y" and znm) then an error is thrown.
- If y is not specified or cannot be found, and if x has a column named "y" then this column is taken to be the $y$-coordinates.
- Otherwise, in this situation, the $y$-coordinates are taken to be the first column of x that is not named "x" or znm and is not equal to the column previously selected to be the x-coordinates.
- If there is no such first column (e.g. if there are only two columns and these have names "x" and znm), then an error is thrown.

Got all that? :-) If these instructions seem rather bewildering (and indeed they are!) just keep things simple and make calls like deldir(x,y) where x and y are numeric vectors that have been previously assigned in the global environment.

Notes on extracting <code>z</code>

If argument x is a data structure (rather than a numeric vector) and is not an object of class "ppp" then z, if specified and not found, is searched for in x. If x is of class "ppp" then what happens depends on whether z was specified or left to take its default value of NULL. In the former case, z takes the specified value. In the latter case the value of "z" is taken from the marks of x provided that x is indeed a marked point pattern and that the marks are atomic (essentially provided that the marks are not a data frame). Otherwise z is left NULL, i.e. there are no “tags” associated with the points.

Notes on “tags”

The “tags” are simply values that are associated in some way with the data points and hence with the tiles of the tessellation produced. They DO NOT affect the tessellation. In previous versions of this package (0.2-10 and earlier) the entries of z were referred to as “weights”. This terminology has been changed since it is misleading. The tessellation produced when a z argument is supplied is the same as is it would be if there were no z argument (i.e. no “weights”). The deldir package DOES NOT do weighted tessellation.

Notes on Memory Allocation

It is difficult-to-impossible to determine a priori how much memory needs to be allocated (in the Fortran code) for storing the edges of the Delaunay triangles and Dirichlet tiles, and for storing the “adjacency list” used by the Lee-Schacter algorithm. In the code, an attempt is made to allocate sufficient storage. If, during the course of running the algorithm, the amount of storage turns out to be inadequate, the algorithm is halted, the storage is incremented, and the algorithm is restarted (with an informative message). This message may be suppressed by wrapping the call to deldir() in suppressMessages().

Notes on error messages

In previous versions of this package, error traps were set in the underlying Fortran code for 17 different errors. These were identified by an error number which was passed back up the call stack and finally printed out by deldir() which then invisibly returned a NULL value. A glossary of the meanings of the values of was provided in a file to be found in a file located in the inst directory (“folder” if you are a Windoze weenie).

This was a pretty shaganappi system. Consequently, as of version 1.2-1, conversion to “proper” error trapping was implemented. Such error trapping is effected via the rexit() subroutine which is now available in R. (See “Writing R Extensions”, section 6.2.1.)

Note that when an error is detected, deldir() now exits with a genuine error, rather than returning NULL. The glossary of the meanings of “error numbers” is now irrelevant and has been removed from the inst directory.

An error trap that merits particular mention was introduced in version 0.1-16 of deldir. This error trap relates to “triangle problems”. It was drawn to my attention by Adam Dadvar (on 18 December, 2018) that in some data sets collinearity problems may cause the “triangle finding” procedure, used by the algorithm to successively add new points to a tessellation, to go into an infinite loop. A symptom of the collinearity is that the vertices of a putative triangle appear not to be in anticlockwise order irrespective of whether they are presented in the order i, j, k or k, j, i. The result of this anomaly is that the procedure keeps alternating between moving to “triangle” i, j, k and moving to “triangle” k, j, i, forever.

The error trap in question is set in trifnd, the triangle finding subroutine. It detects such occurrences of “clockwise in either orientation” vertices. The trap causes the deldir() function to throw an error rather than disappearing into a black hole.

When an error of the “triangle problems” nature occurs, a possible remedy is to increase the value of the eps argument of deldir(). (See the Examples.) There may conceivably be other problems that lead to infinite loops and so I put in another error trap to detect whether the procedure has inspected more triangles than actually exist, and if so to throw an error.

Note that the strategy of increasing the value of eps is probably the appropriate response in most (if not all) of the cases where errors of this nature arise. Similarly this strategy is probably the appropriate response to the errors

Vertices of “triangle” are collinear and vertex 2 is not between 1 and 3. Error in circen.
Vertices of triangle are collinear. Error in dirseg.
Vertices of triangle are collinear. Error in dirout.

However it is impossible to be sure. The intricacy and numerical delicacy of triangulations is too great for anyone to be able to foresee all the possibilities that could arise.

If there is any doubt as to the appropriateness of the “increase eps” strategy, users are advised to do their best to explore the data set, graphically or by other means, and thereby determine what is actually going on and why problems are occurring.

Warnings

The process for determining if points are duplicated changed between versions 0.1-9 and 0.1-10. Previously there was an argument frac for this function, which defaulted to 0.0001. Points were deemed to be duplicates if the difference in x-coordinates was less than frac times the width of rw and y-coordinates was less than frac times the height of rw. This process has been changed to one which uses duplicated() on the data frame whose columns are x and y.

As a result it may happen that points which were previously eliminated as duplicates will no longer be eliminated. (And possibly vice-versa.)
The components delsgs and summary of the value returned by deldir() are now data frames rather than matrices. The component summary was changed to allow the “auxiliary” values z to be of arbitrary mode (i.e. not necessarily numeric). The component delsgs was then changed for consistency. Note that the other “matrix-like” component dirsgs has been a data frame since time immemorial.

Acknowledgement

I would like to express my most warm and sincere thanks to Duncan Murdoch (Emeritus Professor of Statistics, Western University) for helping me, with incredible patience and forbearance, to straighten out my thinking in respect of adjustments that I recently (October 2021) made to the argument processing protocol in the deldir() function. Duncan provided numerous simple examples to demonstrate when and how things were going wrong, and patiently explained to me how I was getting one aspect of the protocol backwards.

Author

Rolf Turner rolfurner@posteo.net

Details

This package had its origins (way back in the mists of time!) as an Splus library section named “delaunay”. That library section in turn was a re-write of a stand-alone Fortran program written in 1987/88 while the author was with the Division of Mathematics and Statistics, CSIRO, Sydney, Australia. This program was an implementation of the second (iterative) Lee-Schacter algorithm. The stand-alone Fortran program was re-written as an Splus function (which called upon dynamically loaded Fortran code) while the author was visiting the University of Western Australia, May, 1995.

Further revisions were made December 1996. The author gratefully acknowledges the contributions, assistance, and guidance of Mark Berman, of D.M.S., CSIRO, in collaboration with whom this project was originally undertaken. The author also acknowledges much useful advice from Adrian Baddeley, formerly of D.M.S., CSIRO (now Professor of Statistics at Curtin University). Daryl Tingley of the Department of Mathematics and Statistics, University of New Brunswick, provided some helpful insight. Special thanks are extended to Alan Johnson, of the Alaska Fisheries Science Centre, who supplied two data sets which were extremely valuable in tracking down some errors in the code.

Don MacQueen, of Lawrence Livermore National Lab, wrote an Splus driver function for the old stand-alone version of this software. That driver, which was available on Statlib, was deprecated in favour of the Statlib package “delaunay”. Don also collaborated in the preparation of the latter package. It is not clear to me whether the “delaunay” package, or indeed Statlib (or indeed Splus!) still exist.

See the ChangeLog for information about further revisions and bug-fixes.

References

Lee, D. T. and Schacter, B. J. (1980). Two algorithms for constructing a Delaunay triangulation, International Journal of Computer and Information Sciences 9 (3), pp. 219 -- 242.

Ahuja, N. and Schacter, B. J. (1983). Pattern Models. New York: Wiley.

Examples

Run this code

x <- c(2.3,3.0,7.0,1.0,3.0,8.0)
y <- c(2.3,3.0,2.0,5.0,8.0,9.0)

# Let deldir() choose the rectangular window.
dxy1 <- deldir(x,y)

# User chooses the rectangular window.
dxy2 <- deldir(x,y,rw=c(0,10,0,10))

# Put "dummy" points at the corners of the rectangular
# window, i.e. at (0,0), (10,0), (10,10), and (0,10)
xx <- c(x,0,10,10,0)
yy <- c(y,0,0,10,10)
dxy3 <- deldir(xx,yy,rw=c(0,10,0,10))

# Plot the triangulation created (but not the tessellation).
dxy2 <- deldir(x,y,rw=c(0,10,0,10),plot=TRUE,wl="tr")

# Example of collinearity error.
if (FALSE) {
    dniP <- deldir(niProperties) # Throws an error
}
    dniP <- deldir(niProperties,eps=1e-8) # No error.

# Example of using data stored in a data frame.
dsw  <- deldir(seaweed)
# Example where "data" is of class "ppp".
dtoy <- deldir(toyPattern)
# The "tags", in dtoy$summary$z, are the marks of the toy ppp
# object which consists of the letters "a","b","c" and "d".

# Artificial example of tags.
set.seed(42)
trees1  <- sample(c("spruce","birch","poplar","shoe"),20,TRUE)
trees2  <- sample(c("fir","maple","larch","palm"),20,TRUE)
egDat   <- data.frame(x=runif(20),y=runif(20),species=trees1)
tagNm   <- "species"
species <- trees2
dd1     <- deldir(egDat) # No tags.
dd2     <- deldir(egDat,z=species)  # Uses trees1 as the tags.
dd3     <- deldir(egDat,z="species") # Same as dd2.
dd4     <- deldir(egDat,z=tagNm) # Same as dd2 and dd3.
spec    <- species
dd5     <- deldir(egDat,z=spec) # Uses trees2 as the tags.

# Duncan Murdoch's examples.  The deldir() function was not
# handling such examples correctly until Duncan kindly set
# me on the right path.
set.seed(123)
dd6 <- deldir(rnorm(32),rnorm(32),rnorm(32))
#
x <- cbind(x = 1:10, junk = 11:20)
y <- 21:30
z <- 31:40
d7 <- deldir(x=x, y=y, z=z)
#
# print(d7$summary) reveals that x is 1:10, y is 21:30
# and z is 31:40; x[,"junk"] is ignored as it should be.
x <- cbind(x = 1:10, "rnorm(10)" = 11:20)
y <- 21:30
z <- 41:50
d8 <- deldir(x=x, y=y, z=rnorm(10))  
#
# print(d8$summary) reveals that x is 1:10, y is 21:30 and z is a
# vector of standard normal values.  Even though x has a column with
# the name of the z argument i.e. "rnorm(10)" (!!!)  the specified
# value of z takes precedence over this column (and, as per the usual
# R syntax) over the object named "z" in the global environment.

# Artificial example of the use of the "id" argument.
set.seed(42)
x  <- runif(50)
y  <- runif(50)
ll <- expand.grid(a=letters[1:10],b=letters[1:10])
aa <- sample(paste0(ll[["a"]],ll[["b"]]),50)
dxy.wid <- deldir(x,y,id=aa)

Run the code above in your browser using DataLab