geoThin: Thin geographic points deterministically or randomly

Description

This function thins geographic points so that no retained point has a nearest neighbor closer than a user-specified distance. Thinning can be conducted in two ways. Both begin by calculating all pairwise distances between points. Clusters of points are then found based on proximity, by default using the "single-linkage" method (i.e., based on the minimum distance between groups). Finally, either a deterministic or a random method is used to select the point retained from each cluster:

Deterministic: For each cluster, distances between each point in the cluster and all points outside of the cluster are calculated. The point retained in each cluster is the one with the greatest minimum pairwise distance to any point in any other cluster. This point will thus be maximally isolated from any other point.
Random: For each cluster, a random point is chosen.
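
The two selection rules can be sketched in base R. This is a minimal illustration, not the package's implementation: thinSketch is a hypothetical helper, it assumes planar coordinates with Euclidean distances (geoThin works with geographic distances in meters), and it assumes the cluster tree is cut at height minDist.

```r
# Hypothetical helper (not part of the package): thin planar points.
# Assumes Euclidean distances and a tree cut at height minDist.
thinSketch <- function(xy, minDist, random = FALSE, method = "single") {
  n <- nrow(xy)
  if (n < 2) return(xy)
  d <- dist(xy)                                  # all pairwise distances
  clust <- cutree(hclust(d, method = method), h = minDist)
  dm <- as.matrix(d)
  keep <- vapply(split(seq_len(n), clust), function(idx) {
    if (length(idx) == 1) return(idx)            # singleton: nothing to thin
    if (random) return(idx[sample.int(length(idx), 1)])
    outside <- setdiff(seq_len(n), idx)
    if (length(outside) == 0) return(idx[1])     # only one cluster exists
    # minimum distance from each cluster member to any outside point
    minOut <- apply(dm[idx, outside, drop = FALSE], 1, min)
    idx[which.max(minOut)]                       # most isolated member
  }, integer(1))
  xy[sort(unname(keep)), , drop = FALSE]
}

# Made-up points on a line: two close pairs and one isolated point
pts <- cbind(x = c(0, 1, 10, 11, 30), y = 0)
thinned <- thinSketch(pts, minDist = 2)
thinned  # keeps the points at x = 0, 11, and 30
```

With single linkage, any two points in different clusters are farther apart than the cut height, so the retained points in this sketch are always more than minDist apart. Other linkage methods do not give that guarantee.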

Usage

geoThin(x, minDist, random = FALSE, longLat = 1:2, method = "single", ...)

Value

An object of the same class as x.

Arguments

x: A "spatial points" object of class SpatVector, sf, data.frame, or matrix. If x is a data.frame or matrix, then the points will be assumed to have the WGS84 coordinate system (i.e., unprojected).
minDist: Minimum distance (in meters) needed between points to retain them. Points falling closer than this distance will be candidates for being discarded.
random: Logical: If FALSE (default), then use the deterministic method for thinning. If TRUE, then use the random method.
longLat: Character or integer vector: This is ignored if x is a SpatVector or sf object. However, if x is a data.frame or matrix, then this should be a character or integer vector specifying the columns in x corresponding to longitude and latitude (in that order). For example, c('long', 'lat') or c(1, 2). The default is to assume that the first two columns in x represent coordinates.
method: Character: Method used by hclust to cluster points. By default, this is 'single', but in some cases this may result in strange clustering (especially when there is a large number of points). The 'complete' method (or others) may give more reasonable results in these cases.
...: Additional arguments. Not used.
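
The effect of the method argument can be illustrated with hclust alone. The sketch below uses made-up planar points and a cut height h standing in for minDist; it does not call geoThin, and how geoThin cuts the tree internally is an assumption here.

```r
# Ten made-up points in a row, 1 unit apart (planar coordinates)
xy <- cbind(x = 0:9, y = 0)
d <- dist(xy)
h <- 1.5  # stands in for minDist

# Single linkage chains neighbors together: one cluster spans all points
single <- cutree(hclust(d, method = "single"), h = h)

# Complete linkage caps each cluster's diameter at h
complete <- cutree(hclust(d, method = "complete"), h = h)

length(unique(single))    # 1 cluster, so only one point would be retained
length(unique(complete))  # several clusters, each no wider than h
```

The trade-off is that, with 'complete', points in neighboring clusters can still lie closer together than h.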

Examples



library(sf)

# lemur occurrence data
data(mad0)
data(lemurs)
crs <- getCRS('WGS84')
occs <- lemurs[lemurs$species == 'Eulemur fulvus', ]
ll <- c('longitude', 'latitude')
occs <- st_as_sf(occs, coords = ll, crs = crs)

# deterministically thin
det <- geoThin(x = occs, minDist = 30000)

# randomly thin
set.seed(123)
rand <- geoThin(x = occs, minDist = 30000, random = TRUE)

# map
oldPar <- par(mfrow = c(1, 2))

plot(st_geometry(occs), cex = 1.4, main = 'Deterministic')
plot(st_geometry(det), pch = 21, cex = 1.4, bg = 1:nrow(det), add = TRUE)
plot(st_geometry(mad0), add = TRUE)

plot(st_geometry(occs), cex = 1.4, main = 'Random')
plot(st_geometry(rand), pch = 21, cex = 1.4, bg = 1:nrow(rand), add = TRUE)
plot(st_geometry(mad0), add = TRUE)

par(oldPar)
