
speciesgeocodeR (version 1.0-4)

GeoClean: Automated Cleaning of Geographic Coordinates

Description

Runs a set of tests to flag suspicious geographic coordinates in a dataset. Each logical argument switches one cleaning test on or off.

Usage

GeoClean(x, isna = TRUE, isnumeric = TRUE, coordinatevalidity = TRUE,
         containszero = TRUE, zerozero = TRUE, zerozerothresh = 1,
         latequallong = TRUE, GBIFhead = FALSE, countrycentroid = FALSE,
         contthresh = 0.5, capitalcoords = FALSE, capthresh = 0.5,
         countrycheck = FALSE, polygons, referencecountries = countryref,
         outp = c("summary", "detailed", "cleaned"))

Arguments

x
a data.frame with at least three columns: “identifier” (species name), “XCOOR” (longitude) and “YCOOR” (latitude). The columns may instead be named “species”, “longitude” and “latitude”. If countrycentroid, capitalcoords or countrycheck are used, a fourth column named “country” is required, containing ISO2 or ISO3 country codes. Alternatively, x can be a data.frame as downloaded from GBIF.
isna
logical. If TRUE, checks for missing values in the coordinates. Default = TRUE.
isnumeric
logical. If TRUE, checks for non-numeric values in the coordinates. Default = TRUE.
coordinatevalidity
logical. If TRUE, checks for invalid coordinates (XCOOR > 180 or XCOOR < -180; YCOOR > 90 or YCOOR < -90). Default = TRUE.
containszero
logical. If TRUE, checks for coordinates that are exactly zero. Default = TRUE.
zerozero
logical. If TRUE, checks if the coordinates fall within a rectangle around the point 0/0. Default = TRUE.
zerozerothresh
numeric. The size of the rectangle around 0/0 in decimal degrees. Default = 1.
latequallong
logical. If TRUE, checks for rows where XCOOR = YCOOR. Default = TRUE.
GBIFhead
logical. If TRUE, checks if the coordinates fall within a 0.5 degree rectangle around the GBIF headquarters in Copenhagen. Default = FALSE.
countrycentroid
logical. If TRUE, checks if the coordinates fall within a rectangle around the centroid of the country specified in x$country. The size of the rectangle can be controlled using the "contthresh" argument. Default = FALSE.
contthresh
numeric. The size of the rectangle around the country centroid (in degrees). The number is half the length of one rectangle side. Default = 0.5.
capitalcoords
logical. If TRUE, checks if the coordinates fall within a rectangle around the capital of the country specified in x$country. The size of the rectangle can be controlled using the "capthresh" argument. Default = FALSE.
capthresh
numeric. The size of the rectangle around the capital (in degrees). The number is half the length of one rectangle side. Default = 0.5.
countrycheck
logical. If TRUE, checks if the coordinates fall within the country borders of the country indicated in x$country. Default = FALSE.
polygons
The reference polygons for the countrycheck test. By default, the wrld_simpl dataset from the maptools package, which must be loaded to use countrycheck = TRUE.
referencecountries
The reference coordinates for the country centroids and capitals. By default from the countryref data.
outp
character. Defines the output format: one of "summary", "detailed" or "cleaned". See the Value section.
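The simpler row-wise tests can be illustrated in base R. This is a minimal sketch, not the speciesgeocodeR implementation: the helpers in_rectangle() and sketch_checks() are hypothetical, treating a threshold as half the rectangle's side length is assumed for zerozerothresh (the page only states it for contthresh and capthresh), and containszero is assumed to flag a row when either coordinate is exactly zero. As with outp = "detailed", each result column is TRUE for clean rows.

```r
# Hypothetical helper: TRUE if (lon, lat) lies inside a square centred on
# (cx, cy) whose half side length is `thresh` decimal degrees.
in_rectangle <- function(lon, lat, cx, cy, thresh) {
  abs(lon - cx) <= thresh & abs(lat - cy) <= thresh
}

# Hypothetical sketch of five GeoClean-style tests on the XCOOR/YCOOR columns.
# Returns one logical column per test: TRUE = clean, FALSE = suspicious.
sketch_checks <- function(x, zerozerothresh = 1) {
  lon <- x$XCOOR
  lat <- x$YCOOR
  # Comparisons that evaluate to NA (missing coordinates) count as failures.
  ok <- function(cond) !is.na(cond) & cond
  data.frame(
    isna = !is.na(lon) & !is.na(lat),
    coordinatevalidity = ok(lon >= -180 & lon <= 180 & lat >= -90 & lat <= 90),
    containszero = ok(lon != 0 & lat != 0),
    zerozero = ok(!in_rectangle(lon, lat, 0, 0, zerozerothresh)),
    latequallong = ok(lon != lat)
  )
}

# Made-up example records (not from lemurs_test).
pts <- data.frame(
  identifier = c("sp1", "sp2", "sp3", "sp4", "sp5"),
  XCOOR = c(47.5, 200, 0.3, 10, NA),
  YCOOR = c(-18.9, 10, 0.2, 10, 5)
)
res <- sketch_checks(pts)
print(res)

# Rectangle semantics with a threshold of 0.5, as for contthresh/capthresh:
in_rectangle(10.4, 20.3, 10, 20, 0.5)  # inside: both offsets are <= 0.5
in_rectangle(10.6, 20.0, 10, 20, 0.5)  # outside: longitude offset is 0.6
```

In this sketch, sp2 fails coordinatevalidity, sp3 falls inside the 0/0 rectangle, sp4 has XCOOR = YCOOR, and sp5 fails every test that needs its missing longitude.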

Value

If outp = "summary", a logical vector with one element per row of the input data.frame: TRUE = clean coordinates, FALSE = suspicious coordinates. If outp = "detailed", a data.frame with one column for each check performed: TRUE = clean coordinates, FALSE = suspicious coordinates. If outp = "cleaned", a cleaned version of the input data.
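The relation between the three output modes can be sketched in base R, assuming (the page does not say so explicitly) that a row is clean in the summary only if it passes every test that was run, and that "cleaned" keeps exactly those rows. The input x and the detailed table below are made-up stand-ins for real GeoClean input and output.

```r
# Made-up input records and a stand-in for an outp = "detailed" result.
x <- data.frame(
  identifier = c("sp1", "sp1", "sp2"),
  XCOOR = c(47.5, 0.2, 49.0),
  YCOOR = c(-18.9, 0.1, -12.3)
)
detailed <- data.frame(
  isna = c(TRUE, TRUE, TRUE),
  zerozero = c(TRUE, FALSE, TRUE),
  latequallong = c(TRUE, TRUE, TRUE)
)

# "summary"-style vector: TRUE only where every test column is TRUE.
summary_flags <- Reduce(`&`, detailed)

# "cleaned"-style data: keep only the rows that passed all tests.
cleaned <- x[summary_flags, ]

# Number of records flagged (FALSE) by each individual test.
flag_counts <- sapply(detailed, function(col) sum(!col))
print(flag_counts)
```

Counting FALSE values per column is a quick way to see which test removes the most records when reviewing a "detailed" result.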

Details

The capital and country centroid coordinates in the countryref dataset are from the CIA World Factbook. The country border check is based on the wrld_simpl data from the maptools package. Note that the ISO2 code for Namibia (“NA”) may cause problems with the countrycheck argument; use ISO3 country codes where possible.
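One likely source of the Namibia problem (an assumption; the page does not state the cause) is that R's read.csv() treats the string "NA" as a missing value by default, because its na.strings argument defaults to "NA". The base-R sketch below uses a made-up two-row CSV to show the effect and one workaround; ISO3 codes (NAM for Namibia) avoid the issue altogether.

```r
# Made-up records: Namibia (ISO2 "NA") and Madagascar (ISO2 "MG").
csv <- "species,longitude,latitude,country
sp1,17.1,-22.6,NA
sp2,47.5,-18.9,MG"

# Default na.strings = "NA": the Namibian country code is read as missing.
d1 <- read.csv(text = csv, stringsAsFactors = FALSE)
print(d1$country)

# Only empty fields are treated as missing, so "NA" survives as a code.
d2 <- read.csv(text = csv, na.strings = "", stringsAsFactors = FALSE)
print(d2$country)
```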

References

Central Intelligence Agency (2014). The World Factbook. Washington, DC.

http://opengeocode.org/download/cow.php

Examples

library(speciesgeocodeR)
library(maptools)  # provides the wrld_simpl polygons used by countrycheck

data(lemurs_test)
data(wrld_simpl)
data(countryref)

# run all tests except the country border check; return the cleaned data
test <- GeoClean(lemurs_test, GBIFhead = TRUE,
                 countrycentroid = TRUE, contthresh = 0.5,
                 capitalcoords = TRUE, capthresh = 0.5,
                 countrycheck = FALSE, outp = "cleaned")

# run only the country border check on the cleaned data
insidecountry <- GeoClean(test, isna = FALSE, isnumeric = FALSE,
                          coordinatevalidity = FALSE,
                          containszero = FALSE, zerozero = FALSE,
                          latequallong = FALSE, GBIFhead = FALSE,
                          countrycentroid = FALSE, contthresh = 0.5,
                          capitalcoords = FALSE, capthresh = 0.5,
                          countrycheck = TRUE, polygons = wrld_simpl)

# same tests as the first call, but return one TRUE/FALSE column per test
test <- GeoClean(lemurs_test, GBIFhead = TRUE,
                 countrycentroid = TRUE, contthresh = 0.5,
                 capitalcoords = TRUE, capthresh = 0.5,
                 countrycheck = FALSE, outp = "detailed")
