
CoordinateCleaner (version 1.0-7)

cc_outl: Flag Geographic Outliers in Species Distributions

Description

Flags records that are outliers in geographic space according to the method selected via the method argument. Geographic outliers often represent erroneous coordinates, for example due to data entry errors, imprecise georeferences, or individuals in horticulture or captivity.

Usage

cc_outl(x, lon = "decimallongitude", lat = "decimallatitude", species = "species", 
        method = "quantile", mltpl = 3, tdi = 1000, value = "clean", verbose = TRUE)

Arguments

x

a data.frame containing geographical coordinates and species names.

lon

a character string. The column with the longitude coordinates. Default = “decimallongitude”.

lat

a character string. The column with the latitude coordinates. Default = “decimallatitude”.

species

a character string. The column with the species name. Default = “species”.

method

a character string. The method for outlier selection: one of “distance”, “quantile”, or “mad”. See Details. Default = “quantile”.

mltpl

numeric. The multiplier of the interquartile range (method == 'quantile') or of the median absolute deviation (method == 'mad') used to identify outliers. See Details. Default = 3.
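To show how mltpl scales the two thresholds, here is a minimal base R sketch operating on a vector of per-record mean distances for one species. The helpers flag_quantile and flag_mad are hypothetical and are not CoordinateCleaner code; the quantile rule assumes the conventional boxplot upper fence (third quartile plus mltpl times the IQR), and the mad rule uses the two-sided interval median ± mltpl × MAD. Unlike value = "flags", these helpers return TRUE for an outlier.

```r
# Hypothetical helpers, not CoordinateCleaner code. Input: the mean distances
# (km) of the records of one species. Output: TRUE = outlier.

# Assumed boxplot-style upper fence: Q3 + mltpl * IQR
flag_quantile <- function(mean_dist, mltpl = 3) {
  q <- stats::quantile(mean_dist, c(0.25, 0.75), names = FALSE)
  mean_dist > q[2] + mltpl * (q[2] - q[1])
}

# Two-sided rule: outside median +/- mltpl * MAD
# (stats::mad uses its default scaling constant 1.4826)
flag_mad <- function(mean_dist, mltpl = 3) {
  centre <- stats::median(mean_dist)
  abs(mean_dist - centre) > mltpl * stats::mad(mean_dist)
}

mean_dist <- c(510, 480, 530, 495, 505, 520, 4000)
flag_quantile(mean_dist)  # only the last value is flagged
flag_mad(mean_dist)       # only the last value is flagged
```

A smaller multiplier tightens the rule: flag_mad(mean_dist, mltpl = 1) also flags 480, which lies 30 km from the median while 1 × MAD is about 22.2 km.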

tdi

numeric. The minimum distance, in km, between a record and all other records of the same species for the record to be flagged as an outlier (method == 'distance'). See Details. Default = 1000.
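The distance rule can be sketched for the records of a single species as follows. This assumes great-circle distances from the haversine formula on a spherical Earth of radius 6371 km; the package may compute distances differently. flag_isolated is a hypothetical helper that returns TRUE when a record's nearest other record lies farther than tdi km away.

```r
# Great-circle distance in km (haversine formula, spherical Earth, R = 6371 km)
haversine_km <- function(lon1, lat1, lon2, lat2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper, not CoordinateCleaner code: TRUE where the nearest
# other record of the same species is more than tdi km away.
flag_isolated <- function(lon, lat, tdi = 1000) {
  n <- length(lon)
  d <- outer(seq_len(n), seq_len(n),
             function(i, j) haversine_km(lon[i], lat[i], lon[j], lat[j]))
  diag(d) <- Inf  # ignore each record's zero distance to itself
  apply(d, 1, min) > tdi
}

# Three nearby points in central Europe and one in the Indian Ocean
lon <- c(10, 10.5, 11, 120)
lat <- c(50, 50.2, 49.8, -30)
flag_isolated(lon, lat, tdi = 1000)
```

In this sketch a species with a single record has no neighbour and is therefore returned as TRUE; how the package treats such species is not stated on this page.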

value

a character string. Defines the output value: “clean” or “flags”. See Value. Default = “clean”.

verbose

logical. If TRUE, reports the name of the test and the number of records flagged. Default = TRUE.

Value

Depending on the ‘value’ argument, either a data.frame containing the records considered correct by the test (“clean”) or a logical vector, with TRUE = test passed and FALSE = test failed/potentially problematic (“flags”). Default = “clean”.
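The relationship between the two output forms can be illustrated in base R, assuming (as the description above implies) that the “clean” output is the subset of rows whose flag is TRUE. The data and the flag vector below are made up for illustration and were not produced by cc_outl.

```r
# Hypothetical records and a logical vector of the kind value = "flags"
# returns (TRUE = passed, FALSE = potentially problematic)
x <- data.frame(species = c("a", "a", "b"),
                decimallongitude = c(10, 11, 120),
                decimallatitude = c(50, 51, -30))
flags <- c(TRUE, TRUE, FALSE)

# Assumed equivalence: the "clean" output keeps only the rows that passed
clean <- x[flags, ]

# Number of flagged records, the count that verbose = TRUE reports
n_flagged <- sum(!flags)
```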

Details

The method for outlier identification depends on the method argument.

If “quantile”: a boxplot method is used, and records are flagged as outliers if their mean distance to all other records of the same species is larger than mltpl times the interquartile range of the mean distances of all records of this species.

If “mad”: the median absolute deviation (MAD) is used. A record is flagged as an outlier if its mean distance to all other records of the same species falls outside the median of the mean distances of all records of the species, plus or minus mltpl times the MAD of those mean distances.

If “distance”: records are flagged as outliers if the minimum distance to the nearest other record of the same species is greater than tdi.
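Putting the pieces together, the per-species workflow of the boxplot rule used by method = "quantile" might look like the sketch below: group records by species, compute each record's mean great-circle distance to the other records of its species, and compare it with a threshold derived from that species' distribution. outl_quantile_sketch is a hypothetical re-implementation, not the package's code; it assumes haversine distances and the conventional boxplot upper fence (Q3 + mltpl × IQR), and it returns TRUE = passed to mirror value = "flags".

```r
# Great-circle distance in km (haversine formula, spherical Earth, R = 6371 km)
haversine_km <- function(lon1, lat1, lon2, lat2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Hypothetical sketch of the "quantile" rule, not CoordinateCleaner code.
# Returns TRUE = passed, FALSE = flagged as outlier.
outl_quantile_sketch <- function(x, lon = "decimallongitude",
                                 lat = "decimallatitude",
                                 species = "species", mltpl = 3) {
  passed <- rep(TRUE, nrow(x))
  for (idx in split(seq_len(nrow(x)), x[[species]])) {
    n <- length(idx)
    if (n < 2) next  # no other records of this species to compare with
    lo <- x[[lon]][idx]
    la <- x[[lat]][idx]
    d <- outer(seq_len(n), seq_len(n),
               function(i, j) haversine_km(lo[i], la[i], lo[j], la[j]))
    # Mean distance to all other records (self-distance is 0, so divide by n - 1)
    mean_dist <- rowSums(d) / (n - 1)
    q <- stats::quantile(mean_dist, c(0.25, 0.75), names = FALSE)
    # Assumed boxplot-style upper fence: Q3 + mltpl * IQR
    passed[idx] <- mean_dist <= q[2] + mltpl * (q[2] - q[1])
  }
  passed
}

# Species "a": six clustered records plus one distant record; species "b": three clustered records
x <- data.frame(
  species = c(rep("a", 7), rep("b", 3)),
  decimallongitude = c(10, 10.5, 11, 10.2, 10.8, 10.4, 120, -60, -60.5, -59.8),
  decimallatitude  = c(50, 50.2, 49.8, 50.1, 49.9, 50.3, -30, -10, -10.2, -9.9)
)
outl_quantile_sketch(x)  # only the seventh record (120, -30) fails
```

Thresholds are computed separately for each species, so a record is only ever judged against the spread of its own species' records.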

Examples

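library(CoordinateCleaner)
set.seed(1)  # make the random coordinates reproducible

# 100 random points; the 10 species names are recycled over the rows
x <- data.frame(species = letters[1:10],
                decimallongitude = runif(100, -180, 180),
                decimallatitude = runif(100, -90, 90))

# Default: method = "quantile", returns the cleaned data.frame
cc_outl(x)
# Logical vector instead (TRUE = passed)
cc_outl(x, method = "quantile", value = "flags")
# Distance method with thresholds of 10,000 km and 1,000 km
cc_outl(x, method = "distance", value = "flags", tdi = 10000)
cc_outl(x, method = "distance", value = "flags", tdi = 1000)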
