ALA4R (version 1.9.1)

occurrences: Get occurrence data

Description

Retrieve ALA occurrence data via the "occurrence download" web service. At least one of taxon, wkt, or fq must be supplied for a valid query.

Usage

occurrences(
  taxon,
  wkt,
  fq,
  fields,
  extra,
  qa,
  method,
  generate_doi = FALSE,
  email,
  email_notify = FALSE,
  download_reason_id = ala_config()$download_reason_id,
  reason,
  verbose = ala_config()$verbose,
  record_count_only = FALSE,
  use_layer_names = TRUE,
  use_data_table = TRUE
)

Arguments

taxon

string: (optional) query of the form field:value (e.g. "genus:Heleioporus") or a free-text search (e.g. "macropodidae"). Note that a free-text search is equivalent to specifying the "text" field (i.e. taxon = "Alaba" is equivalent to taxon = "text:Alaba"). The text field is populated with the taxon name along with a handful of other commonly-used fields, so just specifying your target taxon (e.g. taxon = "Alaba vibex") will probably work. However, for reliable results it is recommended to use a specific field where possible (see ala_fields("occurrence") for valid fields). It is also good practice to quote the taxon name if it contains multiple words, for example taxon = "taxon_name:\"Alaba vibex\""
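The quoting advice above can be sketched with a small helper. The name taxon_query is hypothetical and not part of ALA4R; it only builds the string that would be passed as taxon.

```r
# Hypothetical helper (not part of ALA4R): build a "field:value" query string,
# wrapping the value in double quotes when it contains whitespace.
taxon_query <- function(field, value) {
  if (grepl("\\s", value)) {
    value <- paste0("\"", value, "\"")
  }
  paste0(field, ":", value)
}

single_word <- taxon_query("genus", "Heleioporus")      # genus:Heleioporus
multi_word  <- taxon_query("taxon_name", "Alaba vibex") # taxon_name:"Alaba vibex"
```

The result can then be supplied directly, e.g. occurrences(taxon = multi_word, ...).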

wkt

string: (optional) a WKT (well-known text) string providing a spatial polygon within which to search, e.g. "POLYGON((140 -37,151 -37,151 -26,140.131 -26,140 -37))"
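For rectangular search areas, a WKT polygon of this form can be assembled from a bounding box. The helper below is a sketch in base R; bbox_to_wkt is a hypothetical name, not an ALA4R function, and it assumes coordinates are given as longitude then latitude, as in the example above.

```r
# Hypothetical helper (not part of ALA4R): turn a bounding box into a closed
# WKT polygon. Assumes x = longitude and y = latitude, in decimal degrees.
bbox_to_wkt <- function(xmin, ymin, xmax, ymax) {
  stopifnot(xmin < xmax, ymin < ymax)
  # List the four corners and repeat the first one to close the ring
  corners <- rbind(
    c(xmin, ymin),
    c(xmax, ymin),
    c(xmax, ymax),
    c(xmin, ymax),
    c(xmin, ymin)
  )
  vertices <- paste(corners[, 1], corners[, 2], collapse = ",")
  paste0("POLYGON((", vertices, "))")
}

wkt <- bbox_to_wkt(145, -37, 150, -30)
# wkt is "POLYGON((145 -37,150 -37,150 -30,145 -30,145 -37))"
```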

fq

string: (optional) character string or vector of strings, specifying filters to be applied to the original query. These are of the form "INDEXEDFIELD:VALUE" e.g. "kingdom:Fungi". See ala_fields("occurrence",as_is = TRUE) for all the fields that are queryable. NOTE that fq matches are case-sensitive, but sometimes the entries in the fields are not consistent in terms of case (e.g. kingdom names "Fungi" and "Plantae" but "ANIMALIA"). fq matches are ANDed by default (e.g. c("field1:abc","field2:def") will match records that have field1 value "abc" and field2 value "def"). To obtain OR behaviour, use the form c("field1:abc OR field2:def"). See e.g. https://solr.apache.org/guide/6_6/common-query-parameters.html for more information about filter queries
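The difference between ANDed and ORed filters comes down to how the strings are assembled before being passed as fq. The following base R sketch only builds the filter values; the field names and values are taken from the examples on this page, and the variable names are illustrative.

```r
# AND behaviour: each vector element is a separate filter,
# so a record must satisfy all of them.
fq_and <- c("kingdom:Plantae", "basis_of_record:LivingSpecimen")

# OR behaviour: the alternatives are joined into a single string.
kingdoms <- c("Fungi", "Plantae")
fq_or <- paste(paste0("kingdom:", kingdoms), collapse = " OR ")
# fq_or is "kingdom:Fungi OR kingdom:Plantae"

# Because matches are case-sensitive, an OR filter is one way to cover
# inconsistent capitalisation of the same value.
fq_case <- paste(c("kingdom:Animalia", "kingdom:ANIMALIA"), collapse = " OR ")
```

Either fq_and or fq_or can then be supplied as the fq argument of occurrences().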

fields

string vector: (optional) a vector of field names to return. Note that the columns of the returned data frame are not guaranteed to retain the ordering of the field names given here. If not specified, a default list of fields will be returned. See ala_fields("occurrence") for valid field names. Field names can be passed as full names (e.g. "Radiation - lowest period (Bio22)") rather than id ("el871"). Use fields = "all" to include all available fields, but note that "all" will probably cause an error because the request URL will exceed the maximum allowable length

extra

string vector: (optional) a vector of field names to include in addition to those specified in fields. This is useful if you would like the default list of fields (i.e. when fields parameter is not specified) plus some additional extras. See ala_fields("occurrence_stored",as_is = TRUE) for valid field names. Field names can be passed as full names (e.g. "Radiation - lowest period (Bio22)") rather than id ("el871"). Use extra = "all" to include all available fields, but note that "all" will probably cause an error with method = "offline" because the request URL will exceed the maximum allowable length

qa

string vector: (optional) list of record issues to include in the download. Use qa = "all" to include all available issues, or qa = "none" to include none. Otherwise see ala_fields("assertions",as_is = TRUE) for valid values

method

string: This parameter is deprecated. All queries now use the offline method unless record_count_only = TRUE. With the offline method, more fields are available and larger datasets can be returned

generate_doi

logical: by default no DOI will be generated. Set to TRUE if you intend to use the data in a publication or similar

email

string: the email address of the user performing the download (required unless record_count_only = TRUE)

email_notify

logical: by default an email with the download information will be sent to the `email` specified. Set to `FALSE` if you are doing a large number of downloads

download_reason_id

numeric or string: (required unless record_count_only is TRUE) a reason code for the download, either as a numeric ID (currently 0-11) or a string (see ala_reasons for a list of valid ID codes and names). The download_reason_id can be passed directly to this function, or alternatively set using ala_config(download_reason_id = ...)

reason

string: (optional) user-supplied description of the reason for the download. Providing this information is optional but will help the ALA to better support users by building a better understanding of user communities and their data requests

verbose

logical: show additional progress information? [default is set by ala_config()]

record_count_only

logical: if TRUE, return just the count of records that would be downloaded, but don't download them. Note that the record count is always re-retrieved from the ALA, regardless of the caching settings. If a cached copy of this query exists on the local machine, the actual data set size may therefore differ from this record count. record_count_only = TRUE can only be used with method = "indexed"
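One way to use the record count is as a guard before a full download. The wrapper below is only a sketch: download_if_small and max_records are hypothetical names, and it assumes that the value returned with record_count_only = TRUE is a single number. The ALA4R calls are made only when the function is invoked, so an installed package and network access are needed at that point.

```r
# Hypothetical wrapper (not part of ALA4R): check the record count first and
# only download when it is below a chosen threshold.
download_if_small <- function(taxon, email, download_reason_id,
                              max_records = 10000) {
  # Assumption: this returns a single numeric count
  n <- ALA4R::occurrences(taxon = taxon, record_count_only = TRUE)
  if (n > max_records) {
    stop("Query would return ", n, " records (limit is ", max_records, ")")
  }
  ALA4R::occurrences(taxon = taxon, email = email,
                     download_reason_id = download_reason_id)
}
```

A call might look like download_if_small("genus:Heleioporus", email = "your_email_here", download_reason_id = 10).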

use_layer_names

logical: if TRUE, layer names will be used as layer column names in the returned data frame (e.g. "radiationLowestPeriodBio22"). Otherwise, layer id value will be used for layer column names (e.g. "el871")

use_data_table

logical: if TRUE, attempt to read the data.csv file using the fread function from the data.table package. If this fails with an error or warning, or if use_data_table is FALSE, then read.table will be used (which may be slower)
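The fallback behaviour described for use_data_table can be illustrated with a small stand-alone reader. This is a sketch of the general pattern, not ALA4R's actual implementation; read_occurrence_csv is a hypothetical name, and the read.table settings are assumptions suited to a comma-separated file.

```r
# Hypothetical reader (not ALA4R's code): try data.table::fread first and
# fall back to read.table if fread is unavailable, errors, or warns.
read_occurrence_csv <- function(path, use_data_table = TRUE) {
  if (use_data_table && requireNamespace("data.table", quietly = TRUE)) {
    result <- tryCatch(
      data.table::fread(path, data.table = FALSE),
      error = function(e) NULL,
      warning = function(w) NULL
    )
    if (!is.null(result)) {
      return(result)
    }
  }
  # Slower but dependency-free fallback
  utils::read.table(path, header = TRUE, sep = ",", quote = "\"",
                    comment.char = "", stringsAsFactors = FALSE)
}

# Small demonstration with a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("latitude,longitude", "-37.5,145.2", "-30.1,150.3"), tmp)
fast <- read_occurrence_csv(tmp)
slow <- read_occurrence_csv(tmp, use_data_table = FALSE)
```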

Value

Data frame of occurrence results, with one row per occurrence record. The columns of the data frame will depend on the requested fields

See Also

ala_reasons for download reasons; ala_config

Examples

# NOT RUN {
## count of records from this data provider
x <- occurrences(taxon = "data_resource_uid:dr356", record_count_only = TRUE)
## download records, with standard fields
x <- occurrences(taxon = "data_resource_uid:dr356", download_reason_id = 10,
  email = "your_email_here")
## download records, with all fields
x <- occurrences(taxon = "data_resource_uid:dr356", download_reason_id = 10,
  fields = ala_fields("occurrence_stored", as_is = TRUE)$name)
## download records, with specified fields
x <- occurrences(taxon = "genus:Heleioporus",
  fields = c("longitude", "latitude", "common_name", "taxon_name", "el807"),
  download_reason_id = 10)
## download records in polygon, with no quality assertion information
x <- occurrences(taxon = "genus:Heleioporus",
  wkt = "POLYGON((145 -37,150 -37,150 -30,145 -30,145 -37))",
  download_reason_id = 10, qa = "none")

y <- occurrences(taxon = "taxon_name:\"Alaba vibex\"",
  fields = c("latitude", "longitude", "el874"),
  download_reason_id = 10)
str(y)
# equivalent direct webservice call
# [see this by setting ala_config(verbose = TRUE)]:
# https://biocache-ws.ala.org.au/ws/occurrences/index/download?q=taxon_name%3A%22Alaba%20vibex%22&
# fields=latitude,longitude,el874&reasonTypeId=10&sourceTypeId=2001&esc=%5C&sep=%09&file=data
occurrences(taxon = "taxon_name:\"Eucalyptus gunnii\"",
  fields = c("latitude", "longitude"),
  qa = "none", fq = "basis_of_record:LivingSpecimen",
  download_reason_id = 10)
# equivalent direct webservice call
# [see this by setting ala_config(verbose=TRUE)]:
# https://biocache-ws.ala.org.au/ws/occurrences/index/download?q=taxon_name%3A%22
# Eucalyptus%20gunnii%22&fq=basis_of_record%3ALivingSpecimen&fields=latitude,longitude&qa=none&
# reasonTypeId=10&sourceTypeId=2001&esc=%5C&sep=%09&file=data
# }
