galah (version 1.4.0)

galah_filter: Narrow a query by specifying filters

Description

Filters are arguments of the form field - logical - value that narrow the set of records returned by a query. For example, users commonly request records from a particular year (year == 2020), or return all records except fossils (basisOfRecord != "FossilSpecimen"). The result of galah_filter can be passed to the filter argument of atlas_occurrences(), atlas_species() or atlas_counts(). galah_filter uses non-standard evaluation (NSE) and is designed to be as compatible as possible with dplyr::filter syntax.

Usage

galah_filter(..., profile = NULL)

Arguments

...

filters, in the form field - logical - value (e.g. year == 2020)

profile

string: (optional) a data quality profile to apply to the records. See show_all_profiles() for valid profiles. By default no profile is applied.

Value

An object of class data.frame and galah_filter, containing filter values.

Examples

Create a custom filter for records of interest

filters <- galah_filter(
    basisOfRecord == "HumanObservation",
    year >= 2010,
    stateProvince == "New South Wales")

Add the default ALA data quality profile

filters <- galah_filter(
    basisOfRecord == "HumanObservation",
    year >= 2020,
    stateProvince == "New South Wales",
    profile = "ALA")

Use filters to exclude particular values

filter <- galah_filter(year >= 2010 & year != 2021)

atlas_counts(filter = filter)
#> # A tibble: 1 x 1
#>      count
#>      <int>
#> 1 43916661

Separating statements with a comma is equivalent to an AND statement

galah_filter(year >= 2010 & year < 2020) # is the same as:
galah_filter(year >= 2010, year < 2020)
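
The equivalence above can be illustrated in base R. The sketch below is not galah's implementation: combine_with_and() is a hypothetical helper that captures the expressions passed through ... without evaluating them (the basic mechanism behind NSE) and joins them with &. It then checks, against a small made-up data frame, that the joined expression selects the same rows as writing & directly.

```r
# Hypothetical helper, not part of galah: capture the expressions passed
# through `...` unevaluated, then join them pairwise with `&`.
combine_with_and <- function(...) {
  exprs <- as.list(substitute(list(...)))[-1]  # drop the leading `list` symbol
  Reduce(function(lhs, rhs) call("&", lhs, rhs), exprs)
}

# Two comma-separated statements become a single AND expression.
joined <- combine_with_and(year >= 2010, year < 2020)
print(joined)

# Made-up records, used only to compare the two spellings.
records <- data.frame(year = c(2005, 2012, 2019, 2021))
with_commas <- eval(joined, records)
with_and <- with(records, year >= 2010 & year < 2020)
print(identical(with_commas, with_and))
```

How galah itself combines comma-separated statements internally may differ; the sketch only shows why the two spellings can be treated as the same condition.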

All statements must include the field name

galah_filter(year == 2010 | year == 2021) # this works (note double equals)
galah_filter(year == c(2010, 2021)) # same as above 
galah_filter(year == 2010 | 2021) # this fails
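
A plain-R illustration may help explain why the field name has to be repeated. The sketch below uses an ordinary numeric vector named year (made-up values, not an ALA query) and shows how base R itself reads each form. Note that base R treats year == c(2010, 2021) as an element-wise comparison with recycling, which is different from the OR behaviour that galah_filter gives this form according to the example above.

```r
# Base R only: `year` is an ordinary vector here, not an ALA field.
year <- c(2009, 2010, 2015, 2021)

# Both sides of `|` name the field, so only 2010 and 2021 match.
explicit <- year == 2010 | year == 2021

# The right-hand side is the bare number 2021, which R coerces to TRUE,
# so the condition is TRUE for every element.
missing_field <- year == 2010 | 2021

# In base R, `==` against a length-2 vector recycles element by element;
# `%in%` is the base R spelling of "equals any of these values".
recycled <- year == c(2010, 2021)
any_of <- year %in% c(2010, 2021)

print(explicit)
print(missing_field)
print(recycled)
print(any_of)
```

Because a condition such as year == 2010 | 2021 carries no field on its right-hand side, there is nothing meaningful to send as a query, which is consistent with galah_filter failing on it.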

It is possible to use an object to specify required values

# Numeric example

year_value <- 2010

galah_call() %>%
  galah_filter(year > year_value) %>%
  atlas_counts()
#> # A tibble: 1 x 1
#>      count
#>      <int>
#> 1 42816943

# Categorical example

basis_of_record <- c("HumanObservation", "MaterialSample")

galah_call() %>%
  galah_filter(basisOfRecord == basis_of_record) %>%
  atlas_counts()
#> # A tibble: 1 x 1
#>      count
#>      <int>
#> 1 82809464

Solr supports range queries on text as well as numbers. The following query counts records from all Australian States and Territories whose names sort alphabetically at or after "Tasmania"

galah_call() %>%
  galah_filter(cl22 >= "Tasmania") %>%
  atlas_counts()
#> # A tibble: 1 x 1
#>      count
#>      <int>
#> 1 30230213
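
To see which names a text range like this would cover, the comparison can be reproduced locally in base R. This is a sketch under an assumption: the states vector below is typed in by hand as the eight state and territory names, and is not retrieved from the cl22 field, which may hold other values. Solr's string ordering may also differ from R's locale-dependent ordering in edge cases, although not for these names.

```r
# Assumed values, typed by hand in alphabetical order (not fetched from the ALA).
states <- c(
  "Australian Capital Territory", "New South Wales", "Northern Territory",
  "Queensland", "South Australia", "Tasmania", "Victoria", "Western Australia"
)

# `>=` on character vectors compares by sort order, so "Tasmania" itself
# is kept along with every name that sorts after it.
matched <- states[states >= "Tasmania"]
print(matched)
```

Under that assumption the filter covers Tasmania, Victoria and Western Australia.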

Details

All statements passed to galah_filter() (except the profile argument) take the form of field - logical - value. Permissible examples include:

  • = or == (e.g. year = 2020)

  • != (e.g. year != 2020)

  • > or >= (e.g. year >= 2020)

  • < or <= (e.g. year <= 2020)

  • OR statements (e.g. year == 2018 | year == 2020)

  • AND statements (e.g. year >= 2000 & year <= 2020)

In some cases R will fail to parse inputs with a single equals sign (=), particularly where statements are separated by & or |. This problem can be avoided by using a double-equals (==) instead.
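
The parsing problem comes from R's own syntax rather than from galah, and can be reproduced without the package. In the sketch below, parses() is a hypothetical helper and f is a placeholder function name; parse() only checks syntax, so f never needs to exist and nothing is evaluated.

```r
# Hypothetical helper: TRUE if R can parse the code, FALSE on a syntax error.
parses <- function(code) {
  tryCatch(
    {
      parse(text = code)
      TRUE
    },
    error = function(e) FALSE
  )
}

# A lone `=` is read as a named argument, which is valid syntax.
single_ok <- parses("f(year = 2020)")

# A second `=` inside the same argument is a syntax error.
or_bad <- parses("f(year = 2018 | year = 2020)")
and_bad <- parses("f(year = 2000 & year = 2020)")

# `==` is an ordinary comparison operator, so these parse.
or_ok <- parses("f(year == 2018 | year == 2020)")
and_ok <- parses("f(year >= 2000 & year <= 2020)")

print(c(single_ok, or_bad, and_bad, or_ok, and_ok))
```

Since the failure happens before galah_filter() is ever called, the function cannot recover from it; writing == throughout avoids the issue.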

See Also

search_taxa() and galah_geolocate() for other ways to restrict the information returned by atlas_occurrences() and related functions. Use search_fields() to find fields that you can filter by, and search_field_values() to find the values available for those fields.