atlas_counts: Return a count of records

Description

Prior to downloading data it is often valuable to have some estimate of how many records are available, both for deciding if the query is feasible, and for estimating how long it will take to download. Alternatively, for some kinds of reporting, the count of observations may be all that is required, for example for understanding how observations are growing or shrinking in particular locations, or for particular taxa. To this end, atlas_counts() takes arguments in the same format as atlas_occurrences(), and provides either a total count of records matching the criteria, or a data.frame of counts matching the criteria supplied to the group_by argument.
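The feasibility check described above can be sketched as a small helper. This is an illustration rather than part of galah: the function name `download_is_feasible` and the 500,000-record threshold are assumptions, and a stand-in data frame replaces a live call to atlas_counts(), reusing the total shown in the Examples section.

```r
# Hypothetical helper (not part of galah): TRUE when the count is small
# enough to download. `counts` is a one-row result with a `count` column,
# as returned by atlas_counts() without group_by; `max_records` is an
# arbitrary threshold chosen for illustration.
download_is_feasible <- function(counts, max_records = 500000) {
  counts$count[[1]] <= max_records
}

# Stand-in for the output of atlas_counts(), using the total from the Examples
all_records <- data.frame(count = 102070026L)

download_is_feasible(all_records)
#> [1] FALSE
```

With a live connection, the result of any atlas_counts() call made without group_by could be passed to the helper in the same way.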

Usage

atlas_counts(
  request = NULL,
  identify = NULL,
  filter = NULL,
  geolocate = NULL,
  group_by = NULL,
  limit = 100,
  type = c("record", "species"),
  refresh_cache = FALSE
)

Arguments

request

optional data_request object: generated by a call to galah_call().

identify

data.frame: generated by a call to galah_identify().

filter

data.frame: generated by a call to galah_filter().

geolocate

string: generated by a call to galah_geolocate().

group_by

data.frame: an object of class galah_group_by, as returned by galah_group_by(). Alternatively, a vector of field names (see search_fields() and show_all_fields()).

limit

numeric: maximum number of categories to return, defaulting to 100. If limit is NULL, all results are returned. For fields with many categories, returning all results can take some time.

type

string: one of c("record", "species"). Defaults to "record". If "species", the number of species matching the criteria is returned; if "record", the number of records matching the criteria is returned.

refresh_cache

logical: if set to TRUE and galah_config(caching = TRUE), files cached from a previous query will be replaced by the results of the current query.
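The arguments above can also be passed to atlas_counts() directly instead of through a pipe starting with galah_call(). The sketch below is a minimal illustration under stated assumptions: the galah package is installed, a network connection is available when the function is called, and the wrapper name `count_by_basis` and its `max_categories` parameter are hypothetical.

```r
# Hypothetical wrapper (not part of galah), so that no query is sent until
# the function is called. Assumes the galah package is installed.
count_by_basis <- function(max_categories = 5) {
  galah::atlas_counts(
    filter = galah::galah_filter(year == 2020),      # records from 2020 only
    group_by = galah::galah_group_by(basisOfRecord), # one row per category
    limit = max_categories                           # at most this many rows
  )
}
```

Calling count_by_basis() would send the query; passing max_categories = NULL would request every category rather than a capped number.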

Value

An object of class tbl_df and data.frame (i.e. a tibble) containing either:

A single count, if group_by is not specified; or
A summary of counts grouped by field(s), if group_by is specified.

Examples

With no arguments, return the total number of records in the ALA

atlas_counts()
#> # A tibble: 1 x 1
#>       count
#>       <int>
#> 1 102070026

You can group counts by state and territory with galah_group_by

galah_call() |>
  galah_group_by(stateProvince) |>
  atlas_counts()
#> # A tibble: 100 x 2
#>   stateProvince      count
#>   <chr>              <int>
#> 1 New South Wales 24938082
#> 2 Victoria        21775800
#> 3 Queensland      17550396
#> 4 South Australia  8449336
#> # ... with 96 more rows

You can add a filter to narrow your search

galah_call() |>
  galah_filter(basisOfRecord == "FossilSpecimen") |>
  atlas_counts()

Specify type = "species" to count the number of species rather than records, and group the species counts by kingdom

records <- galah_call() |>
  galah_group_by(kingdom) |>
  atlas_counts(type = "species")

records
#> # A tibble: 10 x 2
#>    kingdom   count
#>    <chr>     <dbl>
#>  1 Animalia  90821
#>  2 Plantae   39883
#>  3 Fungi     16752
#>  4 Chromista  1822
#>  5 Protista    635
#>  6 Bacteria    525
#>  7 Protozoa    493
#>  8 Archaea       0
#>  9 Eukaryota     0
#> 10 Virus         0

Using galah_group_by allows you to cross-tabulate using two different variables, similar to using dplyr::group_by() %>% dplyr::count()

records <- galah_call() |>
  galah_filter(year > 2015) |>
  galah_group_by(year, basisOfRecord) |>
  atlas_counts()

records
#> # A tibble: 41 x 3
#>   basisOfRecord     year    count
#>   <chr>             <chr>   <int>
#> 1 HUMAN_OBSERVATION 2020  5825030
#> 2 HUMAN_OBSERVATION 2019  5401216
#> 3 HUMAN_OBSERVATION 2018  5267959
#> 4 HUMAN_OBSERVATION 2017  4348547
#> # ... with 37 more rows
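For comparison, the dplyr pattern mentioned above can be sketched on a local data frame. The table below is toy data invented purely for illustration (it is not ALA data), and the example assumes the dplyr package is installed.

```r
library(dplyr)

# Toy occurrence table, invented purely for illustration (not ALA data)
occurrences <- data.frame(
  year = c(2019, 2019, 2020, 2020, 2020),
  basisOfRecord = c("HUMAN_OBSERVATION", "PRESERVED_SPECIMEN",
                    "HUMAN_OBSERVATION", "HUMAN_OBSERVATION",
                    "PRESERVED_SPECIMEN")
)

# One row per (year, basisOfRecord) combination, with the number of
# rows in each combination stored in a `count` column
local_counts <- occurrences |>
  group_by(year, basisOfRecord) |>
  count(name = "count") |>
  ungroup()

local_counts
```

The difference is that dplyr counts rows that are already held locally, whereas atlas_counts() requests the grouped counts without downloading the underlying records.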