Before downloading data, it is often valuable to estimate how many records
are available, both to decide whether the query is feasible and to estimate
how long the download will take. Alternatively, for some kinds of reporting,
the count of observations may be all that is required, for example to
understand how observations are growing or shrinking in particular locations
or for particular taxa. To this end, atlas_counts() takes arguments in the
same format as atlas_occurrences(), and provides either a total count of
records matching the criteria, or a data.frame of counts matching the
criteria supplied to the group_by argument.
atlas_counts(
  request = NULL,
  identify = NULL,
  filter = NULL,
  geolocate = NULL,
  group_by = NULL,
  limit = 100,
  type = c("record", "species"),
  refresh_cache = FALSE
)
request: optional data_request object, generated by a call to galah_call().
identify: data.frame, generated by a call to galah_identify().
filter: data.frame, generated by a call to galah_filter().
geolocate: string, generated by a call to galah_geolocate().
group_by: data.frame, an object of class galah_group_by, as returned by galah_group_by(). Alternatively, a vector of field names (see search_fields() and show_all_fields()).
limit: numeric, maximum number of categories to return, defaulting to 100. If limit is NULL, all results are returned. For some categories this will take a while.
type: string, one of c("record", "species"). Defaults to "record". If "species", the number of species matching the criteria is returned; if "record", the number of records matching the criteria is returned.
refresh_cache: logical, if set to TRUE and galah_config(caching = TRUE), files cached from a previous query will be replaced by the current query.
An object of class tbl_df and data.frame (aka a tibble) containing either a
single number, if group_by is not specified, or a summary of counts grouped
by field(s), if group_by is specified.
With no arguments, return the total number of records in the ALA
atlas_counts()
#> # A tibble: 1 x 1
#>       count
#>       <int>
#> 1 102070026
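A count like this can feed a rough feasibility check before any download is attempted. The sketch below is illustrative only: estimate_download() is a hypothetical helper, not part of galah, and both the throughput (records_per_second) and the ceiling (max_records) are assumed values that should be replaced with figures measured for your own connection and use case.

```r
# Hypothetical helper, not part of galah: turns a record count into a rough
# feasibility flag and a download-time estimate.
estimate_download <- function(n_records,
                              records_per_second = 5000, # assumed throughput
                              max_records = 5e6) {       # assumed ceiling
  list(
    feasible = n_records <= max_records,
    minutes  = n_records / records_per_second / 60
  )
}

# Total record count copied from the printed output above.
est <- estimate_download(102070026)
est$feasible
round(est$minutes)
```

In practice the count would come from the count column of the tibble returned by atlas_counts(), using the same filters that will later be passed to atlas_occurrences().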
You can group counts by state and territory with galah_group_by
galah_call() |>
  galah_group_by(stateProvince) |>
  atlas_counts()
#> # A tibble: 100 x 2
#>   stateProvince      count
#>   <chr>              <int>
#> 1 New South Wales 24938082
#> 2 Victoria        21775800
#> 3 Queensland      17550396
#> 4 South Australia  8449336
#> # ... with 96 more rows
You can add a filter to narrow your search
galah_call() |>
  galah_filter(basisOfRecord == "FossilSpecimen") |>
  atlas_counts()
Specify type = "species" to count the number of species, and group species
counts by kingdom
records <- galah_call() |>
  galah_group_by(kingdom) |>
  atlas_counts(type = "species")
records
#> # A tibble: 10 x 2
#>    kingdom   count
#>    <chr>     <dbl>
#>  1 Animalia  90821
#>  2 Plantae   39883
#>  3 Fungi     16752
#>  4 Chromista  1822
#>  5 Protista    635
#>  6 Bacteria    525
#>  7 Protozoa    493
#>  8 Archaea       0
#>  9 Eukaryota     0
#> 10 Virus         0
Using galah_group_by allows you to cross-tabulate using two different
variables, similar to using dplyr::group_by() %>% dplyr::count()
records <- galah_call() |>
  galah_filter(year > 2015) |>
  galah_group_by(year, basisOfRecord) |>
  atlas_counts()
records
#> # A tibble: 41 x 3
#>   basisOfRecord     year    count
#>   <chr>             <chr>   <int>
#> 1 HUMAN_OBSERVATION 2020  5825030
#> 2 HUMAN_OBSERVATION 2019  5401216
#> 3 HUMAN_OBSERVATION 2018  5267959
#> 4 HUMAN_OBSERVATION 2017  4348547
#> # ... with 37 more rows
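The grouped result is in long format, with one row per combination of the two fields. If a wide cross-tabulation is easier to read, one option is base R's xtabs(). The sketch below is a minimal illustration that rebuilds only the four rows printed above as a plain data frame rather than querying the ALA, so the resulting table is incomplete; with a live result, records could be passed to xtabs() directly.

```r
# Stand-in for the atlas_counts() result: only the four rows printed above,
# copied by hand (the full result has 41 rows).
records <- data.frame(
  basisOfRecord = rep("HUMAN_OBSERVATION", 4),
  year = c("2020", "2019", "2018", "2017"),
  count = c(5825030L, 5401216L, 5267959L, 4348547L),
  stringsAsFactors = FALSE
)

# Sum `count` within each basisOfRecord x year cell, giving one row per
# basisOfRecord and one column per year.
wide <- xtabs(count ~ basisOfRecord + year, data = records)
wide
```

Because year is returned as a character column, the columns of the table are ordered alphabetically, which for four-digit years coincides with chronological order.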