enigma (version 0.3.0)

enigma_stats: Get statistics on columns of a dataset from Enigma.

Description

Get statistics on columns of a dataset from Enigma.

Usage

enigma_stats(dataset = NULL, select, conjunction = NULL, operation = NULL,
  by = NULL, of = NULL, limit = 500, search = NULL, where = NULL,
  sort = NULL, page = NULL, key = NULL, ...)

Arguments

dataset
Dataset name. Required.
select
(character) Column to get statistics on. Required.
conjunction
(character) One of "and" or "or". Only applicable when more than one search or where parameter is provided. Default: "and".
operation
(character) Operation to run on a given column. For a numerical column, valid operations are sum, avg, stddev, variance, max, min and frequency. For a date column, valid operations are max, min and frequency. For all other columns, the only valid operation is frequency. Defaults to all available operations based on the column's type.
by
(character) Compound operation to run on a given pair of columns. Valid compound operations are sum and avg. When running a compound operation query, the of parameter is required (see below).
of
(character) Numerical column to compare against when running a compound operation. Required when using the by parameter. Must be a numerical column.
limit
(numeric) Limit the number of frequency, compound sum, or compound average results returned. Max: 500; Default: 500.
search
(character) Filter results by returning only rows that match a search query. By default this searches the entire table for matching text. To search particular fields only, use the query format "@fieldname query". To match multiple queries, use the | (or) operator, e.g. "query1|query2".
where
(character) Filter results with a SQL-style "where" clause. Only applies to numerical columns - use the search parameter for strings. Valid operators are >, < and =. Only one where clause per request is currently supported.
sort
(character) Sort frequency, compound sum, or compound average results in a given direction. + denotes ascending order, - denotes descending order.
page
(numeric) Paginate frequency, compound sum, or compound average results and return the nth page of results. Pages are calculated based on the current limit, which defaults to 500.
key
(character) Required. An Enigma API key. Supply it in the function call, store it in your .Renviron file as ENIGMA_KEY=<your key>, or set it in your .Rprofile file as options(enigmaKey = "<your key>"). To obtain an API key, create an account with Enigma at http://enigma.io, then copy the key from your account page.
...
Named curl options passed on to HttpClient
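
The search and sort arguments take plain strings, so they can be assembled before the call. The following is a minimal sketch under stated assumptions: the helpers field_query() and any_of() are hypothetical and not part of the enigma package, the search terms are placeholders, and passing "-" as the sort value is an assumption based on the argument description above. The request itself is left commented out because it needs an API key and network access.

```r
# Hypothetical helper (not part of enigma): restrict a search to one field,
# using the "@fieldname query" format described for the search argument.
field_query <- function(field, query) {
  paste0("@", field, " ", query)
}

# Hypothetical helper (not part of enigma): match any of several queries
# by joining them with the | (or) operator.
any_of <- function(queries) {
  paste(queries, collapse = "|")
}

search_string <- any_of(c("query1", "query2"))
print(search_string)

# Not run: requires an API key and network access. The dataset and column
# names are taken from the Examples section; sort = "-" is an assumption.
# enigma_stats(dataset = "gov.au.government-spending.federal-contracts",
#   select = "contractstart", by = "avg", of = "value",
#   search = search_string, sort = "-", limit = 10)
```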

Value

A list with items:
  • success - a boolean indicating whether the query was successful
  • datapath - the dataset path (this is not a file path on your machine)
  • info - a list with:
    • column - a list of information on the variable you requested stats on
    • operations - a list of the operations you requested
    • rows_limit - rows limit
    • total_results - total items found (likely more than was returned)
    • total_pages - total pages found (see also current_page)
    • current_page - page returned (see also total_pages)
    • calls_remaining - number of requests remaining
    • seconds_remaining - seconds remaining before your rate limit resets
  • result - a named list of objects - depends on the data source returned
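
The paging fields in info can be used to work out which pages are still to be fetched. The sketch below uses a mock info list with made-up numbers rather than a real response, and it assumes that the page count equals total_results divided by the row limit, rounded up; a real response already reports total_pages, which should be preferred when available.

```r
# Mock of the info element; the numbers are invented for illustration only
info <- list(rows_limit = 500, total_results = 1234, current_page = 1)

# Assumption: pages are total_results / rows_limit, rounded up
pages_needed <- ceiling(info$total_results / info$rows_limit)

# Pages not yet retrieved
remaining <- seq_len(pages_needed)[-info$current_page]
print(remaining)

# Not run: requires an API key and network access
# more <- lapply(remaining, function(p) {
#   enigma_stats(dataset = "gov.mx.imss.compras.main",
#     select = "provider_id", limit = info$rows_limit, page = p)
# })
```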

References

https://app.enigma.io/api#stats

Examples

## Not run: ------------------------------------
# # After obtaining an API key from Enigma's website, pass your key to
# # the function call via the key parameter, or set it in your options
# # (see the instructions for the key parameter above).
# 
# # stats on a varchar column
# x <- 'gov.mx.imss.compras.main'
# enigma_stats(x, select='provider_id', limit = 10)
# 
# # stats on a numeric column
# enigma_stats(x, select='serialid', limit = 10)
# 
# # stats on a date column
# pakistan <- 'gov.pk.secp.business-registry.all-entities'
# enigma_metadata(dataset=pakistan)
# enigma_stats(dataset=pakistan, select='registration_date', limit = 10)
# 
# # stats on a date column, by the average of a numeric column
# aust <- 'gov.au.government-spending.federal-contracts'
# enigma_metadata(dataset=aust)
# enigma_stats(dataset=aust, select='contractstart', by='avg', of='value', 
#   limit = 10)
# 
# # Get frequency of distances traveled
# ## get columns for the air carrier dataset
# dset <- 'us.gov.dot.rita.trans-stats.air-carrier-statistics.t100d-market-all-carrier'
# enigma_metadata(dset)$columns$table[,c(1:4)]
# enigma_stats(dset, select='distance', limit = 10)
## ---------------------------------------------
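
The key setup mentioned in the comments above can be sketched for a single R session as follows. The key value is a placeholder, and find_key() is a hypothetical helper, not a function from the enigma package; the order in which the package itself looks up the option and the environment variable is not stated on this page, so the order used here is an assumption.

```r
# Placeholder value; substitute the key from your Enigma account page.
# Sys.setenv() only affects the current session; use .Renviron to persist it.
Sys.setenv(ENIGMA_KEY = "your-key-here")

# Alternative: set the option instead (normally done in .Rprofile)
options(enigmaKey = "your-key-here")

# Hypothetical lookup (not part of enigma): prefer the option, fall back
# to the environment variable, and fail early if neither is set.
find_key <- function() {
  key <- getOption("enigmaKey", default = Sys.getenv("ENIGMA_KEY"))
  if (!nzchar(key)) stop("No Enigma API key found")
  key
}

# Not run: requires a valid key and network access
# enigma_stats("gov.mx.imss.compras.main", select = "provider_id",
#   limit = 10, key = find_key())
```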
