EDIutils

A client for the Environmental Data Initiative repository REST API. The EDI data repository is for publication and reuse of ecological data with emphasis on metadata accuracy and completeness. It was developed in collaboration with the US LTER Network and is built upon the PASTA+ software stack. EDIutils includes functions to search and access existing data, evaluate and upload new data, and assist with related data management tasks.

Installation

Install the latest release from CRAN:

install.packages("EDIutils")

Install the development version from GitHub:

remotes::install_github("ropensci/EDIutils", ref = "development")

Getting Started

library(EDIutils)

The unit of publication is the data package. It contains one or more data entities (i.e. files) described with EML metadata, a metadata quality report, and a manifest of package contents. Data packages are immutable for reproducible research, yet versionable to allow updates and improved data quality through time. Each version is assigned a DOI and a unique package ID of the form “scope.identifier.revision”. The “scope” is the organizational unit, “identifier” the series, and “revision” the version (e.g. “edi.100.2” is version “2” of data package “edi.100”).
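
The package ID format described above can be split into its three parts with base R. This is a minimal sketch; `parse_package_id()` is a hypothetical helper written for illustration and is not part of EDIutils.

```r
# Split a package ID of the form "scope.identifier.revision" into its parts.
# parse_package_id() is a hypothetical helper, not an EDIutils function.
parse_package_id <- function(package_id) {
  parts <- strsplit(package_id, ".", fixed = TRUE)[[1]]
  # Expect exactly three parts, with integer identifier and revision
  if (length(parts) != 3 || !all(grepl("^[0-9]+$", parts[2:3]))) {
    stop("Expected 'scope.identifier.revision', got: ", package_id)
  }
  list(
    scope = parts[1],
    identifier = as.integer(parts[2]),
    revision = as.integer(parts[3])
  )
}

id <- parse_package_id("edi.100.2")
id$scope       # "edi"
id$identifier  # 100
id$revision    # 2
```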

Authentication

Authentication is required by the data evaluation and upload functions, and to access user audit logs and services. To request an account, contact EDI at support@edirepository.org. Authenticate with the login() function.
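
For scripted workflows it can help to keep credentials out of the source code. The sketch below reads them from environment variables. `get_credentials()` and the variable names `EDI_USER` and `EDI_PASS` are assumptions made for this example, not EDIutils conventions, and the commented call assumes that login() accepts `userId` and `userPass` arguments (check `?login` before relying on it).

```r
# get_credentials() is a hypothetical helper; EDI_USER and EDI_PASS are
# assumed variable names, not something EDIutils defines.
get_credentials <- function() {
  user <- Sys.getenv("EDI_USER", unset = "")
  pass <- Sys.getenv("EDI_PASS", unset = "")
  if (!nzchar(user) || !nzchar(pass)) {
    stop("Set EDI_USER and EDI_PASS before authenticating.")
  }
  list(user = user, pass = pass)
}

# Possible usage (argument names are an assumption; see ?login):
# creds <- get_credentials()
# login(userId = creds$user, userPass = creds$pass)
```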

Search and Access Data

The repository search service is a standard deployment of Apache Solr that indexes selected fields of data package metadata. For a list of searchable fields, see the documentation of search_data_packages(). For a browser-based search experience, use the EDI data portal.

# List data packages containing the term "water temperature"
res <- search_data_packages(query = 'q="water+temperature"&fl=*')
colnames(res)
#>  [1] "abstract"              "begindate"             "doi"                  
#>  [4] "enddate"               "funding"               "geographicdescription"
#>  [7] "id"                    "methods"               "packageid"            
#> [10] "pubdate"               "responsibleParties"    "scope"                
#> [13] "site"                  "taxonomic"             "title"                
#> [16] "authors"               "spatialCoverage"       "sources"              
#> [19] "keywords"              "organizations"         "singledates"          
#> [22] "timescales"

nrow(res)
#> [1] 798
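
The query string used above is a set of Solr `key=value` pairs joined with `&`. A small helper can assemble it from named arguments, which makes it easier to request only a few fields. This is a sketch; `build_query()` is a hypothetical helper, and the field names come from the column list shown above.

```r
# Assemble a Solr query string from named, single-valued parameters.
# build_query() is a hypothetical helper, not an EDIutils function.
build_query <- function(...) {
  params <- list(...)
  stopifnot(length(params) > 0, !is.null(names(params)), all(names(params) != ""))
  # vapply() enforces exactly one value per parameter
  values <- vapply(params, as.character, character(1))
  paste(names(params), values, sep = "=", collapse = "&")
}

query <- build_query(q = '"water+temperature"', fl = "packageid,title,doi")
query
#> [1] "q=\"water+temperature\"&fl=packageid,title,doi"

# Possible usage (requires network access):
# res <- search_data_packages(query = query)
```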

Data entities are downloaded as raw bytes and then parsed with a reader function.

# List data entities of data package edi.1047.1
res <- read_data_entity_names(packageId = "edi.1047.1")
res
#>                           entityId                entityName
#> 1 3abac5f99ecc1585879178a355176f6d        Environmentals.csv
#> 2 f6bfa89b48ced8292840e53567cbf0c8               ByCatch.csv
#> 3 c75642ddccb4301327b4b1a86bdee906               Chinook.csv
#> 4 2c9ee86cc3f3ffc729c5f18bfe0a2a1d             Steelhead.csv
#> 5 785690848dd20f4910637250cdc96819 TrapEfficiencyRelease.csv
#> 6 58b9000439a5671ea7fe13212e889ba5 TrapEfficiencySummary.csv
#> 7 86e61c1a501b7dcf0040d10e009bfd87        TrapOperations.csv

# Read raw bytes of Steelhead.csv (i.e. the 4th data entity)
raw <- read_data_entity(packageId = "edi.1047.1", entityId = res$entityId[4])
head(raw)
#> [1] ef bb bf 44 61 74

# Parse with a .csv reader
data <- readr::read_csv(file = raw)
data
#> # A tibble: 2,926 x 14
#>    Date   trapVisitID subSiteName catchRawID releaseID commonName 
#>    <chr>        <dbl> <chr>            <dbl>     <dbl> <chr>      
#>  1 1/12/~         326 North Chan~      32123         0 Steelhead ~
#>  2 1/14/~         336 North Chan~      33980         0 Steelhead ~
#>  3 1/15/~         337 North Chan~      32683         0 Steelhead ~
#>  4 1/16/~         339 North Chan~      32971         0 Steelhead ~
#>  5 1/17/~         341 North Chan~      33104         0 Steelhead ~
#>  6 1/18/~         342 North Chan~      33304         0 Steelhead ~
#>  7 1/19/~         343 North Chan~      33432         0 Steelhead ~
#>  8 1/21/~         349 North Chan~      34083         0 Steelhead ~
#>  9 1/21/~         349 North Chan~      34084         0 Steelhead ~
#> 10 1/23/~         351 North Chan~      34384         0 Steelhead ~
#> # ... with 2,916 more rows, and 8 more variables:
#> #   lifeStage <chr>, forkLength <dbl>, weight <dbl>, n <dbl>,
#> #   mort <chr>, fishOrigin <chr>, markType <chr>,
#> #   CatchRaw.comments <chr>
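
The first three bytes printed by head(raw) above (ef bb bf) are a UTF-8 byte order mark. If readr is not available, the same raw vector can be parsed with base R. The sketch below uses a small made-up CSV body in place of a real download, so the values are illustrative only.

```r
# Bytes standing in for a downloaded entity: a UTF-8 byte order mark
# (ef bb bf) followed by a small, made-up CSV body.
bom <- as.raw(c(0xef, 0xbb, 0xbf))
raw_bytes <- c(bom, charToRaw("Date,n\n1/12/2020,3\n1/14/2020,5\n"))

# Drop the byte order mark if present
if (length(raw_bytes) >= 3 && identical(raw_bytes[1:3], bom)) {
  raw_bytes <- raw_bytes[-(1:3)]
}

# Convert the remaining bytes to text and parse with a base R reader
data <- read.csv(text = rawToChar(raw_bytes), stringsAsFactors = FALSE)
str(data)
```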

Evaluate and Upload Data

The EDI data repository has a “staging” environment to test the upload and rendering of new data packages before publishing to “production”. Authentication is required by functions involving data evaluation and upload. Request an account from support@edirepository.org.

# Authenticate
login()
#> User name: "my_name"
#> User password: "my_secret"

Data package reservations prevent conflicting use of the same identifier.

# Reserve a data package identifier
identifier <- create_reservation(scope = "edi", env = "staging")
identifier
#> [1] 595
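
The reserved identifier becomes the middle part of the package ID, which in turn names the EML file used in the following steps. A minimal sketch, assuming the first upload under a new identifier is revision 1 (as in the examples here) and that the EML file is named after the package ID:

```r
# Values from the reservation step; 595 is the identifier shown above
scope <- "edi"
identifier <- 595
revision <- 1  # assumption: the first upload under a new identifier is revision 1

# Compose "scope.identifier.revision"
package_id <- sprintf("%s.%d.%d", scope, as.integer(identifier), as.integer(revision))
package_id

# Assumed naming convention: the EML file is named after the package ID
eml_path <- file.path(tempdir(), paste0(package_id, ".xml"))
basename(eml_path)
```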

Evaluation checks for metadata accuracy and completeness.


# Evaluate data package
transaction <- evaluate_data_package(
  eml = file.path(tempdir(), "edi.595.1.xml"),
  env = "staging"
)
transaction
#> [1] "evaluate_163966785813042760"

# Check status
status <- check_status_evaluate(transaction, env = "staging")
status
#> [1] TRUE

# Read the evaluation report
report <- read_evaluate_report(transaction, as = "char", env = "staging")
message(report)
#> ===================================================
#>   EVALUATION REPORT
#> ===================================================
#>   
#> PackageId: edi.595.1
#> Report Date/Time: 2021-12-16T08:17:40
#> Total Quality Checks: 29
#> Valid: 21
#> Info: 8
#> Warn: 0
#> Error: 0
#> 
#> ---------------------------------------------------
#>   DATASET REPORT
#> ---------------------------------------------------
#>   
#> IDENTIFIER: packageIdPattern
#> NAME: packageId pattern matches "scope.identifier.revision"
#> DESCRIPTION: Check against LTER requirements for scope.identifier.revision
#> EXPECTED: 'scope.n.m', where 'n' and 'm' are integers and 'scope' is one ...
#> FOUND: edi.595.1
#> STATUS: valid
#> EXPLANATION: 
#> SUGGESTION: 
#> REFERENCE: 
#> 
#> IDENTIFIER: emlVersion
#> NAME: EML version 2.1.0 or beyond
#> DESCRIPTION: Check the EML document declaration for version 2.1.0 or higher
#> EXPECTED: eml://ecoinformatics.org/eml-2.1.0 or higher
#> FOUND: https://eml.ecoinformatics.org/eml-2.2.0
#> STATUS: valid
#> EXPLANATION: Validity of this quality report is dependent on this check ...
#> SUGGESTION: 
#> REFERENCE: 
#> ...
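
When the report is read as text, the outcome of each check can be tabulated with base R. The sketch below works on a shortened excerpt of the report shown above. It assumes every check has exactly one IDENTIFIER line and one STATUS line, and that non-passing checks are labelled "warn" or "error" in lowercase, like the "valid" status shown. The package also lists read_evaluate_report_summary() for summarizing the report.

```r
# Shortened excerpt of the evaluation report shown above
report <- paste(
  "IDENTIFIER: packageIdPattern",
  "FOUND: edi.595.1",
  "STATUS: valid",
  "",
  "IDENTIFIER: emlVersion",
  "FOUND: https://eml.ecoinformatics.org/eml-2.2.0",
  "STATUS: valid",
  sep = "\n"
)

lines <- strsplit(report, "\n", fixed = TRUE)[[1]]

# Return the values of all "KEY: value" lines for a given key
field <- function(lines, key) {
  prefix <- paste0("^", key, ": ")
  sub(prefix, "", grep(prefix, lines, value = TRUE))
}

checks <- data.frame(
  identifier = field(lines, "IDENTIFIER"),
  status = field(lines, "STATUS"),
  stringsAsFactors = FALSE
)

# Count checks per status and list any that need attention
table(checks$status)
checks[checks$status %in% c("warn", "error"), ]
```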

Upload the data package after fixing any errors and warnings found in the evaluation report.

# Create a new data package
transaction <- create_data_package(
  eml = file.path(tempdir(), "edi.595.1.xml"),
  env = "staging"
)
transaction
#> [1] "create_163966765080210573__edi.595.1"

# Check status
status <- check_status_create(
  transaction = transaction,
  env = "staging"
)
status
#> [1] TRUE
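
Evaluation and upload run as server-side transactions, so a script usually needs to wait for completion before moving on. The sketch below shows a generic polling loop with a timeout. `poll_until()` is a hypothetical helper, and `fake_check()` stands in for a real status call so the example runs offline. The check_status_*() functions may already wait internally; consult their help pages before adding a loop like this.

```r
# poll_until() is a hypothetical helper, not an EDIutils function.
# It calls check() until it returns TRUE or the timeout (seconds) elapses.
poll_until <- function(check, interval = 2, timeout = 60) {
  deadline <- Sys.time() + timeout
  repeat {
    if (isTRUE(check())) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}

# Stand-in for a status check that succeeds on the third call
calls <- 0
fake_check <- function() {
  calls <<- calls + 1
  calls >= 3
}

done <- poll_until(fake_check, interval = 0.1, timeout = 5)
done

# Possible usage with a real transaction (requires network access and login):
# poll_until(function() check_status_create(transaction, env = "staging"))
```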

Once everything looks good in the “staging” environment, repeat the reservation and upload steps above in the “production” environment, where the data package will be assigned a DOI and made discoverable alongside other published data.

Getting help

Use GitHub Issues for bug reporting, feature requests, and general questions/discussions. When filing bug reports, please include a minimal reproducible example.

Contributing

Community contributions are welcome! Please see our contributing guidelines for details.

Please note that this package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Monthly Downloads

224

Version

1.0.3

License

MIT + file LICENSE

Maintainer

Colin Smith

Last Published

October 10th, 2023

Functions in EDIutils (1.0.3)

is_authorized: Is authorized to read
delete_journal_citation: Delete journal citation
get_docid_reads: Get doc ID reads
get_audit_report: Get audit report
get_recent_uploads: Get recent uploads
get_provenance_metadata: Get provenance metadata
get_packageid_reads: Get package ID reads
get_event_subscription_schema: Get event subscription schema
delete_reservation: Delete reservation
get_journal_citation: Get journal citation
get_event_subscription: Get event subscription
list_active_reservations: List active reservations
list_data_package_citations: List data package citations
list_data_package_identifiers: List data package identifiers
list_service_methods: List service methods
list_principal_owner_citations: List principal owner citations
query_event_subscriptions: Query event subscriptions
list_recent_changes: List recent changes
logout: Log out of the EDI repository
list_user_data_packages: List user data packages
list_data_package_revisions: List data package revisions
read_data_entity: Read data entity
list_working_on: List working on
list_data_package_scopes: List data package scopes
read_data_entity_checksum: Read data entity checksum
read_data_package_report_checksum: Read data package report checksum
read_data_package_doi: Read data package Digital Object Identifier
list_data_descendants: List data descendants
list_data_entities: List data entities
execute_event_subscription: Execute event subscription
list_deleted_data_packages: List deleted data packages
list_data_sources: List data sources
read_data_entity_resource_metadata: Read data entity resource metadata
read_data_entity_size: Read data entity size
read_data_package_error: Read data package error
login: Log in to the EDI repository
read_data_package_archive: Read data package archive
read_data_package_citation: Read data package citation
read_data_package_report_summary: Summarize the data package quality report
read_data_package_from_doi: Read data package from Digital Object Identifier
read_metadata_dublin_core: Read metadata Dublin Core
read_data_package_report: Read data package report
read_metadata_entity: Read data entity metadata
list_recent_uploads: List recent uploads
list_reservation_identifiers: List reservation identifiers
read_metadata: Read metadata
read_data_package_resource_metadata: Read data package resource metadata
search_data_packages: Search data packages
read_data_entity_name: Read data entity name
read_data_entity_names: Read data entity names
read_data_entity_sizes: Read data entity sizes
read_data_package: Read data package
read_evaluate_report_summary: Summarize the evaluate quality report
read_data_package_report_resource_metadata: Read data package report resource metadata
read_evaluate_report: Read evaluate report
update_data_package: Update data package
read_metadata_format: Read metadata format
read_metadata_resource_metadata: Read metadata resource metadata
read_metadata_checksum: Read metadata checksum
check_status_create: Check data package creation status
create_data_package_archive: Create data package archive (zip)
create_dn: Create a user's distinguished name
create_event_subscription: Create event subscription
check_status_evaluate: Check status of data package evaluation
create_data_package: Create data package
create_reservation: Create reservation
check_status_update: Check data package update status
delete_event_subscription: Delete event subscription
create_journal_citation: Create journal citation
evaluate_data_package: Evaluate data package
get_audit_count: Get audit count
get_audit_record: Get audit record