dataone: R interface to the DataONE network of data repositories

Provides read and write access to data and metadata from the DataONE network of data repositories, including the KNB Data Repository, Dryad, and the NSF Arctic Data Center. Each DataONE repository implements a consistent repository application programming interface. Users call methods in R to access these remote repository functions, such as methods to query the metadata catalog, get access to metadata for particular data packages, and read the data objects from the data repository using the global identifier for each data object. Users can also insert and update data objects on repositories that support these methods. For more details, see the vignettes.

Installation Notes

Version 2.0 of the dataone R package removes the dependency on rJava and significantly changes the base API to correspond to the published DataONE API. Previous methods for accessing DataONE will be maintained, but new methods have been added.

The dataone R package requires the R package redland. If you are installing on Ubuntu then the Redland C libraries must be installed first. If you are installing on Mac OS X or Windows then installing these libraries is not required.
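
Before installing, it can help to check which of the required R packages are already present. The following is a minimal sketch using only base R; the helper name missing_packages is hypothetical and is not part of dataone.

```r
# Hypothetical helper (not part of dataone): report which packages from a
# character vector cannot be loaded from the current library paths.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  unname(pkgs[!installed])
}

# redland is the dependency named above; dataone is the package itself.
needed <- missing_packages(c("redland", "dataone"))
if (length(needed) > 0) {
  message("Not yet installed: ", paste(needed, collapse = ", "))
} else {
  message("redland and dataone are both installed.")
}
```

If redland is reported as missing on Ubuntu, the system libraries described in the Ubuntu section below are the likely prerequisite.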

Installing on Mac OS X

On Mac OS X dataone can be installed with the following commands:

install.packages("dataone")
library(dataone)

The dataone R package should be available for use at this point.

Installing on Ubuntu

On Ubuntu, install the required Redland C libraries by entering the following commands in a terminal window:

sudo apt-get update
sudo apt-get install librdf0 librdf0-dev

Then install the R packages from the R console:

install.packages("dataone")
library(dataone)

The dataone R package should be available for use at this point.

Installing on Windows

On Windows, the required redland R package is distributed as a binary release, so it is not necessary to install any additional system libraries.

Install the dataone R package from the R console:

install.packages("dataone")
library(dataone)

The dataone R package should be available for use at this point.

Quick Start

See the full manual (help(dataone)) for documentation.

To search the Knowledge Network for Biocomplexity (KNB), a DataONE Federation Member Node, for a dataset:

library(dataone)
cn <- CNode("PROD")
mn <- getMNode(cn, "urn:node:KNB")
mySearchTerms <- list(q="abstract:salmon+AND+keywords:spawn+AND+keywords:chinook",
                      fl="id,title,dateUploaded,abstract,size",
                      fq="dateUploaded:[2017-06-01T00:00:00.000Z TO 2017-07-01T00:00:00.000Z]",
                      sort="dateUploaded+desc")
result <- query(mn, solrQuery=mySearchTerms, as="data.frame")
result[1,c("id", "title")]
id <- result[1,'id']
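
The search terms above are an ordinary named list, so they can be assembled programmatically. The sketch below builds the same list for an arbitrary upload-date window using only base R. The helper name build_search_terms is hypothetical and not part of dataone, and it assumes that both dates refer to midnight UTC.

```r
# Hypothetical helper (not part of dataone): build the Solr parameter list
# used above for a given upload-date window.
build_search_terms <- function(q, from, to,
                               fl = "id,title,dateUploaded,abstract,size") {
  # Format a date as an ISO 8601 timestamp; midnight UTC is assumed.
  stamp <- function(d) paste0(format(as.Date(d), "%Y-%m-%d"), "T00:00:00.000Z")
  list(q = q,
       fl = fl,
       fq = sprintf("dateUploaded:[%s TO %s]", stamp(from), stamp(to)),
       sort = "dateUploaded+desc")
}

terms <- build_search_terms("abstract:salmon+AND+keywords:spawn+AND+keywords:chinook",
                            from = "2017-06-01", to = "2017-07-01")
print(terms$fq)

# With a Member Node object `mn` created as above, the list is passed on unchanged:
# result <- query(mn, solrQuery = terms, as = "data.frame")
```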

The metadata file that describes the located research can be downloaded, written to disk, and opened in an XML viewer or text editor, or it can be viewed from R with the commands below:

library(XML)
metadata <- rawToChar(getObject(mn, id))
doc <- xmlRoot(xmlTreeParse(metadata, asText=TRUE, trim = TRUE, ignoreBlanks = TRUE))
tf <- tempfile()
saveXML(doc, tf)
file.show(tf)

The metadata file describes a CSV data file in this data collection (package). The data file can be retrieved by the identifier listed in the metadata, using the commands:

dataRaw <- getObject(mn, "urn:uuid:49d7a4bc-e4c9-4609-b9a7-9033faf575e0")
dataChar <- rawToChar(dataRaw)
theData <- textConnection(dataChar)
df <- read.csv(theData, stringsAsFactors=FALSE)
df[1,]
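
The conversion from raw bytes to a data.frame can be tried without network access. The sketch below simulates the raw vector that getObject returns and parses it with the text argument of read.csv, which replaces the explicit textConnection. The helper name raw_to_df and the sample CSV content are made up for illustration, and the bytes are assumed to be plain text without embedded NUL characters.

```r
# Simulate the raw vector that getObject() returns, so the sketch runs offline.
# The CSV content here is invented for illustration.
csvBytes <- charToRaw("site,count\nA,12\nB,7\n")

# Hypothetical helper (not part of dataone): convert raw CSV bytes to a data.frame.
raw_to_df <- function(bytes) {
  read.csv(text = rawToChar(bytes), stringsAsFactors = FALSE)
}

parsed <- raw_to_df(csvBytes)
str(parsed)
```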

Uploading a CSV file to a DataONE Member Node requires user authentication. DataONE user authentication is described in the vignette dataone-federation.
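
A quick check that an authentication token has been set can avoid a failed upload. The sketch below assumes that tokens are supplied through the R options dataone_token (production) and dataone_test_token (all other environments), which is how the vignette is understood to describe it; treat the option names as an assumption and confirm them there. The helper names token_option and has_token are hypothetical.

```r
# Assumption: dataone reads tokens from these two R options.
# Hypothetical helper: name of the option expected for an environment.
token_option <- function(env) {
  if (identical(env, "PROD")) "dataone_token" else "dataone_test_token"
}

# Hypothetical helper: TRUE if a non-empty token string is set for `env`.
has_token <- function(env) {
  tok <- getOption(token_option(env))
  is.character(tok) && length(tok) == 1 && nzchar(tok)
}

if (!has_token("STAGING")) {
  message("No token found in option '", token_option("STAGING"),
          "'; see the dataone-federation vignette.")
}
```

This only tests that an option is present; it does not verify that the token is valid or unexpired.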

Once the authentication steps have been followed, uploading is done with:

library(datapack)
library(uuid)
d1c <- D1Client("STAGING", "urn:node:mnStageUCSB2")
id <- paste("urn:uuid:", UUIDgenerate(), sep="")
testdf <- data.frame(x=1:10,y=11:20)
csvfile <- paste(tempfile(), ".csv", sep="")
write.csv(testdf, csvfile, row.names=FALSE)
# Build a DataObject containing the csv, and upload it to the Member Node
d1Object <- new("DataObject", id, format="text/csv", filename=csvfile)
uploadDataObject(d1c, d1Object, public=TRUE)

In addition, a collection of science metadata and data can be downloaded with one command, for example:

d1c <- D1Client("PROD", "urn:node:KNB")
pkg <- getDataPackage(d1c, id="urn:uuid:04cd34fd-25d4-447f-ab6e-73a572c5d383", quiet=FALSE)

See the vignette "dataone R Package" for more information.

Acknowledgments

Work on this package was supported by:

  • NSF-ABI grant #1262458 to C. Gries, M. B. Jones, and S. Collins.
  • NSF-DATANET grants #0830944 and #1430508 to W. Michener, M. B. Jones, D. Vieglais, S. Allard, and P. Cruse.
  • NSF DIBBS grant #1443062 to T. Habermann and M. B. Jones.
  • NSF-PLR grant #1546024 to M. B. Jones, S. Baker-Yeboah, J. Dozier, M. Schildhauer, and A. Budden.
  • NSF-PLR grant #2042102 to M. B. Jones, A. Budden, J. Dozier, and M. Schildhauer.

Additional support was provided for working group collaboration by the National Center for Ecological Analysis and Synthesis, a Center funded by the University of California, Santa Barbara, and the State of California.

Monthly Downloads

379

Version

2.2.2

License

Apache License 2.0

Last Published

June 10th, 2022

Functions in dataone (2.2.2)

CNode

Create a CNode object.
CertificateManager-class

CertificateManager provides mechanisms to obtain, load, verify, and display X509 certificates.
AuthenticationManager-class

Manage DataONE authentication.
CNode-class

Provides R API to DataONE Coordinating Node services.
CertificateManager

Create a CertificateManager object
AbstractTableDescriber-class

Base Class for Specific Metadata Parsers
initialize,D1Client-method

Initialize a D1Client object
AuthenticationManager

Create an AuthenticationManager object
D1Client-class

The D1Client class contains methods that perform high level DataONE tasks
D1Node

Create a D1Node object.
D1Client

The DataONE client class used to download, update and search for data in the DataONE network.
EMLParser-class

Handler for Parsing Table Format Details from Metadata
EMLParser

Construct an EML parser object.
d1_errors

This function parses a DataONE service response message for errors, and extracts and prints error information.
asDataFrame

Return the D1Object data as a data.frame.
auth_delete

DELETE a resource with authenticated credentials.
D1Object

Create a D1Object instance.
data.characterEncoding

CharacterEncoding
initialize,D1Object-method

Initialize a D1Object
dataone-deprecated

Deprecated
MNode-class

Provides R API to DataONE Member Node services.
dataone

Search, download and upload data to the DataONE network.
MNode

Create a MNode object representing a DataONE Member Node repository.
auth_post

POST a resource with authenticated credentials.
createDataPackage

Create a DataPackage on a DataONE Member Node
createObject

Create an object on a Member Node.
auth_put

PUT a resource with authenticated credentials.
data.tableSkipLinesHeader

Number of lines to skip before reading data
dataone-defunct

Defunct
echoCredentials

Echo the credentials used to make the call.
encodeSolr

Encode the input for Solr Queries
getPackage

Download a data package from a member node.
downloadObject

Download an object from the DataONE Federation to Disk.
generateIdentifier

Get a unique identifier that is generated by the Member Node repository and guaranteed to be unique.
evaluateAuth

Evaluate DataONE authentication.
downloadCert

Open the CILogon Certificate download page in the default browser.
auth_put_post_delete

POST, PUT, or DELETE a resource with authenticated credentials.
canRead,D1Object-method

Test whether the provided subject can read an object.
getCapabilities

Get the node capabilities description, and store the information in the MNode.
D1Node-class

A base class for CNode and MNode.
data.tableQuoteCharacter

Quote Character
data.tableMissingValueCodes

returns missing value codes
documented.entityNames

Get the entity names associated with each table
getCert

Get the DataONE X.509 Certificate location.
initialize,D1Node-method

Initialize a D1Node
auth_get

GET a resource with authenticated credentials if available.
documented.sizes

Get the sizes of the described data tables.
getFormat

Get information for a single DataONE object format
D1Object-class

D1Object (Defunct) is a representation of a DataObject.
listNodes

Get the list of nodes associated with a CN
setObsoletedBy

Set a pid as being obsoleted by another pid
listMemberNodes

List DataONE Member Nodes.
setMNodeId

Set the member node identifier to be associated with the D1Client object.
getQueryEngineDescription

Query a node for the list of query engines available on the node
getMNode

Get a reference to a node based on its identifier
getFormatId,D1Object-method

Get the FormatId of the D1Object
uploadDataPackage

Upload a DataPackage to a DataONE member node.
getMNodeId

Get the member node identifier associated with this D1Client object.
data.formatFamily

Data Format
data.tableAttributeNames

returns the attribute names
getToken

Get the value of the DataONE Authentication Token, if one exists.
getErrorDescription

Extract an error message from an http response.
getSystemMetadata

Get the metadata describing system properties associated with an object on this Node.
getEndpoint

Return the URL endpoint for the DataONE Coordinating Node.
isCertExpired

Determine if an X.509 certificate has expired.
auth_head

Send an HTTP HEAD request for a resource with authenticated credentials if available.
d1IdentifierSearch

Query the DataONE Solr endpoint of the Coordinating Node.
listFormats

List all object formats registered in DataONE.
d1SolrQuery

A method to query the DataONE solr endpoint of the Coordinating Node.
obscureCert

Obscure the CILogon Client Certificate
getTokenInfo

Get authentication token information
obscureAuth

Temporarily disable DataONE authentication.
setPublicAccess,D1Object-method

Make the object publicly readable.
ping

Test if a node is online and accepting DataONE requests
get_user_agent

User agent string
query

Search DataONE for data and metadata objects
updateSystemMetadata

Update the system metadata associated with an object.
uploadDataObject

Upload a DataObject to a DataONE member node.
archive

Archive an object on a Member Node or Coordinating Node, which hides it from casual searches.
addData,DataPackage,D1Object-method

Add a D1Object containing a data object to a DataPackage
getCertExpires

Show the date and time when an X.509 certificate expires.
getCertInfo

Get X.509 Certificate information
data.tableFieldDelimiter

Field Delimiter
data.tableAttributeTypes

returns the attributes' data types
getDataObject

Download a file (and its associated system metadata) from the DataONE Federation as a DataObject.
convert.csv

Convert a DataFrame to Standard CSV.
describeObject

Efficiently get system metadata for an object.
getCertLocation

Get the file path on disk of the client certificate file.
documented.d1Identifiers

Get DataONE identifiers
getAuthExpires

Get the expiration date of the current authentication method.
getAuthMethod

Get the current valid authentication mechanism.
showAuth

Display all authentication information
getMN

Get a member node client based on its node identifier.
getChecksum

Get the checksum for the data object associated with the specified pid.
getIdentifier,D1Object-method

Get the Identifier of the D1Object
createD1Object

Create the Object in the DataONE System
data.tableAttributeStorageTypes

returns the attributes' data storage types
encodeUrlQuery

Encode the Input for a URL Query Segment.
encodeUrlPath

Encode the Input for a URL Path Segment.
getAuthSubject

Get the authentication subject.
data.tableAttributeOrientation

The Attribute (Header) Orientation
getCN

Get the coordinating node associated with this D1Client object.
getD1Object

Download a data object from the DataONE Federation.
updateObject

Update an object on a Member Node, by creating a new object that replaces an original.
resolve

Get a list of coordinating nodes holding a given pid.
hasReservation

Checks to determine if the supplied subject is the owner of the reservation of id.
reserveIdentifier

Reserve an identifier that is unique in the DataONE network.
showClientSubject

Get DataONE Identity as Stored in the CILogon Certificate.
isAuthExpired

Check if the currently valid authentication method has reached the expiration time.
getDataPackage

Download data from the DataONE Federation as a DataPackage.
isAuthValid

Verify authentication for a member node.
listQueryEngines

Query a node for the list of query engines available on the node
listObjects

Retrieve the list of objects that match the search parameters
isAuthorized

Check if an action is authorized for the specified identifier
getMetadataMember

Get the DataObject containing package metadata
getData,D1Object-method

Get the data content of a D1Object.
restoreCert

Restore the CILogon client certificate by renaming it to its original location
restoreAuth

Restore authentication (after being disabled with obscureAuth).
getObject

Get the bytes associated with an object on this Node.
parseCapabilities

Construct a Node, using a passed in capabilities XML
parseSolrResult

Parse Solr output into an R list