datapack: A Flexible Container to Transport and Manipulate Data and Associated Resources

The datapack R package provides an abstraction for collating heterogeneous collections of data objects and metadata into a bundle that can be transported and loaded as a single composite file. The methods in this package provide a convenient way to load data from common repositories such as DataONE into the R environment, and to document, serialize, and save data from R to data repositories worldwide.

Note that this package ('datapack') is not related to the similarly named rOpenSci package 'DataPackageR'. Documentation from the DataPackageR GitHub repository states that "DataPackageR is used to reproducibly process raw data into packaged, analysis-ready data sets."

Installation Notes

The datapack R package requires the R package redland. If you are installing on Ubuntu, the Redland C libraries must be installed before the redland and datapack packages can be installed. On Mac OS X or Windows, installing these libraries is not required.

The following instructions illustrate how to install datapack and its requirements.

Installing on Mac OS X

On Mac OS X datapack can be installed with the following commands:

install.packages("datapack")
library(datapack)

The datapack R package should be available for use at this point.

Note: if you wish to build the required redland package from source before installing datapack, please see the redland installation instructions.

Installing on Ubuntu

For Ubuntu, install the required Redland C libraries by entering the following commands in a terminal window:

sudo apt-get update
sudo apt-get install librdf0 librdf0-dev

Then install the R packages from the R console:

install.packages("datapack")
library(datapack)

The datapack R package should be available for use at this point.

Installing on Windows

For Windows, the required redland R package is distributed as a binary release, so it is not necessary to install any additional system libraries.

To install the R packages from the R console:

install.packages("datapack")
library(datapack)
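
On any of the platforms above, it can be useful to confirm that both packages are actually loadable before moving on. The following is a minimal sketch using only base R; the variable names are illustrative and not part of datapack.

```r
# Check which of the required packages can be loaded in this R session.
# requireNamespace() returns TRUE or FALSE instead of raising an error.
required <- c("redland", "datapack")
available <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
print(available)

# Report anything that still needs to be installed.
missingPkgs <- required[!available]
if (length(missingPkgs) > 0) {
  message("Not yet installed: ", paste(missingPkgs, collapse = ", "))
} else {
  message("datapack version: ", as.character(packageVersion("datapack")))
}
```

If redland is reported as missing on Ubuntu, the likely cause is that the Redland C libraries were not installed first.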

Quick Start

See the full manual for documentation, but once installed, the package can be run in R using:

library(datapack)
help("datapack")

Create a DataPackage and add metadata and data DataObjects to it:

library(datapack)
library(uuid)
dp <- new("DataPackage")
mdFile <- system.file("extdata/sample-eml.xml", package="datapack")
mdId <- paste("urn:uuid:", UUIDgenerate(), sep="")
md <- new("DataObject", id=mdId, format="eml://ecoinformatics.org/eml-2.1.0", filename=mdFile)
dp <- addData(dp, md)

csvfile <- system.file("extdata/sample-data.csv", package="datapack")
sciId <- paste("urn:uuid:", UUIDgenerate(), sep="")
sciObj <- new("DataObject", id=sciId, format="text/csv", filename=csvfile)
dp <- addData(dp, sciObj)
ids <- getIdentifiers(dp)
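
After adding members, the package contents can be checked with the accessor functions listed in the function index (getSize, containsId, getMember, getFormatId). The sketch below is self-contained and uses a throwaway CSV file and a made-up identifier, both of which are assumptions for illustration only.

```r
library(datapack)

# Write a small CSV file to a temporary location (illustrative data only).
csvPath <- tempfile(fileext = ".csv")
write.csv(data.frame(site = c("A", "B"), count = c(3, 5)), csvPath, row.names = FALSE)

# Wrap the file in a DataObject and add it to a new DataPackage.
objId <- "urn:uuid:example-data-object"   # hypothetical identifier
obj <- new("DataObject", id = objId, format = "text/csv", filename = csvPath)
dp <- new("DataPackage")
dp <- addMember(dp, obj)                  # keep the returned, updated package

# Inspect the package contents.
memberCount <- getSize(dp)                # number of members in the package
hasObj <- containsId(dp, objId)           # is the identifier a member?
member <- getMember(dp, objId)            # retrieve the DataObject by identifier
memberFormat <- getFormatId(member)       # format id recorded for the object
print(list(count = memberCount, present = hasObj, format = memberFormat))
```

Note that the result of addMember is assigned back to dp; without the assignment the added member would not be kept in the package.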

Add a relationship to the DataPackage that shows that the metadata describes, or "documents", the science data:

dp <- insertRelationship(dp, subjectID=mdId, objectIDs=sciId)
relations <- getRelationships(dp)
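
When no predicate is given, insertRelationship records the default "documents" relationship. Other relationships can be recorded by passing an explicit predicate URI. The following sketch assumes two hypothetical identifiers and uses the W3C PROV wasDerivedFrom predicate as an example.

```r
library(datapack)

dp <- new("DataPackage")
rawId <- "urn:uuid:example-raw-data"          # hypothetical identifiers
derivedId <- "urn:uuid:example-derived-data"

# Record that the derived object was derived from the raw object,
# using an explicit predicate rather than the default one.
provPredicate <- "http://www.w3.org/ns/prov#wasDerivedFrom"
dp <- insertRelationship(dp, subjectID = derivedId, objectIDs = rawId,
                         predicate = provPredicate)

# getRelationships returns a data.frame with one row per relationship.
relations <- getRelationships(dp)
print(relations[, c("subject", "predicate", "object")])
```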

Create a Resource Description Framework (RDF) representation of the relationships in the package:

serializationId <- paste("resourceMap", UUIDgenerate(), sep="")
filePath <- file.path(sprintf("%s/%s.rdf", tempdir(), serializationId))
status <- serializePackage(dp, filePath, id=serializationId, resolveURI="")
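
The serialized resource map is an ordinary text file, so it can be read back to check what was written. The sketch below rebuilds a small package from the sample files shipped with datapack and prints the first lines of the RDF/XML output; the identifiers and the resource map id are made up for illustration.

```r
library(datapack)

# Build a small package from the sample files shipped with datapack.
mdFile <- system.file("extdata/sample-eml.xml", package = "datapack")
csvFile <- system.file("extdata/sample-data.csv", package = "datapack")
md <- new("DataObject", id = "urn:uuid:example-metadata",       # hypothetical id
          format = "eml://ecoinformatics.org/eml-2.1.0", filename = mdFile)
sci <- new("DataObject", id = "urn:uuid:example-data",          # hypothetical id
           format = "text/csv", filename = csvFile)
dp <- new("DataPackage")
dp <- addMember(dp, md)
dp <- addMember(dp, sci)
dp <- insertRelationship(dp, subjectID = getIdentifier(md),
                         objectIDs = getIdentifier(sci))

# Serialize the relationships, then read the file back as plain text.
rdfPath <- tempfile(fileext = ".rdf")
status <- serializePackage(dp, rdfPath, id = "resourceMap-example", resolveURI = "")
rdfLines <- readLines(rdfPath, warn = FALSE)
cat(head(rdfLines, 10), sep = "\n")
```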

Save the DataPackage to a file, using the BagIt packaging format:

bagitFile <- serializeToBagIt(dp) 
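
serializeToBagIt returns the path of the archive it wrote, which is a zip file in a temporary location. A minimal sketch of listing its contents and copying it somewhere else follows; the identifiers and the destination file name are assumptions for illustration.

```r
library(datapack)

# Build a small package from the sample files shipped with datapack.
mdFile <- system.file("extdata/sample-eml.xml", package = "datapack")
csvFile <- system.file("extdata/sample-data.csv", package = "datapack")
md <- new("DataObject", id = "urn:uuid:example-metadata",       # hypothetical id
          format = "eml://ecoinformatics.org/eml-2.1.0", filename = mdFile)
sci <- new("DataObject", id = "urn:uuid:example-data",          # hypothetical id
           format = "text/csv", filename = csvFile)
dp <- new("DataPackage")
dp <- addMember(dp, md)
dp <- addMember(dp, sci)
dp <- insertRelationship(dp, subjectID = getIdentifier(md),
                         objectIDs = getIdentifier(sci))

# Write the BagIt archive and list the files it contains.
bagitFile <- serializeToBagIt(dp)
bagContents <- unzip(bagitFile, list = TRUE)
print(bagContents$Name)

# Copy the archive out of the temporary location (destination is illustrative).
savedPath <- file.path(tempdir(), "example-package.zip")
copied <- file.copy(bagitFile, savedPath, overwrite = TRUE)
```

In practice the destination would be a directory that outlives the R session rather than tempdir().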

Note that the dataone R package can be used to upload a DataPackage to a DataONE Member Node using the uploadDataPackage method. Please see the documentation for the dataone R package, for example:

vignette("upload-data", package="dataone")

Acknowledgements

Work on this package was supported by:

  • NSF-ABI grant #1262458 to C. Gries, M. B. Jones, and S. Collins.
  • NSF-DATANET grants #0830944 and #1430508 to W. Michener, M. B. Jones, D. Vieglais, S. Allard, and P. Cruse.
  • NSF DIBBS grant #1443062 to T. Habermann and M. B. Jones.
  • NSF-PLR grant #1546024 to M. B. Jones, S. Baker-Yeboah, J. Dozier, M. Schildhauer, and A. Budden.
  • NSF-PLR grant #2042102 to M. B. Jones, A. Budden, J. Dozier, and M. Schildhauer.

Additional support was provided for working group collaboration by the National Center for Ecological Analysis and Synthesis, a Center funded by the University of California, Santa Barbara, and the State of California.

Install: install.packages('datapack')
Monthly Downloads: 412
Version: 1.4.1
License: Apache License (== 2.0)
Last Published: June 10th, 2022

Functions in datapack (1.4.1)

  • addData: Add a DataObject to the DataPackage
  • addMember: Add a DataObject to the DataPackage
  • initialize,DataObject-method: Initialize a DataObject
  • DataObject-class: DataObject wraps raw data with system-level metadata
  • getFormatId: Get the FormatId of the DataObject
  • getData: Get the data content of a specified data object
  • removeAccessRule: Remove an access rule from the specified object.
  • recordDerivation: Record derivation relationships between objects in a DataPackage
  • selectMember: Return identifiers for objects that match search criteria
  • replaceMember: Replace the raw data or file associated with a DataObject
  • describeWorkflow: Add data derivation information to a DataPackage
  • insertRelationship: Record relationships of objects in a DataPackage
  • parseRDF: Parse an RDF/XML resource map from a file.
  • setPublicAccess: Add a Rule to the AccessPolicy to make the object publicly readable.
  • datapack: datapack, a container for packages of data and associated metadata
  • setValue: Set values for selected DataPackage members.
  • initialize,ResourceMap-method: Initialize a ResourceMap object.
  • SystemMetadata: Create DataONE SystemMetadata object
  • addAccessRule: Add access rules to the specified object.
  • ResourceMap-class: ResourceMap provides methods to create, serialize and deserialize an OAI ORE resource map.
  • calculateChecksum: Calculate a checksum for the DataObject using the specified checksum algorithm
  • clearAccessPolicy: Clear the accessPolicy from the specified object.
  • canRead: Test whether the provided subject can read an object.
  • containsId: Returns true if the specified object is a member of the package
  • getSize: Get the Count of Objects in the Package
  • getIdentifiers: Get the Identifiers of Package Members
  • getIdentifier: Get the Identifier of the DataObject
  • getTriples: Get the RDF relationships stored in the ResourceMap.
  • parseSystemMetadata: Parse an external XML document and populate a SystemMetadata object with the parsed data
  • plotRelationships: Plot derivation relationships obtained from getRelationships
  • updateMetadata: Update selected elements of the XML content of a DataObject in a DataPackage (aka package member).
  • getValue: Get values for selected DataPackage members.
  • getRelationships: Retrieve relationships of package objects
  • getMember: Return the Package Member by Identifier
  • removeMember: Remove the Specified Member from the Package
  • updateRelationships: Update package relationships by replacing an old identifier with a new one.
  • hasAccessRule: Determine if an access rule exists
  • removeRelationships: Remove relationships of objects in a DataPackage
  • serializeRDF: Serialize a ResourceMap.
  • serializePackage: Create an OAI-ORE resource map from the package
  • serializeSystemMetadata: Serialize a SystemMetadata object to an XML representation
  • serializeToBagIt: Serialize A DataPackage into a BagIt Archive File
  • updateXML: Update selected elements of the XML content of a DataObject
  • validate: Validate a SystemMetadata object.
  • initialize,DataPackage-method: Initialize a DataPackage object.
  • createFromTriples: Populate a ResourceMap with RDF relationships from data.frame.
  • DataPackage-class: A class representing a data package
  • datapack-deprecated: Deprecated Methods
  • freeResourceMap: Free memory used by a ResourceMap.
  • dmsg: Print a debugging message to stderr.
  • SystemMetadata-class: A DataONE SystemMetadata object containing basic identification, ownership, access policy, replication policy, and related metadata.
  • initialize,SystemMetadata-method: Initialize a DataONE SystemMetadata object with default values or values passed in to the constructor.