R Access to the Dataverse Network
Many dataverse servers now use Dataverse version 4.0 or greater. A separate package, dataverse, is being developed to work with these newer installations.
The dvn package for R integrates public data sharing into the reproducible research workflow. As such, dvn provides access to both the Data Sharing API and the Data Deposit API for the Dataverse Network online data repository system. The Data Sharing API enables searches of any public dataverse, returning study and file metadata. Limited support is also provided for data download, when terms of use allow direct download of public data.
The Data Deposit API allows users to directly create and modify dataverse listings (studies, metadata, and files), thereby integrating data archiving directly into the R workflow with just a few simple functions. The Data Deposit API is built on the SWORD protocol, though not all features are fully supported by the API.
dvn defaults to providing access to the Harvard Dataverse Network, but this can be changed in each function call or globally using options(dvn = 'https://thedata.harvard.edu/dvn/') for any valid Dataverse Network.
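As a minimal sketch of the global setting, the snippet below uses only base R's options() and getOption() with the URL quoted above. The variable names old and server are illustrative. The per-call form is not shown because the text does not name the relevant argument.

```r
# Set the default Dataverse Network for this session (URL taken from the text above)
# and keep the previous value so it can be restored later.
old <- options(dvn = "https://thedata.harvard.edu/dvn/")

# Read the option back; this is the value a function would see via getOption("dvn").
server <- getOption("dvn")
print(server)

# Restore the previous setting when finished.
options(old)
```

Saving the return value of options() makes it easy to point a single script at a different Dataverse Network without changing the setting for the rest of the session.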
Users who want to download metadata from archives other than Dataverse may be interested in Kurt Hornik's OAIHarvester, which offers metadata download from any web repository that is compliant with the Open Archives Initiative standards. Additionally, rdryad uses OAIHarvester to interface with Dryad. The rfigshare package works in a similar spirit to dvn with http://figshare.com/.
Functions for the Data Sharing API
Metadata Search Functions
- dvSearch, to search public dataverses
- dvSearchFields, to provide the searchable fields for use in dvSearch
- dvMetadata, to retrieve the metadata for a study
- dvMetadataFormats, to retrieve available metadata formats for a study
- dvExtractFileIds, to extract available files from the dvMetadata response
  - Using dvTermsOfUse(dvMetadata(objectId)) displays Terms Of Use as HTML for the study
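The metadata functions above can be chained for a single study. The following is a rough sketch, not the package's documented usage: describe_study is a hypothetical helper, and it assumes that dvMetadata accepts a study identifier as its first argument and that dvExtractFileIds accepts the dvMetadata response, as the list above suggests. Nothing is requested from a server until the helper is called.

```r
# Hypothetical helper (not part of dvn): fetch a study's metadata and list its files.
# Assumes dvMetadata(objectId) and dvExtractFileIds(<dvMetadata response>) as described above.
describe_study <- function(objectId) {
  if (!requireNamespace("dvn", quietly = TRUE)) {
    stop("The dvn package must be installed to query a Dataverse Network.")
  }
  # Retrieve the study metadata from the configured Dataverse Network.
  metadata <- dvn::dvMetadata(objectId)
  # Extract the available files from the metadata response.
  files <- dvn::dvExtractFileIds(metadata)
  list(metadata = metadata, files = files)
}
```

Calling describe_study() with a study identifier returned by dvSearch would then give both the raw metadata and the file listing in one object.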
File Access Functions
- dvDownloadInfo, to retrieve information about a study file
- dvDownload, to download a study file (if allowed)
Functions for the Data Deposit API
The core workflow for the Data Deposit API involves creating a study listing using dvCreateStudy, adding one or more files with dvAddFile, and then making the study public with dvReleaseStudy. Use of all Data Deposit API functions requires a valid username and password for the selected Dataverse Network, which can be set in each function call or globally using options(dvn.user = 'username', dvn.pwd = 'password').
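To keep credentials out of scripts, one option is to read them from environment variables before setting the global options. This is only a sketch using base R: DVN_USER and DVN_PWD are hypothetical variable names chosen for illustration, and the fallback values are the placeholders from the text above.

```r
# DVN_USER and DVN_PWD are hypothetical environment variable names for this sketch;
# the fallbacks are the placeholder values used in the text above.
dvn_user <- Sys.getenv("DVN_USER", unset = "username")
dvn_pwd <- Sys.getenv("DVN_PWD", unset = "password")

# Set the credentials globally, as described above.
options(dvn.user = dvn_user, dvn.pwd = dvn_pwd)

# Confirm that both options are in place without printing the password.
credentials_set <- !is.null(getOption("dvn.user")) && !is.null(getOption("dvn.pwd"))
print(credentials_set)
```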
- dvCreateStudy, to create a study listing using metadata (i.e., cataloging information)
  - Use dvBuildMetadata to create metadata for use in dvCreateStudy or dvEditStudy
  - Use dvEditStudy to overwrite a study's metadata
  - Use dvAddFile, to add file(s) or active R dataframes to a study
- dvReleaseStudy, to publicly release a study
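The following code carries out that workflow:
options(dvn.user = 'username', dvn.pwd = 'password')  # credentials for the Data Deposit API
doc <- dvServiceDoc()  # identify the dataverse(s) accessible to this user
meta <- dvBuildMetadata(title = "My Study", creator = "Doe, John")
study <- dvCreateStudy(doc, meta)  # create the study listing
dvAddFile(study, filename)  # filename is a placeholder for the file to add
dvReleaseStudy(study)  # make the study public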
The following functions allow users to view and modify existing studies:
- dvDeleteFile, to delete a file from a study
- dvDeleteStudy, to deaccession a released study, or delete an unreleased study
- dvUserStudies, to list studies in a named dataverse
- dvStudyStatement, to access a summary of a study (including citation and stable URI)
- dvServiceDoc, to identify the dataverse(s) accessible to a user (This is not particularly relevant to users with access to only one dataverse)
- dvStudyAtom, to view study-level URIs for using the Data Deposit API (This is not important for most users, unless they plan to call dvDepositQuery directly)
Installation
You can find a stable release on CRAN, or install the latest development version from GitHub using Hadley's devtools package:
if (!require("devtools")) {
  install.packages("devtools")
  library("devtools")
}
install_github("ropensci/dvn")
library("dvn")