The dataset R Package

The dataset package extends the R statistical environment so that the most important R object holding a dataset, i.e. a data.frame or one of its descendants (tibble, tsibble or data.table), carries the metadata needed to reuse and validate its contents. We aim to offer a novel solution for individuals or small groups of data scientists in business, academic or policy research roles who cannot count on the support of librarians, knowledge engineers, or extensive documentation processes.

The dataset package extends the concept of tidy data by adding standardized semantic information to the user’s dataset, which increases the (re-)use value of the data object:

  • More descriptive information about the dataset as a created work: its authors, contributors, reuse rights and other metadata that make it easier to find and use.
  • More standardized and linked metadata, such as standard variable definitions and code lists, which let the data owner gather far more information from third parties and let third parties understand and use the data correctly.
  • More information about the data provenance, which makes quality assessment easier and reduces the need for time-consuming and unnecessary re-processing steps.
  • More structural information about the data, which makes it easier to reuse and join with new information and less prone to logical errors.
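
The idea behind these four points can be illustrated with a minimal base R sketch that keeps descriptive, structural and provenance metadata on the data.frame itself. It deliberately does not call the dataset package: the attribute names (`title`, `creator`, `rights`, `label`, `unit`, `provenance`) are assumptions chosen for illustration, the values are made up, and the package's own functions may store the same information differently.

```r
# Base R only: this sketch does NOT use the dataset package API.
# Attribute names and data values are illustrative assumptions.
df <- data.frame(
  region = c("North", "South"),
  life_expectancy = c(78.1, 80.4)  # made-up values
)

# Descriptive metadata: title, creator and reuse rights.
attr(df, "title") <- "Example life expectancy table"
attr(df, "creator") <- person("Jane", "Doe", role = "aut")
attr(df, "rights") <- "CC-BY-4.0"

# Structural metadata: a label and a unit for one variable.
attr(df$life_expectancy, "label") <- "Life expectancy at birth"
attr(df$life_expectancy, "unit") <- "years"

# Provenance: when and with which software the object was created.
attr(df, "provenance") <- list(
  created_at = Sys.time(),
  software = R.version.string
)

# The metadata travels with the object through saveRDS()/readRDS().
tmp <- tempfile(fileext = ".rds")
saveRDS(df, tmp)
restored <- readRDS(tmp)
print(attr(restored, "title"))
print(attr(restored$life_expectancy, "unit"))
```

Plain attributes like these are unstandardized, which is the gap the package targets: its accessor functions listed further down (for example `creator`, `rights`, `provenance`, `var_labels`) are described as getting and setting such properties in a consistent way.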

Install

install.packages('dataset')

Monthly Downloads

145

Version

0.3.1

License

GPL (>= 3)

Last Published

January 27th, 2024

Functions in dataset (0.3.1)

  • describe: Describe a dataset object
  • related_item: Create a related item
  • rights: Get/set the Rights of the object
  • initialise_dsd: Initialise a DataStructure (internal)
  • iris_dataset: Edgar Anderson's Iris Data
  • xsd_convert: Convert to XML Schema Definition (XSD) types
  • size: Get/estimate/add the Size metadata to an object
  • version: Get/set the version of the object
  • statwales: Life Expectancy in Regions of Wales by Sex
  • subject: Create/add/retrieve a subject
  • publication_year: Get/set the publication_year of the object
  • head.dataset: Return the first or last parts of a dataset object
  • publisher: Get/set the Publisher of the object
  • related_item_identifier: Create a related item identifier
  • provenance: Get or update provenance information
  • var_labels: Get/set the variable labels in a dataset
  • language: Get/set the primary language of the dataset
  • subsetting: Subsetting datasets
  • DataStructure: Data structure
  • datacite: Add or get DataCite metadata
  • dataset_bibentry: Get the bibliographic entries of a dataset
  • creator: Get/set the Creator of the object
  • dataset_download: Download data into a dataset
  • dublincore: Add or get Dublin Core metadata
  • get_prefix: Get prefix/resource identifier from CURIE
  • as_dataset: Create a dataset
  • geolocation: Get/set the Geolocation of the object
  • dataset_namespace: Popular Namespace
  • dataset_prov: Create a dataset provenance
  • dataset_title: Get/set title of a dataset
  • dataset_ttl_write: Write a dataset into Turtle serialisation
  • dataset_to_triples: Dataset to triples (three columns)
  • datasource_get: Get/set the source property of a dataset
  • id_to_column: Add identifier to columns
  • identifier: Get/set the Identifier of the object
  • description: Get/set the Description of the object