h5 (version 0.9.9)

h5-package: H5 - Interface to the HDF5 API

Description

h5 provides an interface to the HDF5 API through S4 classes. HDF5 is a binary data format designed for flexible and efficient I/O and for high-volume and complex data. An HDF5 file can be structured hierarchically, storing data sets in groups, much like the folder structure of a file system. It supports fast storage and retrieval of R objects such as vectors, matrices and arrays in binary files in a language-independent format (data.frames are currently not supported). The package can therefore be used as an alternative to R's save/load mechanism. Because h5 can access subsets of stored data, it can also handle data sets that do not fit into memory.

Details

h5 can currently only handle homogeneous data sets consisting of one single data type like numeric, integer, character or logical. The creation of metadata through attributes is also supported.
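
As an illustration of the homogeneity constraint, the sketch below stores a small data.frame by writing each column as its own dataset and reassembling the columns in R afterwards. This is only one possible workaround, not something the package documentation prescribes; the data, the group name 'df' and the temporary file path are assumptions made up for this example.

```r
library(h5)

# Hypothetical example data: columns of different types
df <- data.frame(id = 1:3, score = c(0.5, 1.5, 2.5))

# Temporary file so the sketch does not touch the working directory
path <- tempfile(fileext = ".h5")
file <- h5file(path, "a")

# Write each column as a separate homogeneous dataset under group '/df'
for (col in names(df)) {
  file[paste0("df/", col)] <- df[[col]]
}

# Read the columns back one by one and rebuild the data.frame in R
restored <- list()
for (col in names(df)) {
  dset <- file[paste0("df/", col)]
  restored[[col]] <- as.vector(dset[])
  h5close(dset)
}
restored <- as.data.frame(restored)

h5close(file)
file.remove(path)
```

Column names and order are taken from the in-memory data.frame here; to make the file self-describing they would have to be stored separately, for example as an attribute.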

The following objects are supported by h5 and represented through S4 classes:

H5File

holds the pointer to the binary HDF5 file which can include various DataSets in a hierarchical structure defined by H5Groups.

H5Group

can hold various HDF5 objects like DataSets and other H5Groups.

DataSet

stores homogeneous data like vectors, matrices and arrays.

Attribute

stores metadata about other HDF5 objects like H5Group, H5File and DataSet.

DataSpace

defines selections on specified DataSets.

These classes share common functionality through the following base classes:

CommonFG

implements common functionality for H5File and H5Group to create/access sub-H5Groups and DataSets.

H5Location

is the base class of H5File, H5Group and DataSet and implements functions for Attribute creation and retrieval.

The example below shows some typical use cases handling data with HDF5:

  1. Create/Open HDF5 File using H5File, specifying file access mode.

  2. Create/Open Groups and DataSets either implicitly using subsetting operators or explicitly using S4 methods like createGroup/openGroup or createDataSet/openDataSet, see also CommonFG, CommonFG-Group and CommonFG-DataSet.

  3. Create/Open meta data for HDF5 objects using e.g. h5attr, see also H5Location-Attribute and Attribute.

  4. Retrieve data from DataSets either implicitly using subsetting operators or explicitly with readDataSet which requires a DataSpace object to specify the selection area, see also DataSet, DataSet-Subset and DataSpace.

  5. Extend a DataSet using predefined functions like c for 1-dimensional vectors or rbind/cbind for 2-dimensional DataSets, see also DataSet-Extend.

  6. Close H5File, H5Group, DataSet or DataSpace objects using h5close.
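
The explicit S4-method route mentioned in step 2 can be sketched as follows, as a counterpart to the subsetting-operator style. This is a minimal sketch, not a definitive reference: the group name 'results', the dataset name 'values', the attribute 'unit' and the temporary file path are invented for illustration, and the argument order of createGroup and createDataSet (object, name, data) is assumed here rather than taken from this page.

```r
library(h5)

# Temporary file so the sketch does not touch the working directory
path <- tempfile(fileext = ".h5")
file <- h5file(path, "a")

# Step 2, explicit form: create a group, then a dataset inside it
# (assumed signatures: createGroup(object, name),
#  createDataSet(object, name, data))
grp <- createGroup(file, "results")
dset <- createDataSet(grp, "values", matrix(1:6, nrow = 3))

# Step 3: attach metadata to the dataset and read it back
h5attr(dset, "unit") <- "counts"
unit <- h5attr(dset, "unit")

# Step 4: read only the first two rows; the rest stays on disk
first_rows <- dset[1:2, ]

# Step 6: close handles in reverse order of creation
h5close(dset)
h5close(grp)
h5close(file)
file.remove(path)
```

The explicit form keeps a handle to every intermediate object, which makes it clear which handles must be passed to h5close at the end.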

Examples

# NOT RUN {
# Load the package
library(h5)

# 1. Create/Open file 'test.h5' (mode set to 'a'ppend)
file <- h5file("test.h5", 'a')

# 2. Store character vector in group '/test' and dataset 'testvec'
file["test/testvec"] <- LETTERS[1:9]
# Store integer matrix in group '/test/testmat' and dataset 'testmat'
mat <- matrix(1:9, nrow = 3)
rownames(mat) <- LETTERS[1:3]
colnames(mat) <- c("A", "BE", "BUU")
file["test/testmat/testmat"] <- mat
# Store numeric array in group '/test' and dataset 'testarray'
file["test/testarray"] <- array(as.numeric(1:45), dim = c(3, 3, 5))

# 3. Store rownames and column names of matrix as attributes
# Get created data set as object
dset <- file["test/testmat/testmat"]
# Store rownames in attribute 'dimnames_1'
h5attr(dset, "dimnames_1") <- rownames(mat)
# Store columnnames in attribute 'dimnames_2'
h5attr(dset, "dimnames_2") <- colnames(mat)

# 4. Read first 3 elements of testvec
testvec <- file["test/testvec"]
testvec[1:3]
# Read first 2 rows of testmat
testmat <- file["test/testmat/testmat"]
res <- testmat[1:2, ]
# Attach row and column names from the attributes stored in step 3
rownames(res) <- h5attr(testmat, "dimnames_1")[1:2]
colnames(res) <- h5attr(testmat, "dimnames_2")

# 5. Extend testvec 
testvec <- c(testvec, LETTERS[10:26])
# Retrieve entire testvec
testvec[]

# 6. Close open handles
h5close(testvec)
h5close(testmat)
h5close(file)
# }
