
HDF5Array (version 1.0.2)

HDF5Array-class: HDF5 datasets as array-like objects

Description

We provide 2 classes for representing an (on-disk) HDF5 dataset as an array-like object in R:
  • HDF5Array: A high-level class that extends DelayedArray. All the operations available on DelayedArray objects work on HDF5Array objects.

  • HDF5Dataset: A low-level class for pointing to an HDF5 dataset. No operation can be performed directly on an HDF5Dataset object; it must first be wrapped in a DelayedArray or HDF5Array object. An HDF5Array object is just an HDF5Dataset object wrapped in a DelayedArray object.

Usage

## Constructor functions
HDF5Array(file, name, type=NA)
HDF5Dataset(file, name, type=NA)

Arguments

file
The path (as a single character string) to the HDF5 file where the dataset is located.

file can also be a DelayedArray object or an ordinary array, in which case, the object is written to disk as a new HDF5 dataset. If file is a DelayedArray object, all the delayed operations carried by the object are executed before the result is written to disk. This is the standard way to realize a DelayedArray object on disk. See ?DelayedArray for more information.

name
The name of the dataset in the HDF5 file.

type
NA or the R atomic type (specified as a single string) corresponding to the type of the HDF5 dataset.

Value

An HDF5Array object for HDF5Array(). An HDF5Dataset object for HDF5Dataset().

Details

HDF5Array and HDF5Dataset can be used either to point to an existing HDF5 dataset or to create a new one (see description of the file argument above).

When used to create a new HDF5 dataset, the location where the dataset is written can be controlled with the setHDF5DumpFile and setHDF5DumpName utility functions.
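
The first use described above, pointing to an existing HDF5 dataset, can be illustrated with a small self-contained sketch. It assumes the rhdf5 and HDF5Array packages are installed; the temporary file path and the dataset name "m" are hypothetical and chosen only for this illustration. The dataset is written with rhdf5 (h5createFile and h5write) rather than with the constructor itself, so the sketch does not depend on how this version creates new datasets.

```r
library(rhdf5)
library(HDF5Array)

## Hypothetical scratch file and dataset name, used only for this sketch.
path <- tempfile(fileext=".h5")
m <- matrix(1:12, nrow=3)

## Write the matrix to disk as an HDF5 dataset with rhdf5.
h5createFile(path)
h5write(m, path, "m")
h5ls(path)

## Point to the on-disk dataset. The constructor only needs the path to
## the file and the name of the dataset in it.
A <- HDF5Array(path, "m")
dim(A)

## Bring the values back in memory as an ordinary matrix.
m2 <- as.matrix(A)
```

Because A is array-like, dim(), subsetting, and the other DelayedArray operations apply to it in the same way as shown for cov0 in the Examples section.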

See Also

  • DelayedArray objects.

  • DelayedArray-utils for common operations on DelayedArray objects.

  • setHDF5DumpFile to control the location of the new HDF5 datasets created by HDF5Array and HDF5Dataset.

  • h5ls in the rhdf5 package.

  • The rhdf5 package on top of which HDF5Array objects are implemented.

  • array objects in base R.

Examples

## ---------------------------------------------------------------------
## CONSTRUCTION
## ---------------------------------------------------------------------
library(HDF5Array)
library(rhdf5)
library(h5vcData)

tally_file <- system.file("extdata", "example.tally.hfs5",
                          package="h5vcData")
h5ls(tally_file)

## Pick up "Coverages" dataset for Human chromosome 16:
cov0 <- HDF5Array(tally_file, "/ExampleStudy/16/Coverages")
cov0

## ---------------------------------------------------------------------
## dim/dimnames
## ---------------------------------------------------------------------
dim(cov0)

dimnames(cov0)
dimnames(cov0) <- list(paste0("s", 1:6), c("+", "-"))
dimnames(cov0)

## ---------------------------------------------------------------------
## SLICING (A.K.A. SUBSETTING)
## ---------------------------------------------------------------------
cov1 <- drop(cov0[ , , 29000001:29000007])
cov1

dim(cov1)
as.array(cov1)
stopifnot(identical(dim(as.array(cov1)), dim(cov1)))
stopifnot(identical(dimnames(as.array(cov1)), dimnames(cov1)))

cov2 <- drop(cov0[ , "+", 29000001:29000007])
cov2
as.matrix(cov2)

## ---------------------------------------------------------------------
## DelayedMatrix OBJECTS AS ASSAYS OF A SummarizedExperiment OBJECT
## ---------------------------------------------------------------------
library(SummarizedExperiment)

pcov <- drop(cov0[ , 1, ])  # coverage on plus strand
mcov <- drop(cov0[ , 2, ])  # coverage on minus strand

nrow(pcov)  # number of samples
ncol(pcov)  # length of Human chromosome 16

## The convention for a SummarizedExperiment object is to have 1 column
## per sample so first we need to transpose 'pcov' and 'mcov':
pcov <- t(pcov)
mcov <- t(mcov)
se <- SummarizedExperiment(list(pcov=pcov, mcov=mcov))
se
stopifnot(validObject(se, complete=TRUE))

## A GPos object can be used to represent the genomic positions along
## the dataset:
gpos <- GPos(GRanges("16", IRanges(1, nrow(se))))
gpos
rowRanges(se) <- gpos
se
stopifnot(validObject(se))
