High-performance I/O tools for R

Anyone who works with large data knows that the stock tools in R are inefficient at loading non-binary (text) data. This package started as an attempt to provide high-performance parsing tools that minimize copying and avoid the use of strings where possible (see mstrsplit, for example).
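
As a rough illustration of the parsing functions mentioned above, the following sketch splits a small raw vector with mstrsplit and dstrsplit. The sample records, the "|" separator, and the column types are made up for this example; it assumes the iotools package is installed.

```r
library(iotools)

# Hypothetical input: three pipe-separated records held as raw bytes,
# the form in which data typically arrives from a file or connection.
input <- charToRaw("1|2.5|a\n2|3.5|b\n3|4.5|c\n")

# Split into a character matrix: one row per line, one column per field.
m <- mstrsplit(input, sep = "|")

# Split into a data frame, converting each column to the requested type.
d <- dstrsplit(input,
               col_types = c("integer", "numeric", "character"),
               sep = "|")

print(dim(m))
str(d)
```

Both calls take the raw vector directly, so the input does not have to be converted to R character strings before it is split.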

To allow processing of arbitrarily large files, we added a way to process input chunk-wise, which makes it possible to compute on streaming input as well as on very large files (see chunk.reader and chunk.apply).
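
A minimal sketch of chunk-wise processing with chunk.apply might look like the following. The temporary file, its two-column layout, and the per-chunk sum are assumptions made for illustration. With a file this small the default chunk size will most likely yield a single chunk, but the same code applies unchanged to files that span many chunks.

```r
library(iotools)

# Hypothetical data: a headerless, comma-separated file with two integer columns.
path <- tempfile(fileext = ".csv")
writeLines(sprintf("%d,%d", 1:1000, rep(1:2, 500)), path)

# For each chunk (a raw vector of complete lines), parse it into a data frame
# and return the sum of the first column. CH.MERGE = c concatenates the
# per-chunk results into one vector.
chunk_sums <- chunk.apply(path, function(chunk) {
  d <- dstrsplit(chunk, col_types = c("integer", "integer"), sep = ",")
  sum(d[[1]])
}, CH.MERGE = c)

# Combine the partial results into the overall total.
total <- sum(chunk_sums)
print(total)

unlink(path)
```

The function passed to chunk.apply only ever sees one chunk at a time, so memory use is bounded by the chunk size (adjustable through the CH.MAX.SIZE argument) rather than by the size of the file.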

The next natural step was to add support for Hadoop streaming. The main goal was to make it possible to compute with Hadoop MapReduce by writing code that feels natural, much like using lapply on data chunks, without needing to know anything about Hadoop. See the wiki page for the idea and the hmr function for the documentation.

Install

install.packages('iotools')

Monthly Downloads

1,493

Version

0.3-5

License

GPL-2 | GPL-3

Maintainer

Simon Urbanek

Last Published

December 2nd, 2023

Functions in iotools (0.3-5)

dstrfw: Split fixed width input into a data frame
readAsRaw: Read binary data in as raw
.default.formatter: Default formatter, corresponding to the as.output functions
read.csv.raw: Fast data frame input
imstrsplit: Create an iterator for splitting binary or character input into a matrix
input.file: Load a file from disk
idstrsplit: Create an iterator for splitting binary or character input into a data frame
line.merge: Merge multiple sources
fdrbind: Fast row-binding of lists and data frames
chunk.map: Map a function over a file by chunks
chunk.apply: Process input by applying a function to each chunk
as.output: Character output
mstrsplit: Split binary or character input into a matrix
chunk: Functions for very fast chunk-wise processing
output.file: Write an R object to a file as a character string
dstrsplit: Split binary or character input into a data frame
which.min.key: Determine the next key in bytewise order
write.csv.raw: Fast data output to disk
ctapply: Fast tapply() replacement functions