arrow (version 8.0.0)

map_batches: Apply a function to a stream of RecordBatches

Description

As an alternative to calling collect() on a Dataset query, you can use this function to access the stream of RecordBatches in the Dataset. This lets you aggregate each chunk and pull the intermediate results into a data.frame for further aggregation, even if the whole Dataset result would not fit in memory.
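The chunk-then-combine workflow described above can be sketched as follows. This is a minimal illustration, not an official example: the temporary dataset, the column names g and x, and the helper partial_sum are all hypothetical. The result is wrapped in as.data.frame() as a precaution, on the assumption that the return type of map_batches() may differ in other arrow releases.

```r
library(arrow)

# Write a small partitioned dataset to a temporary directory so that there
# are several files (and therefore several batches) to stream over.
df <- data.frame(g = rep(c("a", "b", "c"), each = 4), x = 1:12)
path <- tempfile("map_batches_demo")
write_dataset(df, path, partitioning = "g")
ds <- open_dataset(path)

# Hypothetical helper: reduce one RecordBatch to a one-row data.frame
# holding the partial row count and partial sum of column x.
partial_sum <- function(batch) {
  chunk <- as.data.frame(batch)
  data.frame(n = nrow(chunk), total = sum(chunk$x))
}

# Apply the helper to every batch; the per-batch rows are gathered into
# a single data.frame of intermediate results.
partials <- as.data.frame(map_batches(ds, partial_sum))

# Second-stage aggregation over the (small) intermediate results.
result <- c(n = sum(partials$n), total = sum(partials$total))
```

Only one batch is converted to a data.frame at a time, so peak memory depends on the batch size rather than on the size of the full Dataset.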

Usage

map_batches(X, FUN, ..., .data.frame = TRUE)

Arguments

X

A Dataset or arrow_dplyr_query object, as returned by the dplyr methods on Dataset.

FUN

A function or purrr-style lambda expression to apply to each RecordBatch.

...

Additional arguments passed to FUN.

.data.frame

Logical: should the resulting chunks be collected into a single data.frame? Defaults to TRUE.
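The two ways of supplying FUN, and the forwarding of extra arguments through ..., might look like the sketch below. The dataset, the helper batch_total, and its column argument are hypothetical names introduced for illustration. The default value of .data.frame is used, and the results are passed through as.data.frame() in case the return type differs between arrow releases.

```r
library(arrow)

# Hypothetical two-partition dataset written to a temporary directory.
df <- data.frame(g = rep(c("a", "b"), each = 3), x = c(1, 2, 3, 10, 20, 30))
path <- tempfile("map_batches_args")
write_dataset(df, path, partitioning = "g")
ds <- open_dataset(path)

# FUN as a purrr-style lambda: `.x` refers to the current RecordBatch.
counts <- as.data.frame(map_batches(ds, ~ data.frame(n = .x$num_rows)))

# FUN as a named function; `column` is supplied through `...`.
batch_total <- function(batch, column) {
  data.frame(total = sum(as.data.frame(batch)[[column]]))
}
totals <- as.data.frame(map_batches(ds, batch_total, column = "x"))
```

Each result has one row per batch, so a final sum() over counts$n or totals$total gives the dataset-wide figure.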

Details

This function is experimental and not recommended for production use.