A record batch is a collection of equal-length arrays matching a particular Schema. It is a table-like data structure that is semantically a sequence of fields, each a contiguous Arrow Array.
record_batch(..., schema = NULL)
...: A data.frame or a named set of Arrays or vectors. If given a
mixture of data.frames and vectors, the inputs will be autospliced together
(see examples). Alternatively, you can provide a single Arrow IPC
InputStream, Message, Buffer, or R raw object containing a Buffer.
schema: a Schema, or NULL (the default) to infer the schema from
the data in .... When providing an Arrow IPC buffer, schema is required.
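The two argument styles can be sketched as follows. This is a minimal illustration, assuming the arrow package is installed; the column names and values are made up for the example.

```r
library(arrow)

# Autosplicing: a named vector and a data.frame are combined into one batch.
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)
batch <- record_batch(id = c(10L, 20L, 30L), df)
names(batch)  # "id" "x" "y"

# With schema = NULL (the default), column types are inferred from the R data.
batch$schema

# An explicit schema sets the Arrow type of each column instead.
target <- schema(x = float64(), y = utf8())
explicit <- record_batch(df, schema = target)
explicit$schema
```

Here the integer column x is stored as float64 because the supplied schema asks for it, whereas the inferred schema would keep it as a 32-bit integer.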
Record batches are data-frame-like, and many methods you expect to work on
a data.frame are implemented for RecordBatch. This includes [, [[,
$, names, dim, nrow, ncol, head, and tail. You can also pull
the data from an Arrow record batch into R with as.data.frame(). See the
examples.
A caveat about the $ method: because RecordBatch is an R6 object,
$ is also used to access the object's methods (see below). Methods take
precedence over the table's columns. So, batch$Slice would return the
"Slice" method function even if there were a column in the table called
"Slice".
In addition to the more R-friendly S3 methods, a RecordBatch object has
the following R6 methods that map onto the underlying C++ methods:
$Equals(other): Returns TRUE if the other record batch is equal
$column(i): Extract an Array by integer position from the batch
$column_name(i): Get a column's name by integer position
$names(): Get all column names (called by names(batch))
$GetColumnByName(name): Extract an Array by string name
$RemoveColumn(i): Drops a column from the batch by integer position
$SelectColumns(indices): Return a new record batch with a selection of columns, expressed as 0-based integers.
$Slice(offset, length = NULL): Create a zero-copy view starting at the
indicated integer offset and going for the given length, or to the end
of the table if NULL, the default.
$Take(i): return a RecordBatch with rows at positions given by
integers (R vector or Arrow Array) i.
$Filter(i, keep_na = TRUE): return a RecordBatch with rows at positions where logical
vector (or Arrow boolean Array) i is TRUE.
$serialize(): Returns a raw vector suitable for interprocess communication
$cast(target_schema, safe = TRUE, options = cast_options(safe)): Alter
the schema of the record batch.
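A short sketch of several of the R6 methods listed above, assuming the arrow package is installed. The data are invented for illustration, and positions passed to these methods are treated as 0-based, following the underlying C++ API. The last lines also illustrate the caveat about $: a column that shares a name with a method is shadowed by that method.

```r
library(arrow)

batch <- record_batch(x = 1:5, y = c("a", "b", "c", "d", "e"))

# Column access; positions passed to R6 methods are 0-based.
first_col <- batch$column(0L)
second_name <- batch$column_name(1L)
by_name <- batch$GetColumnByName("y")

# Row selection.
sliced <- batch$Slice(1, 2)                                 # 2 rows starting at offset 1
taken  <- batch$Take(c(0L, 4L))                             # first and last rows
kept   <- batch$Filter(c(TRUE, FALSE, TRUE, FALSE, FALSE))  # rows where i is TRUE

# Column selection and removal both return new batches.
only_x    <- batch$SelectColumns(0L)
without_x <- batch$RemoveColumn(0L)

# Cast to a schema with the same column names but different types.
as_double <- batch$cast(schema(x = float64(), y = utf8()))

# Serialize to a raw vector and read it back; the schema must be supplied.
bytes <- batch$serialize()
restored <- record_batch(bytes, schema = batch$schema)
restored$Equals(batch)

# The `$` caveat: a column named "Slice" is shadowed by the Slice method.
shadowed <- record_batch(Slice = 1:3)
is.function(shadowed$Slice)        # `$` finds the R6 method first
shadowed$GetColumnByName("Slice")  # the column itself
```

When a column name might collide with a method, [[ or $GetColumnByName() is the safer way to reach the column.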
There are also some active bindings:
$num_columns
$num_rows
$schema
$metadata: Returns the key-value metadata of the Schema as a named list.
Modify or replace by assigning in (batch$metadata <- new_metadata).
All list elements are coerced to string.
$columns: Returns a list of Arrays
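The active bindings are read like fields rather than called like functions. The sketch below assumes the arrow package is installed; the metadata keys are hypothetical and chosen only to show the coercion to string.

```r
library(arrow)

batch <- record_batch(x = 1:3, y = c("a", "b", "c"))

batch$num_columns      # number of columns
batch$num_rows         # number of rows
batch$schema           # the Schema describing the columns
length(batch$columns)  # one Array per column

# Replace the schema-level key-value metadata by assigning a named list.
# Non-character values are coerced to strings.
batch$metadata <- list(source = "example", version = 1)
batch$metadata$version
```

After the assignment, version is stored as a string rather than a number, so it needs converting back if it is used numerically.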
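library(arrow)

# Build a batch from a named vector plus a data.frame (autospliced together)
batch <- record_batch(name = rownames(mtcars), mtcars)
dim(batch)        # number of rows and columns
dim(head(batch))  # head() keeps the first six rows
names(batch)
# Columns can be extracted with $ or [[
batch$mpg
batch[["cyl"]]
# Subset rows and columns, then pull the result into an R data.frame
as.data.frame(batch[4:8, c("gear", "hp", "wt")])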