
arrow (version 0.17.1)

FileFormat: Dataset file formats

Description

A FileFormat holds information about how to read and parse the files included in a Dataset. There are subclasses corresponding to the supported file formats (ParquetFileFormat and IpcFileFormat).
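The mapping from format strings to subclasses can be checked directly. The following is a minimal sketch. It assumes the arrow package is installed with dataset support, and it uses only `FileFormat$create()` and base R class inspection. It looks up the concrete class created for each supported format string:

```r
library(arrow)

# Supported format strings; "ipc", "arrow" and "feather" are aliases.
formats <- c("parquet", "ipc", "arrow", "feather")

# Create a FileFormat for each string and record its most specific class.
fmt_classes <- vapply(
  formats,
  function(f) class(FileFormat$create(f))[1],
  character(1)
)
print(fmt_classes)

# Every subclass is still a FileFormat.
stopifnot(inherits(FileFormat$create("parquet"), "FileFormat"))
```

All three IPC aliases should resolve to the same subclass. Only "parquet" produces a ParquetFileFormat.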

Arguments

Factory

FileFormat$create() takes the following arguments:

  • format: A string identifier of the file format. Currently "parquet" and "ipc"/"arrow"/"feather" (aliases for each other) are supported. For Feather, only version 2 files are supported.

  • ...: Additional format-specific options. For format = "parquet":

    • use_buffered_stream: Read files through buffered input streams rather than loading entire row groups at once. Enabling this can reduce memory overhead. Disabled by default.

    • buffer_size: Size of buffered stream, if enabled. Default is 8KB.

    • dict_columns: Names of columns which should be read as dictionaries.

FileFormat$create() returns the appropriate subclass of FileFormat (e.g. ParquetFileFormat).
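The Parquet-specific options are passed through `...` to the factory. The sketch below assumes arrow is installed with dataset support. The column name "category" is hypothetical and is used only for illustration. The sketch enables buffered reads with a larger buffer and requests dictionary decoding for one column:

```r
library(arrow)

# Parquet options are forwarded through `...`.
# "category" is a hypothetical column name, for illustration only.
buffered_fmt <- FileFormat$create(
  "parquet",
  use_buffered_stream = TRUE,  # stream data instead of loading whole row groups
  buffer_size = 16384,         # 16 KB buffer instead of the 8 KB default
  dict_columns = "category"    # read this column as a dictionary type
)

# The factory still returns the Parquet subclass.
print(class(buffered_fmt)[1])
```

These options only control how files are read. They take effect when the format object is used to scan a Dataset.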