A FileFormat holds information about how to read and parse the files included in a Dataset. There are subclasses corresponding to the supported file formats (ParquetFileFormat and IpcFileFormat).
FileFormat$create() takes the following arguments:
format: A string identifier of the format of the files in path. Currently "parquet" and "ipc"/"arrow"/"feather" (aliases for each other) are supported. For Feather, only version 2 files are supported.
...: Additional format-specific options
format="parquet":
  use_buffered_stream: Read files through buffered input streams rather than loading entire row groups at once. This may be enabled to reduce memory overhead. Disabled by default.
  buffer_size: Size of buffered stream, if enabled. Default is 8KB.
  dict_columns: Names of columns which should be read as dictionaries.
FileFormat$create() returns the appropriate subclass of FileFormat (e.g. ParquetFileFormat).
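As an illustration of the arguments described above, the following is a minimal sketch of creating format objects with FileFormat$create(). It assumes the arrow package is installed with dataset support; the column name "species" and the 16384-byte buffer size are hypothetical values chosen for the example, not defaults or recommendations.

```r
library(arrow)

# Parquet format with the format-specific options described above.
# "species" is a hypothetical column name used only for illustration.
parquet_format <- FileFormat$create(
  "parquet",
  use_buffered_stream = TRUE,  # read through a buffered input stream
  buffer_size = 16384,         # illustrative size in bytes
  dict_columns = "species"     # read this column as a dictionary
)

# "ipc", "arrow", and "feather" are aliases for the same format.
ipc_format <- FileFormat$create("feather")

# Check which FileFormat subclass each call returned.
is_parquet <- inherits(parquet_format, "ParquetFileFormat")
is_ipc <- inherits(ipc_format, "IpcFileFormat")
```

The resulting objects are typically passed on to dataset-construction functions rather than used on their own; inherits() is used here only to confirm which subclass was returned.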