A Dataset can be constructed using one or more DatasetFactory objects.
This function helps you construct a DatasetFactory that you can pass to open_dataset().
dataset_factory(
  x,
  filesystem = c("auto", "local"),
  format = c("parquet", "arrow", "ipc", "feather"),
  partitioning = NULL,
  allow_not_found = FALSE,
  recursive = TRUE,
  ...
)
x: A string path to a directory containing data files, or a list of DatasetFactory objects whose datasets should be grouped. If a list is given, it is used to construct a UnionDatasetFactory and the other arguments are ignored.
filesystem: A string identifier for the filesystem corresponding to x. Currently only "local" is supported.
format: A string identifier of the format of the files in x. Currently "parquet" and "ipc"/"arrow"/"feather" (aliases for each other) are supported. For Feather, only version 2 files are supported.
partitioning: One of the following:
- A Schema, in which case the file paths relative to x will be parsed, and path segments will be matched with the schema fields. For example, schema(year = int16(), month = int8()) would create partitions for file paths like "2019/01/file.parquet", "2019/02/file.parquet", etc.
- A character vector that defines the field names corresponding to those path segments (that is, you are providing the names that would correspond to a Schema, but the types will be autodetected)
- A HivePartitioning or HivePartitioningFactory, as returned by hive_partition(), which parses explicit or autodetected fields from Hive-style path segments (such as "year=2019/month=01")
- NULL for no partitioning
allow_not_found: logical: is x allowed to not exist? Default FALSE. See FileSelector.
recursive: logical: should files be discovered in subdirectories of x? Default TRUE.
...: Additional arguments passed to the FileSystem$create() method.
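The Schema and character-vector forms of partitioning can be sketched as follows. This sketch assumes the arrow R package is installed. The root directory, file names, and amount column are hypothetical and are created on the fly with tempfile(). The sketch also calls the DatasetFactory $Finish() method, which this page does not describe, to turn each factory into a Dataset so its schema can be inspected.

```r
library(arrow)

# Hypothetical layout: <root>/<year>/<month>/file.parquet
root <- tempfile("sales_")
for (m in c("01", "02")) {
  dir.create(file.path(root, "2019", m), recursive = TRUE)
}
write_parquet(data.frame(amount = c(1.5, 2.5)), file.path(root, "2019", "01", "file.parquet"))
write_parquet(data.frame(amount = 3.5), file.path(root, "2019", "02", "file.parquet"))

# Schema form: directory segments are matched, in order, to year and month
fct <- dataset_factory(
  root,
  format = "parquet",
  partitioning = schema(year = int16(), month = int8())
)
ds <- fct$Finish()  # $Finish() is assumed here; it is not documented on this page
print(ds$schema)    # partition fields are listed alongside the file column "amount"

# Character-vector form: same field names, types inferred from the path segments
fct_names <- dataset_factory(root, format = "parquet", partitioning = c("year", "month"))
```

With either form, the directory names "2019", "01", and "02" become values of year and month instead of being ignored. The character-vector form saves writing out types, at the cost of accepting whatever types arrow infers.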
dataset_factory() returns a DatasetFactory object. Pass it to open_dataset(), possibly in a list with other DatasetFactory objects, to create a Dataset.
If you only have a single DatasetFactory (for example, a single directory containing Parquet files), you can call open_dataset() on that directory directly. Use dataset_factory() when you want to combine different directories, file systems, or file formats.
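Combining sources can be sketched as follows. This sketch assumes the arrow R package is installed. The two temporary directories, their file names, and the id column are hypothetical. One directory holds a Parquet file and the other holds a Feather (version 2) file. Each directory gets its own DatasetFactory with its own format, and the list of factories is then handed to open_dataset().

```r
library(arrow)

# Hypothetical inputs: one Parquet directory and one Feather (V2) directory
pq_dir <- tempfile("pq_")
fe_dir <- tempfile("feather_")
dir.create(pq_dir)
dir.create(fe_dir)
write_parquet(data.frame(id = 1:3), file.path(pq_dir, "part-0.parquet"))
write_feather(data.frame(id = 4:5), file.path(fe_dir, "part-0.feather"))  # V2 is the default

# One factory per source, each declaring its own file format
ds <- open_dataset(list(
  dataset_factory(pq_dir, format = "parquet"),
  dataset_factory(fe_dir, format = "feather")
))
print(ds$schema)
```

Both files share a single integer column, so the combined Dataset should expose one column, id, that spans the rows of both sources. Passing the same list to dataset_factory() instead would give the UnionDatasetFactory described under x.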