Read files into a dataset, optionally processing them in parallel.
read_files(
files,
reader,
...,
parallel_files = 1,
parallel_interleave = 1,
num_shards = NULL,
shard_index = NULL
)
A dataset
files: List of filenames or a glob pattern for files (e.g. "*.csv").
reader: Function that maps a file into a dataset (e.g. text_line_dataset() or tfrecord_dataset()).
...: Additional arguments to pass to the reader function.
parallel_files: An integer, the number of files to process in parallel.
parallel_interleave: An integer, the number of consecutive records to produce from each file before cycling to another file.
num_shards: An integer representing the number of shards operating in parallel.
shard_index: An integer representing the worker index. Shard indexes are 0-based, so with 8 shards, for example, the valid indexes are 0-7.
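The interleaving arguments can be easier to follow with a small simulation. The sketch below is plain R and does not call tfdatasets. `interleave_records()` is a hypothetical helper written only for illustration, and it assumes that every file is open at once (that is, the number of files equals the number of files processed in parallel). It shows the order in which records would be produced when a fixed number of consecutive records is taken from each file before cycling to the next.

```r
# Hypothetical helper for illustration only; not part of tfdatasets.
# files: a list of character vectors, one vector of records per file.
# block_length: consecutive records taken from a file before moving on.
interleave_records <- function(files, block_length = 1) {
  stopifnot(block_length >= 1)
  position <- rep(0L, length(files))  # records already consumed per file
  out <- character(0)
  # Keep cycling over the files until every record has been emitted.
  while (any(position < lengths(files))) {
    for (i in seq_along(files)) {
      remaining <- length(files[[i]]) - position[i]
      take <- min(block_length, remaining)
      if (take > 0) {
        out <- c(out, files[[i]][position[i] + seq_len(take)])
        position[i] <- position[i] + take
      }
    }
  }
  out
}

# Two in-memory "files" with four records each.
files <- list(
  a = c("a1", "a2", "a3", "a4"),
  b = c("b1", "b2", "b3", "b4")
)

interleave_records(files, block_length = 2)
# "a1" "a2" "b1" "b2" "a3" "a4" "b3" "b4"
```

With a block length of 1 the same input alternates record by record ("a1", "b1", "a2", "b2", and so on). Under the assumption stated above, this would correspond to reading two files in parallel with the interleave argument set to the block length; the real function may order records differently when files have unequal lengths or when there are more files than parallel readers.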