R interface to TensorFlow Dataset API

The TensorFlow Dataset API provides various facilities for creating scalable input pipelines for TensorFlow models, including:

  • Reading data from a variety of formats including CSV files and TFRecords files (the standard binary format for TensorFlow training data).

  • Transforming datasets in a variety of ways including mapping arbitrary functions against them.

  • Shuffling, batching, and repeating datasets over a number of epochs.

  • Streaming interface to data for reading arbitrarily large datasets.

  • Because reading and transforming data are TensorFlow graph operations, they execute in C++ and in parallel with model training.

The R interface to TensorFlow datasets provides access to the Dataset API, including high-level convenience functions for easy integration with the keras package.

For documentation on using tfdatasets, see the package website at https://tensorflow.rstudio.com/tools/tfdatasets/.
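A typical pipeline chains several of the functions listed below with the pipe operator. The following is a minimal sketch using the built-in `mtcars` data frame (the buffer, batch, and repeat values are illustrative, not recommendations); it assumes TensorFlow and tfdatasets are installed.

```r
library(tfdatasets)

# Build a dataset from in-memory R data, then shuffle, batch,
# repeat for five epochs, and prefetch one batch ahead.
dataset <- tensor_slices_dataset(mtcars) %>%
  dataset_shuffle(buffer_size = 32) %>%
  dataset_batch(8) %>%
  dataset_repeat(5) %>%
  dataset_prefetch(1)
```

The same chaining style applies to file-based sources such as `make_csv_dataset()` or `tfrecord_dataset()`, with the reader function simply replacing `tensor_slices_dataset()` at the head of the pipeline.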

Install

install.packages('tfdatasets')

Monthly Downloads

2,204

Version

2.17.0

License

Apache License 2.0

Last Published

July 16th, 2024

Functions in tfdatasets (2.17.0)

as_array_iterator

Convert tf_dataset to an iterator that yields R arrays.
as_tf_dataset

Add the tf_dataset class to a dataset
all_nominal

Find all nominal variables.
as_tensor.tensorflow.python.data.ops.dataset_ops.DatasetV2

Get the single element of the dataset.
choose_from_datasets

Creates a dataset that deterministically chooses elements from datasets.
dataset_cache

Caches the elements in this dataset.
all_numeric

Specify all numeric variables.
dataset_interleave

Maps map_func across this dataset, and interleaves the results
dataset_enumerate

Enumerates the elements of this dataset
dataset_unique

A transformation that discards duplicate elements of a Dataset.
dataset_map

Map a function across a dataset.
dataset_concatenate

Creates a dataset by concatenating given dataset with this dataset.
dataset_collect

Collects a dataset
dataset_use_spec

Transform the dataset using the provided spec.
dataset_decode_delim

Transform a dataset with delimited text lines into a dataset with named columns
dataset_padded_batch

Combines consecutive elements of this dataset into padded batches.
dataset_take_while

A transformation that stops dataset iteration based on a predicate.
dataset_prefetch

Creates a Dataset that prefetches elements from this dataset.
hearts

Heart Disease Data Set
dataset_map_and_batch

Fused implementation of dataset_map() and dataset_batch()
dataset_unbatch

Unbatch a dataset
input_fn.tf_dataset

Construct a tfestimators input function from a dataset
layer_input_from_dataset

Creates a list of inputs from a dataset
file_list_dataset

A dataset of all files matching a pattern
fit.FeatureSpec

Fits a feature specification.
dataset_prefetch_to_device

A transformation that prefetches dataset values to the given device
length.tf_dataset

Get Dataset length
reexports

Objects exported from other packages
dataset_options

Get or Set Dataset Options
dataset_prepare

Prepare a dataset for analysis
dataset_window

Combines input elements into a dataset of windows.
delim_record_spec

Specification for reading a record from a text file with delimited values
dataset_shard

Creates a dataset that includes only 1 / num_shards of this dataset.
dataset_batch

Combines consecutive elements of this dataset into batches.
dataset_filter

Filter a dataset by a predicate
sample_from_datasets

Samples elements at random from the given datasets.
dataset_reduce

Reduces the input dataset to a single element.
dataset_rejection_resample

A transformation that resamples a dataset to a target distribution.
dataset_snapshot

Persist the output of a dataset
dataset_take

Creates a dataset with at most count elements from this dataset
dataset_shuffle

Randomly shuffles the elements of this dataset.
step_embedding_column

Creates embeddings columns
dataset_flat_map

Maps map_func across this dataset and flattens the result.
dataset_bucket_by_sequence_length

A transformation that buckets elements in a Dataset by length
dataset_group_by_window

Group windows of elements by key and reduce them
step_indicator_column

Creates Indicator Columns
dataset_repeat

Repeats a dataset count times.
until_out_of_range

Execute code that traverses a dataset until an out of range condition occurs
iterator_get_next

Get next element from iterator
dense_features

Dense Features
with_dataset

Execute code that traverses a dataset
dataset_scan

A transformation that scans a function across an input dataset
dataset_shuffle_and_repeat

Shuffles and repeats a dataset returning a new permutation for each epoch.
fixed_length_record_dataset

A dataset of fixed-length records from one or more binary files.
dataset_skip

Creates a dataset that skips count elements from this dataset
iterator_initializer

An operation that should be run to initialize this iterator.
scaler_standard

Creates an instance of a standard scaler
has_type

Identify the type of the variable.
%>%

Pipe operator
make-iterator

Creates an iterator for enumerating the elements of this dataset.
feature_spec

Creates a feature specification.
next_batch

Tensor(s) for retrieving the next batch from a dataset
random_integer_dataset

Creates a Dataset of pseudorandom values
make_csv_dataset

Reads CSV files into a batched dataset
scaler

List of pre-made scalers
step_numeric_column

Creates a numeric column specification
selectors

Selectors
scaler_min_max

Creates an instance of a min max scaler
sparse_tensor_slices_dataset

Splits each rank-N tf$SparseTensor in this dataset row-wise.
iterator_make_initializer

Create an operation that can be run to initialize this iterator
step_categorical_column_with_identity

Create a categorical column with identity
step_remove_column

Creates a step that can remove columns
sql_record_spec

A dataset consisting of the results from a SQL query
step_categorical_column_with_vocabulary_list

Creates a categorical column specification
iterator_string_handle

String-valued tensor that represents this iterator
step_crossed_column

Creates crosses of categorical columns
text_line_dataset

A dataset comprising lines from one or more text files.
range_dataset

Creates a dataset of a step-separated range of values.
read_files

Read a dataset from a set of files
output_types

Output types and shapes
tensor_slices_dataset

Creates a dataset whose elements are slices of the given tensors.
tfrecord_dataset

A dataset comprising records from one or more TFRecord files.
step_categorical_column_with_vocabulary_file

Creates a categorical column with vocabulary file
zip_datasets

Creates a dataset by zipping together the given datasets.
step_bucketized_column

Creates bucketized columns
step_categorical_column_with_hash_bucket

Creates a categorical column with hash buckets specification
tensors_dataset

Creates a dataset with a single element, comprising the given tensors.
steps

Steps for feature columns specification.
step_shared_embeddings_column

Creates shared embeddings for categorical columns