Transform a dataset with named columns into a list with features (x) and response (y) elements.
Usage:

dataset_prepare(
  dataset,
  x,
  y = NULL,
  named = TRUE,
  named_features = FALSE,
  parallel_records = NULL,
  batch_size = NULL,
  num_parallel_batches = NULL,
  drop_remainder = FALSE
)
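
As a minimal usage sketch (the mtcars data frame, tensor_slices_dataset() as the dataset source, and bare column names wrapped in c() for x are illustrative assumptions, not requirements of dataset_prepare() itself):

library(tfdatasets)

# build a dataset from an in-memory data frame and split it into
# features (x) and a response (y)
dataset <- tensor_slices_dataset(mtcars) %>%
  dataset_prepare(x = c(disp, hp, wt), y = mpg)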
Value:

A dataset. The dataset will have a structure of either:

- When named_features is TRUE: list(x = list(feature_name = feature_values, ...), y = response_values)
- When named_features is FALSE: list(x = features_array, y = response_values), where features_array is a Rank 2 array of shape (batch_size, num_features).

Note that the y element will be omitted when y is NULL.
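
For example (a sketch reusing the hypothetical mtcars dataset from above), the two settings of named_features yield differently structured x elements:

# named_features = FALSE (the default): disp, hp, and wt are stacked into a
# single Rank 2 tensor of shape (batch_size, num_features), so they must
# share one data type
stacked <- tensor_slices_dataset(mtcars) %>%
  dataset_prepare(x = c(disp, hp, wt), y = mpg, named_features = FALSE)

# named_features = TRUE: each element keeps the features as a named list,
# i.e. list(x = list(disp = ..., hp = ..., wt = ...), y = ...)
by_name <- tensor_slices_dataset(mtcars) %>%
  dataset_prepare(x = c(disp, hp, wt), y = mpg, named_features = TRUE)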
Arguments:

dataset: A dataset.

x: Features to include. When named_features is FALSE, all features will be
stacked into a single tensor, so they must have an identical data type.

y: (Optional) Response variable.

named: TRUE to name the dataset elements "x" and "y"; FALSE to leave the
dataset elements unnamed.

named_features: TRUE to yield features as a named list; FALSE to stack
features into a single array. Note that in the case of FALSE (the default),
all features will be stacked into a single 2D tensor, so they need to have
the same underlying data type.

parallel_records: (Optional) An integer representing the number of records
to decode in parallel. If not specified, records will be processed
sequentially.

batch_size: (Optional) Batch size if you would like to fuse the
dataset_prepare() operation together with a dataset_batch() (fusing
generally improves overall training performance); see the sketch after this
argument list.

num_parallel_batches: (Optional) An integer representing the number of
batches to create in parallel. On one hand, higher values can help mitigate
the effect of stragglers. On the other hand, higher values can increase
contention if CPU is scarce.

drop_remainder: (Optional) A boolean representing whether the last batch
should be dropped when it has fewer than batch_size elements; the default
behavior is not to drop the smaller batch.
See also: input_fn() for use with tfestimators.