dataset
All datasets that represent a map from keys to data samples should subclass this
class. All subclasses should override the .getitem() method, which supports
fetching a data sample for a given key. Subclasses can also optionally
override .length(), which is expected to return the size of the dataset
(e.g. the number of samples). This size is used by many sampler implementations
and by the default options of dataloader().
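As a minimal sketch of the contract described above, the following defines a toy dataset that implements .getitem() and .length(). The name toy_dataset, the fields x and y, and the argument n are hypothetical and chosen only for illustration; the torch package is assumed to be installed.

```r
library(torch)

# Hypothetical toy dataset: key i maps to the sample (x[i], y[i]).
toy_dataset <- dataset(
  name = "toy_dataset",
  initialize = function(n = 10) {
    # Store the data as tensors on the instance.
    self$x <- torch_tensor(as.numeric(seq_len(n)))
    self$y <- self$x * 2
  },
  # Fetch a single sample for a given key (a 1-based index here).
  .getitem = function(i) {
    list(x = self$x[i], y = self$y[i])
  },
  # Report the number of samples in the dataset.
  .length = function() {
    self$x$size(1)
  }
)

ds <- toy_dataset(n = 10)
n_samples <- length(ds)      # dispatches to .length()
sample_3 <- ds$.getitem(3)   # a list with elements x and y
```

Samplers and dataloader() rely on the value returned by .length(), so it should match the number of keys that .getitem() accepts.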
dataset(
name = NULL,
inherit = Dataset,
...,
private = NULL,
active = NULL,
parent_env = parent.frame()
)
The output is a function f with class dataset_generator. Calling f()
creates a new instance of the R6 class dataset. The R6 class is stored in the
enclosing environment of f and can also be accessed through f's attribute
Dataset.
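The returned value can be inspected as in the short sketch below. The name minimal_dataset and its methods are hypothetical placeholders; only the class names and the Dataset attribute come from the description above.

```r
library(torch)

# Hypothetical generator used only to inspect what dataset() returns.
f <- dataset(
  name = "minimal_dataset",
  initialize = function(n = 3) {
    self$n <- n
  },
  .getitem = function(i) i,
  .length = function() self$n
)

# f is a function carrying the class "dataset_generator".
is_generator <- inherits(f, "dataset_generator")

# Calling f() creates an instance of the R6 class.
ds <- f()
is_dataset <- inherits(ds, "dataset")

# The underlying R6 class, accessed through the "Dataset" attribute.
r6_class <- attr(f, "Dataset")
```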
name: A name for the dataset. It is also used as the class name for it.
inherit: You can optionally inherit from a dataset when creating a new dataset.
...: Public methods for the dataset class.
private: Passed to R6::R6Class().
active: Passed to R6::R6Class().
parent_env: An environment to use as the parent of newly-created objects.
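The sketch below illustrates the inherit and private arguments together. It assumes that a generator returned by dataset() can be passed directly as inherit, and that private members behave as in any R6 class; base_ds, scaled_ds, and the private field scale are hypothetical names.

```r
library(torch)

# Hypothetical parent dataset.
base_ds <- dataset(
  name = "base_ds",
  initialize = function(n = 5) {
    self$x <- torch_tensor(as.numeric(seq_len(n)))
  },
  .getitem = function(i) self$x[i],
  .length = function() self$x$size(1)
)

# Hypothetical child dataset: reuses initialize() and .length() from the
# parent and overrides .getitem() to rescale each sample.
scaled_ds <- dataset(
  name = "scaled_ds",
  inherit = base_ds,
  private = list(scale = 10),  # forwarded to R6::R6Class()
  .getitem = function(i) {
    super$.getitem(i) * private$scale
  }
)

ds <- scaled_ds(n = 5)
n_samples <- length(ds)
item_2 <- ds$.getitem(2)
```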
By default, datasets are iterated by returning each observation/item individually.
Often an optimized implementation can fetch a whole batch of observations at once
(e.g., subsetting a tensor by multiple indexes at once is faster than
subsetting once for each index). In this case you can implement a .getbatch
method, which the dataloader uses instead of .getitem when getting a batch of
observations. .getbatch must work for batches of size greater than or equal to 1.
For more on this, see vignette("loading-data").
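A minimal sketch of a dataset that implements .getbatch follows. The name batched_ds, the field x, and the batch size are hypothetical; the example assumes the dataloader passes a vector of 1-based indexes to .getbatch.

```r
library(torch)

# Hypothetical dataset that serves whole batches with one subsetting call.
batched_ds <- dataset(
  name = "batched_ds",
  initialize = function(n = 8) {
    self$x <- torch_randn(n, 3)
  },
  # Receives all indexes of a batch at once and subsets the tensor once,
  # instead of once per index.
  .getbatch = function(indexes) {
    list(x = self$x[indexes, ])
  },
  .length = function() self$x$size(1)
)

ds <- batched_ds(n = 8)

# Direct call with several indexes.
direct <- ds$.getbatch(c(1L, 3L, 5L))

# Through a dataloader, which is expected to call .getbatch per batch.
dl <- dataloader(ds, batch_size = 4)
batch <- dataloader_next(dataloader_make_iter(dl))
batch_shape <- batch$x$shape
```

Because .getbatch must also handle batches of size 1, it is worth checking that the subsetting keeps the batch dimension when a single index is passed.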