This class contains public and private methods that may be useful for every large data set. Objects of this class are not intended to be used directly. Use LargeDataSetForTextEmbeddings or LargeDataSetForText instead.
Returns a new object of this class.
n_cols()
Number of columns in the data set.
LargeDataSetBase$n_cols()
int describing the number of columns in the data set.
n_rows()
Number of rows in the data set.
LargeDataSetBase$n_rows()
int describing the number of rows in the data set.
get_colnames()
Get names of the columns in the data set.
LargeDataSetBase$get_colnames()
vector containing the names of the columns as strings.
get_dataset()
Get data set.
LargeDataSetBase$get_dataset()
Returns the data set of this object as an object of class datasets.arrow_dataset.Dataset.
reduce_to_unique_ids()
Reduces the data set to a data set containing only unique ids. In the case an id exists multiple times in the data set the first case remains in the data set. The other cases are dropped.
Attention: Calling this method changes the data set in place.
LargeDataSetBase$reduce_to_unique_ids()
Method does not return anything. It changes the data set of this object in place.
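The keep-first rule described above can be illustrated with a small sketch. It is written in plain Python using only the standard library and is not the package's implementation. The function name `keep_first_per_id` and the `id` field are hypothetical names chosen for illustration.

```python
def keep_first_per_id(rows, id_field="id"):
    """Return rows with duplicate ids removed, keeping the first occurrence.

    Hypothetical helper for illustration only; it does not call the package.
    """
    seen = set()
    unique_rows = []
    for row in rows:
        row_id = row[id_field]
        if row_id in seen:
            # A case with this id was already kept, so this one is dropped.
            continue
        seen.add(row_id)
        unique_rows.append(row)
    return unique_rows


rows = [
    {"id": "a", "text": "first a"},
    {"id": "b", "text": "only b"},
    {"id": "a", "text": "second a"},
]
unique_rows = keep_first_per_id(rows)
```

Unlike this sketch, which returns a new list, reduce_to_unique_ids() replaces the data set stored in the object.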
select()
Returns a data set which contains only the cases belonging to the specific indices.
LargeDataSetBase$select(indicies)
indicies
vector of int for selecting rows in the data set. Attention: The indices are zero-based.
Returns a data set of class datasets.arrow_dataset.Dataset with the selected rows.
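Because R vectors are one-based, row positions counted in the usual R way have to be shifted by one before they are passed as zero-based indices. The following is a minimal sketch of that conversion in plain Python. The helper `to_zero_based` is a hypothetical name and is not part of the package.

```python
def to_zero_based(one_based_positions):
    """Convert one-based row positions to zero-based indices.

    Hypothetical helper for illustration only.
    """
    if any(p < 1 for p in one_based_positions):
        raise ValueError("one-based positions must be >= 1")
    return [p - 1 for p in one_based_positions]


rows = ["row 1", "row 2", "row 3", "row 4"]

# The first and third rows, counted one-based, map to indices 0 and 2.
indices = to_zero_based([1, 3])
selected = [rows[i] for i in indices]
```

Passing one-based positions unchanged would select each following row instead of the intended one.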
get_ids()
Get the ids of all rows.
LargeDataSetBase$get_ids()
Returns a vector containing the ids of every row as strings.
save()
Saves a data set to disk.
LargeDataSetBase$save(dir_path, folder_name, create_dir = TRUE)
dir_path
Path where to store the data set.
folder_name
string
Name of the folder for storing the data set.
create_dir
bool
If TRUE, the directory will be created if it does not exist.
Method does not return anything. It writes the data set to disk.
load_from_disk()
Loads an object of class LargeDataSetBase from disk and updates the object to the current version of the package.
LargeDataSetBase$load_from_disk(dir_path)
dir_path
Path where the data set is stored.
Method does not return anything. It loads an object from disk.
get_all_fields()
Return all fields.
LargeDataSetBase$get_all_fields()
Method returns a list
containing all public and private fields of the object.
clone()
The objects of this class are cloneable with this method.
LargeDataSetBase$clone(deep = FALSE)
deep
Whether to make a deep clone.