EXPERIMENTAL. Batched version of load_tweets_json with control over retained columns. Not as efficient as load_tweets_json but requires less memory. Wrapper of the function fload
load_many_tweets_json(
data_dir,
batch_size = 1000,
keep_cols = c("text", "possibly_sensitive", "public_metrics", "lang",
"edit_history_tweet_ids", "attachments", "geo"),
query = NULL,
query_error_ok = TRUE
)
a data.table with all tweets loaded
string that leads to the directory containing JSON files
integer specifying the number of JSON files
to load per batch. Default: 1000
character vector with the names of columns you want to
keep. Set it to NULL
to only retain the required columns.
Default: keep_cols = c("text", "possibly_sensitive", "public_metrics",
"lang", "edit_history_tweet_ids", "attachments", "geo")
(string) JSON Pointer query passed on to
fload (optional). Default: NULL
(Boolean) stop if query
causes an error. Passed on
to fload (optional). Default: FALSE
Unlike load_tweets_json this function loads JSON files
in batches and processes each batch before loading the next batch.
You can specify which columns to keep, which in turn requires less memory.
For example, you can decide not to keep the "text
column, which
requires quite a lot of memory.