Runs Stanford CoreNLP on a collection of .txt files and processes them in blocks of a specified size, saving intermediate results to disk. Designed to function on very large corpora.
corenlp_blocked(output_directory, document_directory, file_list = NULL,
block_size = 1000, syntactic_parsing = FALSE,
coreference_resolution = FALSE, additional_options = "",
return_raw_output = FALSE, version = "3.5.2", parallel = FALSE,
cores = 1, first_block = NULL, last_block = NULL)
The path to the directory where CoreNLP output should be stored. Output will be saved to this directory in .Rdata files named CoreNLP_Output_1.Rdata ... CoreNLP_Output_N.Rdata.
The path to a directory containing .txt files (one per document) to be run through CoreNLP.
An optional list of .txt files to be used. Useful when the user only wants to process a subset of the documents in the directory, for example when the corpus is extremely large.
The number of documents to be processed at a time. Defaults to 1000.
Logical indicating whether syntactic parsing should be included as an option. Defaults to FALSE. Caution: enabling this argument may greatly increase runtime. If TRUE, output will automatically be returned in raw format.
Logical indicating whether coreference resolution should be included as an option. Defaults to FALSE. Caution: enabling this argument may greatly increase runtime. If TRUE, output will automatically be returned in raw format.
An optional string specifying additional options for CoreNLP. May cause unexpected behavior; use at your own risk.
Logical indicating whether raw output should be returned. Defaults to FALSE. If TRUE, CoreNLP output is not parsed and raw list objects are returned.
The version of CoreNLP to download. Defaults to '3.5.2'. Newer versions of CoreNLP will be made available at a later date.
Logical indicating whether CoreNLP should be run in parallel. Defaults to FALSE.
The number of cores to be used if CoreNLP is being run in parallel. Defaults to 1.
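The index of the first block to process. Used together with last_block to run CoreNLP on a specific range of blocks. Defaults to NULL.
The index of the last block to process. Used together with first_block to run CoreNLP on a specific range of blocks. Defaults to NULL.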
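The interaction between block_size, first_block and last_block can be illustrated with a short base R sketch. It is not the function's actual implementation: it assumes that documents are grouped into consecutive blocks in file order, that block i is saved as CoreNLP_Output_i.Rdata, and that the block range is inclusive. The helpers split_into_blocks and completed_blocks, the file names, and the temporary directory are all hypothetical.

```r
# Hypothetical helper (an assumption, not part of the documented function):
# split a vector of file names into consecutive blocks of at most
# `block_size` elements.
split_into_blocks <- function(files, block_size = 1000) {
  if (length(files) == 0) {
    return(list())
  }
  # Block index for each file: 1, 1, ..., 2, 2, ...
  block_id <- ceiling(seq_along(files) / block_size)
  unname(split(files, block_id))
}

# Hypothetical helper: indices of blocks that already have an output file
# named CoreNLP_Output_<i>.Rdata in `output_directory`.
completed_blocks <- function(output_directory) {
  found <- list.files(output_directory,
                      pattern = "^CoreNLP_Output_[0-9]+\\.Rdata$")
  sort(as.integer(sub("^CoreNLP_Output_([0-9]+)\\.Rdata$", "\\1", found)))
}

# 2500 made-up file names with the default block size give three blocks.
files <- sprintf("doc_%04d.txt", 1:2500)
blocks <- split_into_blocks(files, block_size = 1000)
lengths(blocks)  # 1000 1000 500

# Simulate an interrupted run in which only block 1 was written to disk.
out_dir <- file.path(tempdir(), "corenlp_blocks_demo")
dir.create(out_dir, showWarnings = FALSE)
placeholder <- list(block = 1)
save(placeholder, file = file.path(out_dir, "CoreNLP_Output_1.Rdata"))

# Blocks that still need to be processed, and the range that covers them.
remaining <- setdiff(seq_along(blocks), completed_blocks(out_dir))
first_block <- min(remaining)  # 2
last_block <- max(remaining)   # 3
```

Under these assumptions, the resulting values could be passed as first_block and last_block in a second call to pick up where an interrupted run stopped.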
Does not return anything; all output is saved to disk.
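Because results are written to disk rather than returned, they have to be read back in a later step. The following is a minimal sketch of one way to do that, assuming the CoreNLP_Output_i.Rdata naming described for the output directory. The names of the objects stored inside each file are not documented here, so the sketch loads a file into a separate environment and lists what it contains. The helper load_block and the stand-in file are hypothetical.

```r
# Hypothetical helper: load one block's .Rdata file into its own environment
# and return that environment, so nothing in the workspace is overwritten.
load_block <- function(output_directory, block) {
  path <- file.path(output_directory,
                    sprintf("CoreNLP_Output_%d.Rdata", block))
  env <- new.env()
  load(path, envir = env)
  env
}

# Demo with a stand-in file; real files would be written by corenlp_blocked
# and may contain differently named objects.
out_dir <- file.path(tempdir(), "corenlp_load_demo")
dir.create(out_dir, showWarnings = FALSE)
placeholder <- data.frame(token = c("Hello", "world"))
save(placeholder, file = file.path(out_dir, "CoreNLP_Output_1.Rdata"))

block_env <- load_block(out_dir, 1)
ls(block_env)  # names of the objects stored in the file
```

Loading one block at a time keeps memory use bounded, which matches the stated goal of handling very large corpora.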