Runs Stanford CoreNLP on a collection of documents
corenlp(documents = NULL, document_directory = NULL, file_list = NULL,
    delete_intermediate_files = TRUE, syntactic_parsing = FALSE,
    coreference_resolution = FALSE, additional_options = "",
    return_raw_output = FALSE, version = "3.5.2", block = 1)
An optional list of character vectors or a vector of strings, with one entry per document. These documents will be run through CoreNLP.
An optional path to a directory containing only .txt files (one per document) to be run through CoreNLP. Cannot be supplied in addition to the 'documents' argument.
An optional list of .txt files to be used if the document_directory option is specified. This can be useful if the user only wants to process a subset of the documents in the directory, such as when the corpus is extremely large.
Logical indicating whether intermediate files produced by CoreNLP should be deleted. Defaults to TRUE; if set to FALSE, the XML output of CoreNLP will be saved.
Logical indicating whether syntactic parsing should be included as an option. Defaults to FALSE. Caution: enabling this option may greatly increase runtime. If TRUE, output will automatically be returned in raw format.
Logical indicating whether coreference resolution should be included as an option. Defaults to FALSE. Caution: enabling this option may greatly increase runtime. If TRUE, output will automatically be returned in raw format.
An optional string specifying additional options for CoreNLP. May cause unexpected behavior, use at your own risk!
Defaults to FALSE. If TRUE, CoreNLP output is not parsed and raw list objects are returned.
The version of CoreNLP to download. Defaults to '3.5.2'. Newer versions of CoreNLP will be made available at a later date.
An internal file list identifier used by corenlp_blocked() to avoid collisions. Should not be set by the user.
Returns a list of data.frame objects, one per document, where each row is a token observation (in order).
# NOT RUN {
directory <- system.file("extdata", package = "SpeedReader")[1]
Tokenized <- corenlp(
    document_directory = directory,
    syntactic_parsing = FALSE,
    coreference_resolution = FALSE)
# }
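The file_list argument can be combined with document_directory to process only some of the files in a large corpus. A minimal sketch, assuming the directory contains .txt files; the number of files selected (100) and the object names are illustrative, not part of the package API:

```r
# NOT RUN {
# Build a subset of .txt file names from the document directory
# (here, hypothetically, the first 100 files in alphabetical order).
directory <- system.file("extdata", package = "SpeedReader")[1]
txt_files <- list.files(directory, pattern = "\\.txt$")
subset_files <- head(txt_files, 100)

# Run CoreNLP only on that subset of documents.
Tokenized_subset <- corenlp(
    document_directory = directory,
    file_list = subset_files)
# }
```

Because corenlp() downloads and calls the CoreNLP Java toolkit, the call itself is wrapped in a NOT RUN block, as in the example above.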