Runs the cleanNLP annotators over a given corpus of text using either the R, Java, or Python backend. The details for which annotators to run and how to run them are specified by using one of: cnlp_init_tokenizers, cnlp_init_spacy, cnlp_init_udpipe, or cnlp_init_corenlp.
Usage

cnlp_annotate(input, as_strings = NULL, doc_ids = NULL,
  backend = NULL, meta = NULL, doc_var = "doc_id",
  text_var = "text")
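For illustration, a minimal sketch of initializing a backend and annotating raw strings (assuming cleanNLP and its tokenizers dependency are installed; the example sentences are invented):

```r
library(cleanNLP)

# initialize the R-only tokenizers backend (no Java or Python required)
cnlp_init_tokenizers()

# annotate two short documents passed directly as strings
txt <- c("The quick brown fox jumps over the lazy dog.",
         "R makes text annotation straightforward.")
anno <- cnlp_annotate(txt, as_strings = TRUE)

# inspect the token table of the resulting annotation object
head(cnlp_get_token(anno))
```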
Arguments

input
	either a vector of file names to parse, a character vector with one document in each element, or a data frame. If a data frame, specify which column names contain the text and (optionally) the document ids

as_strings
	logical. Is the data given to input the actual document text, or are they file names? If NULL, the default, this will be set to FALSE if the input points to a valid file and TRUE otherwise.

doc_ids
	optional character vector of document names

backend
	which backend to use. Defaults to the last model to be initialized.

meta
	an optional data frame to bind to the document table

doc_var
	if passing a data frame, the name of the column containing the document identifier; if this variable does not exist in the dataset, automatic names will be given (or set to NULL to force automatic names)

text_var
	if passing a data frame, the name of the column containing the text of the documents
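When the input is a data frame, doc_var and text_var tell the annotator which columns hold the identifiers and the text. A short sketch (the column names "id" and "body" are invented for illustration; a backend is assumed to have been initialized first with one of the cnlp_init_* functions):

```r
library(cleanNLP)
cnlp_init_tokenizers()

# a small corpus stored as a data frame; "id" and "body" are made-up column names
docs <- data.frame(
  id   = c("doc1", "doc2"),
  body = c("First document text.", "Second document text."),
  stringsAsFactors = FALSE
)

# point cnlp_annotate at the identifier and text columns
anno <- cnlp_annotate(docs, doc_var = "id", text_var = "body")
```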
Value

an object of class annotation
References

Manning, Christopher D., Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP Natural Language Processing Toolkit. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55-60.
Examples

## Not run:
annotation <- cnlp_annotate("path/to/corpus/directory")
## End(Not run)