This package provides a tm Source to create corpora from articles exported from Dow Jones's Factiva content provider as XML or HTML files.
Milan Bouchet-Valat <nalimilan@club.fr>
Typical usage is to create a corpus from an XML or HTML file exported from Factiva (here called myFactivaArticles.xml). Setting language=NA allows the language to be set automatically from the information provided by Factiva:
# Import corpus
source <- FactivaSource("myFactivaArticles.xml")
corpus <- Corpus(source, list(language=NA))
# See how many articles were imported
corpus
# See the contents of the first article and its meta-data
inspect(corpus[1])
meta(corpus[[1]])
Currently, only HTML files saved in French are supported. Please send the maintainer examples of Factiva files in your language if you want it to be supported.
See FactivaSource for more details and real examples.