This package provides a tm Source to create corpora from files in the format used by the Alceste application.
Milan Bouchet-Valat <nalimilan@club.fr>
Typical usage is to create a corpus from an Alceste file prepared manually (here called myAlcesteCorpus.txt).
Frequently, it is necessary to specify the encoding of the texts via AlcesteSource's encoding argument.
# Import corpus
source <- AlcesteSource("myAlcesteCorpus.txt")
corpus <- Corpus(source)
# See how many articles were imported
corpus
# See the contents of the first article and its meta-data
inspect(corpus[1])
meta(corpus[[1]])
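When accented characters come out garbled, the encoding can be passed explicitly. The following is a minimal sketch: the file name is the placeholder used above, and "ISO-8859-1" is only an assumed value that should be replaced with the encoding the file was actually saved in. The guard simply skips the import when the package or the file is not available.

```r
# Placeholder file name and an ASSUMED encoding; adjust both to your data
alceste_file <- "myAlcesteCorpus.txt"
file_encoding <- "ISO-8859-1"

# Run the import only if the package is installed and the file exists
if (requireNamespace("tm.plugin.alceste", quietly = TRUE) &&
    file.exists(alceste_file)) {
  library(tm)
  library(tm.plugin.alceste)

  # Pass the encoding explicitly to the source
  source <- AlcesteSource(alceste_file, encoding = file_encoding)
  corpus <- Corpus(source)

  # Check that accented characters in the first text look correct
  inspect(corpus[1])
}
```

If the text still looks wrong, trying another encoding name is usually the next step.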
See AlcesteSource for more details and real examples.