Tools to Create, Modify and Manage 'CWB' Corpora
Description
The 'Corpus Workbench' ('CWB') offers a classic and mature
approach for working with large, linguistically and structurally annotated corpora. The 'CWB'
is memory-efficient, and its design makes running queries fast (Evert and Hardie 2011).
The 'cwbtools' package offers
pure R tools to create indexed corpus files as well as high-level wrappers for the original C
implementation of 'CWB' as exposed by the 'RcppCWB' package. Additional functionality to add and
modify annotations of corpora from within R makes working with 'CWB'-indexed corpora
much more flexible and convenient. The 'cwbtools' package in combination with the R packages
'RcppCWB' and 'polmineR' offers a lightweight infrastructure
to support the combination of quantitative and qualitative approaches for working
with textual data.