cwbtools (version 0.4.1)

Tools to Create, Modify and Manage 'CWB' Corpora

Description

The 'Corpus Workbench' ('CWB') offers a classic and mature approach for working with large, linguistically and structurally annotated corpora. The 'CWB' is memory efficient, and its design makes running queries fast; see Evert (2011). The 'cwbtools' package offers pure 'R' tools to create indexed corpus files, as well as high-level wrappers for the original 'C' implementation of 'CWB' as exposed by the 'RcppCWB' package. Additional functionality to add and modify annotations of corpora from within 'R' makes working with 'CWB' indexed corpora more flexible and convenient. In combination with the 'R' packages 'RcppCWB' and 'polmineR', 'cwbtools' offers a lightweight infrastructure to support the combination of quantitative and qualitative approaches for working with textual data.

Install

install.packages('cwbtools')
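
The install command above can be wrapped so that it only runs when the package is missing. The following is a minimal sketch using base R only; `ensure_installed` is a hypothetical helper introduced here for illustration and is not part of 'cwbtools'.

```r
# Hypothetical helper (not part of cwbtools): install a package only if it
# is not already available, then report whether it can be loaded.
ensure_installed <- function(pkg, installer = utils::install.packages) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    installer(pkg)
  }
  # TRUE if the package namespace can be loaded after the attempt.
  isTRUE(requireNamespace(pkg, quietly = TRUE))
}

# Typical interactive use (requires network access to a CRAN mirror):
# ensure_installed("cwbtools")
# library(cwbtools)
# packageVersion("cwbtools")
```

Passing the installer as an argument is a design choice made for this sketch: it keeps the helper easy to try out without triggering a download.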

Monthly Downloads

195

Version

0.4.1

License

GPL-3

Maintainer

Andreas Blaette

Last Published

March 29th, 2024

Functions in cwbtools (0.4.1)

s_attribute_encode

Read, process and write data on structural attributes.

pkg_utils

Create and manage packages with corpus data.

registry_file_parse

Parse and create registry files.

zenodo_get_tarball

Download corpus tarball from Zenodo.

encode

Encode CWB Corpus.

cwb_install

Utilities to install the Corpus Workbench (CWB).

conll_get_regions

Extract regions from NER annotations (CoNLL format).

cwbtools-package

cwbtools-package

cwb_corpus_dir

Manage directories for indexed corpora.

get_encoding

Get Encoding of Character Vector.

p_attribute_encode

Encode Positional Attribute(s).

corpus_install

Install and manage corpora.

as.vrt

Consolidate vrt files for CWB import.

CorpusData

Manage Corpus Data and Encode CWB Corpus.
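
The entries listed above are help topics. To see what an installed copy of the package actually exports, something like the following sketch may be useful. It relies on base R only; `list_exports` is a hypothetical helper defined here, not a 'cwbtools' function, and help topic names such as `pkg_utils` need not match exported object names.

```r
# Hypothetical helper (not part of cwbtools): list the objects a package
# exports, or return character(0) if the package is not installed.
list_exports <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(character(0))
  }
  sort(getNamespaceExports(pkg))
}

exports <- list_exports("cwbtools")
print(exports)  # empty if cwbtools is not installed

# In an interactive session, the help index and individual help pages can
# be opened with:
# help(package = "cwbtools")
# ?cwbtools::corpus_install
```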