There is a newer version (0.10.2) of this package.

corpus (version 0.8.0)

Text Corpus Analysis

Description

Text corpus data analysis, with full support for Unicode. Functions for reading data from newline-delimited JSON files, for normalizing and tokenizing text, for searching for term occurrences, and for computing term occurrence frequencies (including n-grams).
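
The workflow named in the description (read newline-delimited JSON, tokenize, tabulate term occurrences) can be sketched roughly as follows. This is a minimal illustration, not the package's canonical usage: the sample records, the "id" and "text" field names, and the temporary file are made up for the example, and every call relies on default arguments, whose behavior may differ between package versions.

```r
# Assumes the corpus package is installed: install.packages("corpus")
library(corpus)

# Hypothetical sample data: two records, each with a "text" field,
# written to a temporary newline-delimited JSON file.
path <- tempfile(fileext = ".json")
writeLines(c(
  '{"id": 1, "text": "The quick brown fox jumps over the lazy dog."}',
  '{"id": 2, "text": "A rose is a rose is a rose."}'
), path)

# Read the file: one JSON record per line, one row per record.
data <- read_ndjson(path)

# Split each text into tokens using the default text filter.
tokens <- text_tokens(data$text)

# Tabulate term occurrences: one row per text, one column per term.
counts <- term_matrix(data$text)

print(tokens)
print(dim(counts))
```

Only read_ndjson, text_tokens, and term_matrix from the function list are used here; settings such as case folding or stop word removal are left at their defaults rather than guessed.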

Install

install.packages('corpus')

Monthly Downloads

278

Version

0.8.0

License

Apache License (== 2.0) | file LICENSE

Maintainer

Patrick Perry

Last Published

July 19th, 2017

Functions in corpus (0.8.0)

term_matrix: Term Frequency Tabulation
text_filter: Text Filters
utf8: UTF-8 Text Handling
federalist: The Federalist Papers
read_ndjson: JSON Data Input
text_tokens: Text Tokenization
text_types: Text Type Sets
stopwords: Stop Words
term_counts: Term Frequencies
corpus-deprecated: Deprecated Functions in Package corpus
corpus-package: The Corpus Package
text_locate: Searching for Terms in Text
text_split: Segmenting Text
abbreviations: Abbreviations
as_text: Text Vectors