⚠️ There's a newer version (0.13-8) of this package.

koRpus (version 0.04-40)

An R Package for Text Analysis

Description

A set of tools to analyze texts. Includes, among others, functions for automatic language detection, hyphenation, several indices of lexical diversity (e.g., type-token ratio, HD-D/vocd-D, MTLD) and readability (e.g., Flesch, SMOG, LIX, Dale-Chall). Basic import functions for language corpora are also provided to enable frequency analyses (supports the Celex and Leipzig Corpora Collection file formats). Note: For full functionality, a local installation of TreeTagger is recommended. Feedback to the author(s) is welcome!
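To make the idea of a lexical diversity index concrete, here is a minimal base R sketch of the plain type-token ratio (unique words divided by total words). The helper `simple_ttr()` is hypothetical and is not part of koRpus; the package's own tokenizer and TTR function may treat case, punctuation and word boundaries differently.

```r
# Hypothetical helper for illustration only; not a koRpus function.
# Assumption: tokens are runs of alphabetic characters, compared case-insensitively.
simple_ttr <- function(text) {
  # Split on anything that is not a letter, then drop empty strings
  tokens <- tolower(unlist(strsplit(text, "[^[:alpha:]]+")))
  tokens <- tokens[nzchar(tokens)]
  if (length(tokens) == 0) {
    return(NA_real_)  # no words, ratio undefined
  }
  # Types = distinct tokens; TTR = types / tokens
  length(unique(tokens)) / length(tokens)
}

# 6 tokens, 5 distinct types ("the" occurs twice)
ttr <- simple_ttr("The cat sat on the mat.")
print(ttr)
```

The ratio falls as a text gets longer because common words repeat, which is one reason length-corrected measures such as MTLD or HD-D exist alongside it.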

Install

install.packages('koRpus')

Monthly Downloads

3,715

Version

0.04-40

License

GPL (>= 3)

Maintainer

Meik Michalke

Last Published

April 7th, 2013

Functions in koRpus (0.04-40)

SMOG

Readability: Simple Measure of Gobbledygook (SMOG)
coleman.liau

Readability: Coleman-Liau Index
FOG

Readability: Gunning FOG Index
HDD

Lexical diversity: HD-D (vocd-D)
kRp.TTR-class

S4 class kRp.TTR
show

Show methods for koRpus objects
flesch.kincaid

Readability: Flesch-Kincaid Grade Level
ELF

Readability: Farr's Easy Listening Formula (ELF)
CTTR

Lexical diversity: Carroll's corrected TTR (CTTR)
kRp.analysis-class

S4 class kRp.analysis
hyphen

Automatic hyphenation
lex.div.num

Calculate lexical diversity
flesch

Readability: Flesch Reading Ease
readability.num

Calculate readability
U.ld

Lexical diversity: Uber Index (U)
readability

Measure readability
kRp.lang-class

S4 class kRp.lang
hyph.XX

Hyphenation patterns
set.kRp.env

A function to set information on your koRpus environment
dale.chall

Readability: Dale-Chall Readability Formula
spache

Readability: Spache Formula
guess.lang

Guess language a text is written in
MATTR

Lexical diversity: Moving-Average Type-Token Ratio (MATTR)
treetag

A function to call TreeTagger
C.ld

Lexical diversity: Herdan's C
RIX

Readability: Anderson's Readability Index (RIX)
kRp.POS.tags

Get elaborated word tag definitions
R.ld

Lexical diversity: Guiraud's R
kRp.hyph.pat-class

S4 class kRp.hyph.pat
maas

Lexical diversity: Maas' indices
koRpus-package

The koRpus Package
K.ld

Lexical diversity: Yule's K
kRp.txt.trans-class

S4 class kRp.txt.trans
coleman

Readability: Coleman's Formulas
dickes.steiwer

Readability: Dickes-Steiwer Handformel
TRI

Readability: Kuntzsch's Text-Redundanz-Index
S.ld

Lexical diversity: Summer's S
LIX

Readability: Björnsson's Läsbarhetsindex (LIX)
kRp.text.transform

Letter case transformation
clozeDelete

Transform text into cloze test format
kRp.hyphen-class

S4 class kRp.hyphen
lex.div

Analyze lexical diversity
cTest

Transform text into C-Test-like format
kRp.tagged-class

S4 class kRp.tagged
kRp.filter.wclass

Remove word classes
DRP

Readability: Degrees of Reading Power (DRP)
tokenize

A simple tokenizer
MSTTR

Lexical diversity: Mean Segmental Type-Token Ratio (MSTTR)
jumbleWords

Produce jumbled words
get.kRp.env

Get koRpus session environment
wheeler.smith

Readability: Wheeler-Smith Score
correct.tag

Methods to correct koRpus objects
textFeatures

Extract text features for authorship analysis
read.corp.custom

Import custom corpus data
kRp.cluster

Work in (early) progress. Probably don't even look at it. Consider it pure magic that is not to be tampered with.
read.hyph.pat

Reading patgen-compatible hyphenation pattern files
kRp.readability-class

S4 class kRp.readability
FORCAST

Readability: FORCAST Index
ARI

Readability: Automated Readability Index (ARI)
kRp.corp.freq-class

S4 class kRp.corp.freq
kRp.text.analysis

Analyze texts using TreeTagger and word frequencies
MTLD

Lexical diversity: Measure of Textual Lexical Diversity (MTLD)
nWS

Readability: Neue Wiener Sachtextformeln
read.corp.LCC

Import LCC data
strain

Readability: Strain Index
kRp.text.paste

Paste koRpus objects
bormuth

Readability: Bormuth's Mean Cloze and Grade Placement
freq.analysis

Analyze word frequencies
harris.jacobson

Readability: Harris-Jacobson indices
manage.hyph.pat

Handling hyphenation pattern objects
query

A method to get information out of koRpus objects
linsear.write

Readability: Linsear Write Index
danielson.bryan

Readability: Danielson-Bryan
summary

Summary methods for koRpus objects
segment.optimizer

A function to optimize MSTTR segment sizes
read.corp.celex

Import Celex data
TTR

Lexical diversity: Type-Token Ratio
fucks

Readability: Fucks' Stilcharakteristik
plot

Plot method for objects of class kRp.tagged
kRp.txt.freq-class

S4 class kRp.txt.freq
farr.jenkins.paterson

Readability: Farr-Jenkins-Paterson Index
taggedText

Getter/setter methods for koRpus objects
traenkle.bailer

Readability: Traenkle-Bailer Formeln
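
Many of the readability entries listed above are simple formulas over word, sentence and syllable counts. As an illustration, the following base R sketch computes the Flesch Reading Ease and the Flesch-Kincaid Grade Level from counts that are already known, using the standard English coefficients. The helper names are hypothetical and are not the koRpus functions `flesch()` or `flesch.kincaid()`, which take tagged text objects, do their own counting, and may offer other parameter sets.

```r
# Hypothetical helpers for illustration only; not koRpus functions.
# Assumption: the counts are supplied by the caller (no tokenizing or hyphenation here).

# Flesch Reading Ease, standard English coefficients
flesch_reading_ease <- function(n_words, n_sentences, n_syllables) {
  stopifnot(n_words > 0, n_sentences > 0)
  206.835 - 1.015 * (n_words / n_sentences) - 84.6 * (n_syllables / n_words)
}

# Flesch-Kincaid Grade Level, standard English coefficients
flesch_kincaid_grade <- function(n_words, n_sentences, n_syllables) {
  stopifnot(n_words > 0, n_sentences > 0)
  0.39 * (n_words / n_sentences) + 11.8 * (n_syllables / n_words) - 15.59
}

# Example: 100 words in 5 sentences with 150 syllables
# (20 words per sentence, 1.5 syllables per word)
ease  <- flesch_reading_ease(100, 5, 150)
grade <- flesch_kincaid_grade(100, 5, 150)
print(c(ease = ease, grade = grade))
```

Both scores use the same two ratios, average sentence length and average syllables per word, so longer sentences and longer words lower the reading ease and raise the grade level.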