Note: there is a newer version (0.13-8) of this package.

koRpus

koRpus is an R package for text analysis. Among other things, it includes a wrapper for the POS tagger TreeTagger, functions for automatic language detection and hyphenation, several indices of lexical diversity (e.g., type-token ratio, HD-D/vocd-D, MTLD) and several readability formulas (e.g., Flesch, SMOG, LIX, Dale-Chall, Tuldava).
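To make three of the indices named above more concrete, here is a minimal base-R sketch of the type-token ratio, Flesch Reading Ease and LIX. These are illustrative re-implementations of the textbook formulas, not koRpus code: koRpus computes its indices from tagged text objects (see TTR(), flesch() and LIX() in the function list), and its handling of case, syllable counting and language-specific parameters may differ. The function names ttr(), flesch_ease() and lix() are hypothetical, and the Flesch sketch assumes that word, sentence and syllable counts are already known.

```r
# Illustrative sketches of three indices; hypothetical helpers, not koRpus code.

# Type-token ratio: number of distinct word forms divided by number of tokens
ttr <- function(tokens) {
  length(unique(tokens)) / length(tokens)
}

# Flesch Reading Ease with the original English constants; higher means easier.
# Assumes precounted words, sentences and syllables (syllable counting,
# i.e. hyphenation, is the hard part and is not attempted here).
flesch_ease <- function(words, sentences, syllables) {
  206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
}

# LIX: average sentence length plus the percentage of long words
# (more than six characters). Assumes tokens are words without punctuation.
lix <- function(tokens, sentences) {
  words <- length(tokens)
  long_words <- sum(nchar(tokens) > 6)
  words / sentences + 100 * long_words / words
}

tokens <- tolower(c("The", "readability", "of", "this", "sentence", "is", "measured"))
ttr(tokens)                                               # 7 types / 7 tokens = 1
lix(tokens, sentences = 1)                                # 7 + 100 * 3 / 7
flesch_ease(words = 100, sentences = 5, syllables = 150)  # 59.635
```

The readability formulas only combine a handful of counts; most of the practical work in a package like koRpus lies in producing reliable tokens, sentence boundaries and syllable counts for a given language.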

koRpus also includes a plugin for RKWard, a powerful GUI and IDE for R, providing graphical dialogs for its basic features. To make full use of this feature, please install RKWard (plugins are detected automatically).

More information on koRpus is available on the project homepage.

Installation

There are three easy ways of getting koRpus:

Stable releases via CRAN

The latest release considered stable for production use is available from the CRAN mirrors, so you can install it from a running R session like this:

install.packages("koRpus")

The CRAN release usually lags a bit behind the current state of development and is only updated after a significant number of changes or important bug fixes.

Development releases via the project repository

In between stable CRAN releases, there are usually several testing or development versions released on the project's own repository. These releases should also work without problems, but they are also intended to test new features or candidate bug fixes and to gather feedback before the next release goes to CRAN.

Installation is fairly easy, too:

install.packages("koRpus", repos=c(getOption("repos"), reaktanz="https://reaktanz.de/R"))

To get updates automatically, consider adding the repository to the repos option in your R configuration (for example, in your .Rprofile). You might also want to subscribe to the package's RSS feed to be notified of new releases.
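One way to make the project repository permanent is to append it to R's repos option at startup. The snippet below is a sketch of what could go into a user-level ~/.Rprofile; the repository name "reaktanz" and URL are taken from the install command above, while the local() wrapper and variable name are just one possible style. Since install.packages() and update.packages() both default to getOption("repos"), they should then consult this repository alongside your existing ones.

```r
# Sketch for ~/.Rprofile: keep the already configured repositories and
# add the koRpus project repository under the name "reaktanz".
local({
  repos <- getOption("repos")
  repos["reaktanz"] <- "https://reaktanz.de/R"
  options(repos = repos)
})
```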

If you're running a Debian-based operating system, you might be interested in the precompiled *.deb packages.

Installation via GitHub

To install it directly from GitHub, you can use install_github() from the devtools package:

library(devtools)
install_github("unDocUMeantIt/koRpus") # stable release
install_github("unDocUMeantIt/koRpus", ref="develop") # development release

Installing language support

koRpus does not support any particular language out of the box. Therefore, after installing the package, you'll also have to install at least one language support package to really make use of it. You can find these in the l10n repository, where they are named koRpus.lang.*.

The most straightforward way to get these packages is to use the function install.koRpus.lang(). Here's an example of how to install support for English and German:

library(koRpus)
install.koRpus.lang(lang=c("en", "de"))

There are also precompiled Debian packages.

Contributing

To ask for help, report bugs, suggest feature improvements, or discuss the general development of the package, please either subscribe to the koRpus-dev mailing list or use the issue tracker on GitHub.

Branches

Please note that all development happens in the develop branch. Pull requests against the master branch will be rejected, as it is reserved for the current stable release.

Licence

Copyright 2012-2017 Meik Michalke meik.michalke@hhu.de

koRpus is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

koRpus is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with koRpus. If not, see http://www.gnu.org/licenses/.

Version: 0.11-5
License: GPL (>= 3)
Maintainer: Meik Michalke
Last Published: October 28th, 2018
Monthly Downloads: 3,715

Functions in koRpus (0.11-5)

K.ld: Lexical diversity: Yule's K
MSTTR: Lexical diversity: Mean Segmental Type-Token Ratio (MSTTR)
RIX: Readability: Anderson's Readability Index (RIX)
flesch.kincaid: Readability: Flesch-Kincaid Grade Level
freq.analysis: Analyze word frequencies
S.ld: Lexical diversity: Summer's S
ARI: Readability: Automated Readability Index (ARI)
C.ld: Lexical diversity: Herdan's C
kRp.readability,-class: S4 Class kRp.readability
danielson.bryan: Readability: Danielson-Bryan
read.corp.celex: Import Celex data
kRp.tagged,-class: S4 Class kRp.tagged
dickes.steiwer: Readability: Dickes-Steiwer Handformel
cTest: Transform text into C-Test-like format
CTTR: Lexical diversity: Carroll's corrected TTR (CTTR)
read.corp.custom: Import custom corpus data
strain: Readability: Strain Index
summary: Summary methods for koRpus objects
clozeDelete: Transform text into cloze test format
types: Get types and tokens of a given text
fucks: Readability: Fucks' Stilcharakteristik
farr.jenkins.paterson: Readability: Farr-Jenkins-Paterson Index
flesch: Readability: Flesch Readability Ease
wheeler.smith: Readability: Wheeler-Smith Score
kRp.filter.wclass: Remove word classes
get.kRp.env: Get koRpus session settings
jumbleWords: Produce jumbled words
kRp.lang,-class: S4 Class kRp.lang
DRP: Readability: Degrees of Reading Power (DRP)
kRp.txt.trans,-class: S4 Class kRp.txt.trans
ELF: Readability: Fang's Easy Listening Formula (ELF)
koRpus-deprecated: Deprecated functions
readability.num: Calculate readability
kRp.POS.tags: Get elaborated word tag definitions
kRp.text.paste: Paste koRpus objects
kRp.txt.freq,-class: S4 Class kRp.txt.freq
read.BAWL: Import BAWL-R data
FOG: Readability: Gunning FOG Index
segment.optimizer: A function to optimize MSTTR segment sizes
MTLD: Lexical diversity: Measure of Textual Lexical Diversity (MTLD)
SMOG: Readability: Simple Measure of Gobbledygook (SMOG)
available.koRpus.lang: List available language packages
R.ld: Lexical diversity: Guiraud's R
bormuth: Readability: Bormuth's Mean Cloze and Grade Placement
read.corp.LCC: Import LCC data
textFeatures: Extract text features for authorship analysis
FORCAST: Readability: FORCAST Index
TRI: Readability: Kuntzsch's Text-Redundanz-Index
coleman: Readability: Coleman's Formulas
coleman.liau: Readability: Coleman-Liau Index
HDD: Lexical diversity: HD-D (vocd-d)
textTransform: Letter case transformation
guess.lang: Guess the language a text is written in
correct.tag: Methods to correct koRpus objects
TTR: Lexical diversity: Type-Token Ratio
U.ld: Lexical diversity: Uber Index (U)
dale.chall: Readability: Dale-Chall Readability Formula
hyphen,kRp.taggedText-method: Automatic hyphenation
treetag: A function to call TreeTagger
kRp.cluster: Work in (early) progress. Probably don't even look at it. Consider it pure magic that is not to be tampered with.
harris.jacobson: Readability: Harris-Jacobson indices
tuldava: Readability: Tuldava's Text Difficulty Formula
kRp.corp.freq,-class: S4 Class kRp.corp.freq
lex.div.num: Calculate lexical diversity
kRp.TTR,-class: S4 Class kRp.TTR
kRp.analysis,-class: S4 Class kRp.analysis
linsear.write: Readability: Linsear Write Index
taggedText: Getter/setter methods for koRpus objects
plot: Plot method for objects of class kRp.tagged
query: A method to get information out of koRpus objects
install.koRpus.lang: Install language support packages
show,kRp.lang-method: Show methods for koRpus objects
koRpus-package: koRpus
lex.div: Analyze lexical diversity
spache: Readability: Spache Formula
set.lang.support: Add support for new languages
nWS: Readability: Neue Wiener Sachtextformeln
maas: Lexical diversity: Maas' indices
kRp.text.analysis: Analyze texts using TreeTagger and word frequencies
set.kRp.env: A function to set information on your koRpus environment
read.tagged: Import already tagged texts
readability: Measure readability
tokenize: A simple tokenizer
traenkle.bailer: Readability: Traenkle-Bailer Formeln
LIX: Readability: Björnsson's Läsbarhetsindex (LIX)
MATTR: Lexical diversity: Moving-Average Type-Token Ratio (MATTR)
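Several of the lexical diversity entries above share one idea: relate the number of types (distinct word forms) to the number of tokens. The following base-R sketch illustrates Guiraud's R, Carroll's CTTR, Herdan's C, MSTTR and MATTR using their commonly cited definitions. It is not the koRpus implementation: it assumes an already tokenized, lowercased character vector, all function names are hypothetical, and koRpus's own lex.div() works on tagged text objects and offers further options (such as case handling) not modeled here. The segment and window size of 100 tokens is an illustrative default.

```r
# Illustrative sketches of type/token-based measures; hypothetical helpers,
# not the koRpus implementations. Input: a character vector of tokens.

ttr <- function(tokens) length(unique(tokens)) / length(tokens)

# Guiraud's R: types divided by the square root of tokens
guiraud_r <- function(tokens) length(unique(tokens)) / sqrt(length(tokens))

# Carroll's corrected TTR: types divided by the square root of twice the tokens
cttr <- function(tokens) length(unique(tokens)) / sqrt(2 * length(tokens))

# Herdan's C: log(types) divided by log(tokens)
herdan_c <- function(tokens) log(length(unique(tokens))) / log(length(tokens))

# MSTTR: cut the text into consecutive, non-overlapping segments of
# `segment` tokens, drop the incomplete remainder, and average the TTRs.
msttr <- function(tokens, segment = 100) {
  n_seg <- length(tokens) %/% segment
  if (n_seg < 1) stop("text is shorter than one segment")
  seg_ttrs <- vapply(seq_len(n_seg), function(i) {
    ttr(tokens[((i - 1) * segment + 1):(i * segment)])
  }, numeric(1))
  mean(seg_ttrs)
}

# MATTR: slide a window of `window` tokens forward one token at a time
# and average the TTR of every window.
mattr <- function(tokens, window = 100) {
  n_win <- length(tokens) - window + 1
  if (n_win < 1) stop("text is shorter than one window")
  win_ttrs <- vapply(seq_len(n_win), function(i) {
    ttr(tokens[i:(i + window - 1)])
  }, numeric(1))
  mean(win_ttrs)
}

tokens <- c("a", "b", "a", "c", "b", "a")
ttr(tokens)                 # 3 types / 6 tokens = 0.5
msttr(tokens, segment = 4)  # one segment (a b a c), remainder dropped: 0.75
mattr(tokens, window = 3)   # mean of 2/3, 1, 1, 1 = 11/12
```

The plain TTR falls as texts get longer, which is what the segment- and window-based variants try to compensate for. MSTTR discards the incomplete final segment, so part of the text may not contribute; MATTR uses every token at the cost of computing one TTR per window position.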