types

tokens

types,kRp.TTR-method

tokens,kRp.TTR-method

types,kRp.text-method

tokens,kRp.text-method

types,character-method

tokens,character-method

An object of either class <code><a rd-options="koRpus:kRp.text-class" href="/link/kRp.text?package=koRpus&version=0.13-4&to=koRpus%3AkRp.text-class" data-mini-rdoc="koRpus:kRp.text-class::kRp.text">kRp.text</a></code> or
<code><a rd-options="koRpus:kRp.TTR-class" href="/link/kRp.TTR?package=koRpus&version=0.13-4&to=koRpus%3AkRp.TTR-class" data-mini-rdoc="koRpus:kRp.TTR-class::kRp.TTR">kRp.TTR</a></code>, or a character vector.

Logical, whether statistics on the length in characters and frequency of types in the text should also be returned.

stats

Logical, whether types should be counted case-sensitively.
This option is available for tagged text and character input only.

case.sens

Logical, whether analysis should be carried out on the lemmatized tokens rather than all running word forms.
This option is available for tagged text and character input only.

lemmatize

A character vector with word classes which should be dropped. The default value
<code>"nonpunct"</code> has a special meaning and will cause the result of
<code>kRp.POS.tags(lang, tags=c("punct","sentc"), list.classes=TRUE)</code> to be used.
This option is available for tagged text and character input only.

corp.rm.class

A character vector with POS tags which should be dropped.
This option is available for tagged text and character input only.

corp.rm.tag

Sets the language of the text; see the <code>force.lang</code> option of <code><a rd-options="koRpus:lex.div" href="/link/lex.div?package=koRpus&version=0.13-4&to=koRpus%3Alex.div" data-mini-rdoc="koRpus:lex.div::lex.div">lex.div</a></code>.
This option is available for character input only.

lang

These methods return a character vector containing all types or all tokens of a given text.
The text can be a character vector, a previously tokenized or tagged koRpus object,
or an object of class <code>kRp.TTR</code>.
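
To make the distinction concrete, here is a minimal base R sketch of what "tokens" and "types" mean. The helpers `simple_tokens()` and `simple_types()` are hypothetical and are not part of koRpus. They use a crude regular-expression split instead of the package's tokenizer, and they assume that case-insensitive counting amounts to lowercasing, so real results may differ.

```r
# Hypothetical helpers for illustration only; koRpus uses its own tokenizer.

# Tokens: every running word form, in order of appearance.
simple_tokens <- function(txt) {
  words <- unlist(strsplit(txt, "[^[:alnum:]']+"))
  words[nzchar(words)]  # drop empty strings left over from the split
}

# Types: the distinct word forms among the tokens.
# Assumption: case-insensitive counting is modelled by lowercasing.
simple_types <- function(txt, case.sens = FALSE) {
  tok <- simple_tokens(txt)
  if (!case.sens) {
    tok <- tolower(tok)
  }
  unique(tok)
}

txt <- "The cat saw the other cat."
toks    <- simple_tokens(txt)                   # 6 running words
typs    <- simple_types(txt)                    # "the" "cat" "saw" "other"
typs_cs <- simple_types(txt, case.sens = TRUE)  # "The" and "the" kept apart
```

With `case.sens = FALSE`, "The" and "the" collapse into one type, which is why the type count drops from five to four in this sketch.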

A set of tools to analyze texts. Includes, amongst others, functions for
automatic language detection, hyphenation, several indices of lexical diversity
(e.g., type token ratio, HD-D/vocd-D, MTLD) and readability (e.g., Flesch,
SMOG, LIX, Dale-Chall). Basic import functions for language corpora are also
provided, to enable frequency analyses (supports Celex and Leipzig Corpora
Collection file formats) and measures like tf-idf. Note: For full functionality
a local installation of TreeTagger is recommended. It is also recommended
not to load this package directly, but to load one of the available language
support packages from the 'l10n' repository
<https://undocumeantit.github.io/repos/l10n/>. 'koRpus' also includes a plugin
for the R GUI and IDE RKWard, providing graphical dialogs for its basic
features. The respective R package 'rkward' cannot be installed directly from a
repository, as it is a part of RKWard. To make full use of this feature, please
install RKWard from <https://rkward.kde.org> (plugins are detected
automatically). Due to some restrictions on CRAN, the full package sources are
only available from the project homepage. To ask for help, report bugs, request
features, or discuss the development of the package, please subscribe to the
koRpus-dev mailing list (<https://korpusml.reaktanz.de>).

Meik Michalke

koRpus

Text Analysis with Emphasis on POS Tagging, Readability and
Lexical Diversity

Earl Brown

Alberto Mirisola

Alexandre Brulet

Laura Hauser


types: Get types and tokens of a given text

Description

Usage

Arguments

Value

See Also

Examples
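
A hedged usage sketch for character input follows. It assumes that the English language support package `koRpus.lang.en` is installed and that the character method accepts the `stats` and `lang` arguments as described under Arguments. If the language package is missing, the block is skipped and `res` stays `NULL`.

```r
# Assumption: 'koRpus.lang.en' is installed; it loads koRpus and
# registers English language support. Otherwise nothing is run.
res <- NULL
if (suppressWarnings(require("koRpus.lang.en", quietly = TRUE))) {
  txt <- "The cat saw the other cat."
  res <- list(
    # all running word forms
    tokens = tokens(txt, lang = "en"),
    # distinct word forms
    types = types(txt, lang = "en"),
    # types together with their length and frequency
    types_stats = types(txt, stats = TRUE, lang = "en")
  )
  print(res)
}
```

The same calls should work on a tokenized or tagged object, in which case `lang` is not needed because the object already carries its language.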