kRp.text.analysis

Either an object of class <code><a rd-options="koRpus" href="/link/kRp.tagged-class?package=koRpus&version=0.10-2&to=koRpus" data-mini-rdoc="koRpus::kRp.tagged-class">kRp.tagged-class</a></code>,
 <code><a rd-options="koRpus" href="/link/kRp.txt.freq-class?package=koRpus&version=0.10-2&to=koRpus" data-mini-rdoc="koRpus::kRp.txt.freq-class">kRp.txt.freq-class</a></code>,
<code><a rd-options="koRpus" href="/link/kRp.analysis-class?package=koRpus&version=0.10-2&to=koRpus" data-mini-rdoc="koRpus::kRp.analysis-class">kRp.analysis-class</a></code> or <code><a rd-options="koRpus" href="/link/kRp.txt.trans-class?package=koRpus&version=0.10-2&to=koRpus" data-mini-rdoc="koRpus::kRp.txt.trans-class">kRp.txt.trans-class</a></code>, or
a character vector which must be a valid path to a file containing the text to be analyzed.

txt.file

A character string defining the tokenizer/tagger command you want to use for basic text analysis. Can be omitted if
<code>txt.file</code> is already of class <code>kRp.tagged-class</code>. Defaults to <code>"kRp.env"</code>, which takes the settings from
<code><a rd-options="koRpus:get.kRp.env" href="/link/get.kRp.env?package=koRpus&version=0.10-2&to=koRpus%3Aget.kRp.env" data-mini-rdoc="koRpus:get.kRp.env::get.kRp.env">get.kRp.env</a></code>. Set to <code>"tokenize"</code> to use <code><a rd-options="koRpus:tokenize" href="/link/tokenize?package=koRpus&version=0.10-2&to=koRpus%3Atokenize" data-mini-rdoc="koRpus:tokenize::tokenize">tokenize</a></code>.

tagger

A character string defining the language to be forced for the text,
 regardless of other settings.

force.lang

Logical, whether a descriptive statistical analysis should be performed.

desc.stat

Logical, whether some lexical diversity analysis should be performed,
 using <code><a rd-options="koRpus:lex.div" href="/link/lex.div?package=koRpus&version=0.10-2&to=koRpus%3Alex.div" data-mini-rdoc="koRpus:lex.div::lex.div">lex.div</a></code>.

lex.div

An object of class <code><a rd-options="koRpus" href="/link/kRp.corp.freq-class?package=koRpus&version=0.10-2&to=koRpus" data-mini-rdoc="koRpus::kRp.corp.freq-class">kRp.corp.freq-class</a></code>. If present,
 a frequency index for the analyzed text is computed (see details).

corp.freq

A character vector with word classes which should be ignored for frequency analysis. The default value
<code>"nonpunct"</code> has special meaning and will cause the result of
<code>kRp.POS.tags(lang, c("punct","sentc"), list.classes=TRUE)</code> to be used.

corp.rm.class
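
To see which word classes the default <code>"nonpunct"</code> setting would ignore, the lookup quoted above can be run on its own. This is a minimal sketch, assuming English (<code>"en"</code>) support is available in the installed koRpus version; the second argument is passed by name as <code>tags</code> here, and the variable name <code>ignored_classes</code> is purely illustrative.

```r
library(koRpus)

# Assumption: "en" is a language known to the installed koRpus version.
# Look up the word classes that belong to punctuation and sentence-ending tags.
ignored_classes <- kRp.POS.tags(
  "en",
  tags = c("punct", "sentc"),
  list.classes = TRUE
)

# Show the classes that would be dropped from the frequency analysis.
print(ignored_classes)
```

Passing a custom character vector to <code>corp.rm.class</code> instead of <code>"nonpunct"</code> replaces this default set.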

A character vector with POS tags which should be ignored for frequency analysis.

corp.rm.tag

Additional options to be passed through to the function defined with <code>tagger</code>.

The function <code>kRp.text.analysis</code> analyzes texts in various ways.
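
The arguments described above might be combined as in the following sketch. It is illustrative only: the sample text, the temporary file and the variable names are made up, it assumes English support is available in the installed koRpus version, and it uses the built-in tokenizer so that no TreeTagger installation is required.

```r
library(koRpus)

# Hypothetical sample text written to a temporary file;
# any path to a plain-text file could be used instead.
sample_file <- tempfile(fileext = ".txt")
writeLines(
  c("This is a short sample text.",
    "It only exists to illustrate the call."),
  sample_file
)

# Assumption: tagger = "tokenize" selects the built-in tokenizer,
# and force.lang sets the language explicitly instead of reading it
# from the koRpus environment.
res <- kRp.text.analysis(
  sample_file,
  tagger = "tokenize",
  force.lang = "en",
  desc.stat = TRUE,
  lex.div = FALSE  # skip the lexical diversity analysis to keep the run short
)

# Inspect the class of the result and the slots it carries,
# without assuming particular slot names.
print(class(res))
print(slotNames(res))
```

Setting <code>lex.div = TRUE</code> or supplying a <code>corp.freq</code> object would add the corresponding analyses to the result.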

misc

A set of tools to analyze texts. Includes, amongst others, functions for automatic language detection, hyphenation,
several indices of lexical diversity (e.g., type token ratio, HD-D/vocd-D, MTLD) and readability (e.g., Flesch, SMOG,
LIX, Dale-Chall). Basic import functions for language corpora are also provided, to enable frequency analyses (supports
Celex and Leipzig Corpora Collection file formats) and measures like tf-idf. Support for additional languages can be
added on-the-fly or by plugin packages. Note: For full functionality a local installation of TreeTagger is recommended.
'koRpus' also includes a plugin for the R GUI and IDE RKWard, providing graphical dialogs for its basic features. The
respective R package 'rkward' cannot be installed directly from a repository, as it is a part of RKWard. To make full
use of this feature, please install RKWard from <https://rkward.kde.org> (plugins are detected automatically). Due to
some restrictions on CRAN, the full package sources are only available from the project homepage. To ask for help,
report bugs, request features, or discuss the development of the package, please subscribe to the koRpus-dev mailing
list (<http://korpusml.reaktanz.de>).

Meik Michalke

koRpus

An R Package for Text Analysis

m.eik michalke

Earl Brown

Alberto Mirisola

Alexandre Brulet

Laura Hauser

kRp.text.analysis function


kRp.text.analysis: Analyze texts using TreeTagger and word frequencies

Description

Usage

Arguments

Value

Details

References

See Also

Examples