koRpus (version 0.04-40)

kRp.text.analysis: Analyze texts using TreeTagger and word frequencies

Description

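The function kRp.text.analysis tags a text and analyzes it in several ways: descriptive statistics, lexical diversity and, if a corpus frequency object is supplied, a word frequency index.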

Usage

kRp.text.analysis(txt.file, tagger = "kRp.env",
    force.lang = NULL, desc.stat = TRUE, lex.div = TRUE,
    corp.freq = NULL, corp.rm.class = "nonpunct",
    corp.rm.tag = c(), ...)

Arguments

txt.file
Either an object of class kRp.tagged-class, kRp.txt.freq-class,
tagger
A character string defining the tokenizer/tagger command you want to use for basic text analysis. Can be omitted if txt.file is already of class kRp.tagged-class. Defaults to "kRp.env" to get the settings by
force.lang
A character string defining the language to be assumed for the text, by force.
desc.stat
Logical, whether a descriptive statistical analysis should be performed.
lex.div
Logical, whether some lexical diversity analysis should be performed, using lex.div.
corp.freq
An object of class kRp.corp.freq-class. If present, a frequency index for the analyzed text is computed (see details).
corp.rm.class
A character vector with word classes which should be ignored for frequency analysis. The default value "nonpunct" has special meaning and will cause the result of kRp.POS.tags(lang, c("punct","sentc"), list.classes=TRUE)
corp.rm.tag
A character vector with POS tags which should be ignored for frequency analysis.
...
Additional options to be passed through to the function defined with tagger.

Value

Details

The function is essentially a wrapper for treetag(), freq.analysis() and lex.div().

By default, if the text still has to be tagged, the language definition is queried by calling get.kRp.env(lang=TRUE) internally. If txt.file has already been tagged, the language definition of that tagged object is read and used instead. Set force.lang (to get.kRp.env(lang=TRUE) or any other valid value) only if you want to override this default behaviour. See kRp.POS.tags for all supported languages.
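
The order of precedence described above can be illustrated with a minimal base R sketch. Note that resolve_lang() and its arguments tagged.lang and env.lang are hypothetical names made up for this illustration and are not part of koRpus. The error raised when no language is available at all is likewise an assumption, not documented behaviour.

```r
# Hypothetical helper, NOT part of koRpus: it only mirrors the documented
# precedence for choosing the language of an analysis.
#   force.lang  - value the user passed as force.lang (or NULL)
#   tagged.lang - language stored in an already tagged object (or NULL)
#   env.lang    - stand-in for the result of get.kRp.env(lang=TRUE) (or NULL)
resolve_lang <- function(force.lang = NULL, tagged.lang = NULL, env.lang = NULL) {
  # 1. An explicitly forced language always wins.
  if (!is.null(force.lang)) {
    return(force.lang)
  }
  # 2. An already tagged object carries its own language definition.
  if (!is.null(tagged.lang)) {
    return(tagged.lang)
  }
  # 3. Untagged text: fall back to the environment setting.
  if (!is.null(env.lang)) {
    return(env.lang)
  }
  # Assumption for this sketch: without any language, give up.
  stop("no language defined")
}

resolve_lang(env.lang = "en")                                        # untagged text
resolve_lang(tagged.lang = "de", env.lang = "en")                    # tagged object
resolve_lang(force.lang = "fr", tagged.lang = "de", env.lang = "en") # forced
```

In this sketch the three calls yield "en", "de" and "fr" respectively, which shows why force.lang is only needed when the language of a tagged object or of the environment should deliberately be ignored.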

References

[1] http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/DecisionTreeTagger.html

See Also

set.kRp.env, get.kRp.env, kRp.POS.tags, lex.div

Examples

kRp.text.analysis("/some/text.txt")
