
koRpus (version 0.10-2)

freq.analysis: Analyze word frequencies

Description

The function freq.analysis analyzes texts with regard to the frequencies of tokens, word classes, etc.

Usage

freq.analysis(txt.file, ...)

# S4 method for kRp.taggedText
freq.analysis(txt.file, corp.freq = NULL, desc.stat = TRUE, force.lang = NULL, tagger = "kRp.env", corp.rm.class = "nonpunct", corp.rm.tag = c(), tfidf = TRUE, ...)

# S4 method for character
freq.analysis(txt.file, corp.freq = NULL, desc.stat = TRUE, force.lang = NULL, tagger = "kRp.env", corp.rm.class = "nonpunct", corp.rm.tag = c(), tfidf = TRUE, ...)
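freq.analysis is an S4 generic with one method for already tagged objects and one for character file paths. The base R sketch below only illustrates that dispatch pattern. The generic freqSketch and the class taggedSketch are hypothetical. The assumption that the character method tags the file and then re-dispatches to the tagged method is ours, not taken from koRpus, and naive whitespace splitting via scan() stands in for a real tagger.

```r
library(methods)

# Hypothetical stand-in for a tagged-text class (not the real koRpus class)
setClass("taggedSketch", slots = c(tokens = "character", lang = "character"))

# Hypothetical generic with the same shape as freq.analysis(txt.file, ...)
setGeneric("freqSketch", function(txt.file, ...) standardGeneric("freqSketch"))

# Method for already tagged input: count case-folded tokens directly
setMethod("freqSketch", "taggedSketch", function(txt.file, ...) {
  table(tolower(txt.file@tokens))
})

# Method for a file path: read and split the file on whitespace, wrap the
# tokens in the tagged class, then re-dispatch to the method above
setMethod("freqSketch", "character", function(txt.file, ...) {
  words <- scan(txt.file, what = "character", quiet = TRUE)
  freqSketch(new("taggedSketch", tokens = words, lang = "en"), ...)
})
```

Routing the path method through the tagged method keeps the counting logic in one place, so both entry points return the same kind of result.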

Arguments

txt.file

Either an object of class kRp.tagged-class, kRp.txt.freq-class, kRp.analysis-class or kRp.txt.trans-class, or a character vector which must be a valid path to a file containing the text to be analyzed.

...

Additional options to be passed through to the function defined with tagger.

corp.freq

An object of class kRp.corp.freq-class.

desc.stat

Logical, whether a descriptive statistical analysis should be performed.
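To give an idea of what a descriptive analysis of a text involves, the base R sketch below computes a few simple measures from a hypothetical token vector. The set of measures is illustrative only and is not the list koRpus actually stores; that list is given in the slot description of kRp.txt.freq-class.

```r
# Hypothetical tokens of a short text (punctuation already removed)
words <- c("The", "cat", "sat", "on", "the", "mat")

# A few simple descriptive measures; illustrative, not koRpus's own set
desc <- list(
  tokens          = length(words),                  # running words
  types           = length(unique(tolower(words))), # distinct case-folded words
  avg.word.length = mean(nchar(words))              # mean characters per token
)
str(desc)
```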

force.lang

A character string defining the language to be assumed for the text, overriding the default language resolution described in Details.

tagger

A character string defining the tokenizer/tagger command you want to use for basic text analysis. Can be omitted if txt.file is already of class kRp.tagged-class. Defaults to "kRp.env", which uses the settings returned by get.kRp.env. Set to "tokenize" to use tokenize instead.

corp.rm.class

A character vector with word classes which should be ignored for frequency analysis. The default value "nonpunct" has a special meaning: it causes the result of kRp.POS.tags(lang, c("punct","sentc"), list.classes=TRUE) to be used, i.e., the word classes of punctuation and sentence-ending tags are ignored.
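Conceptually, ignoring word classes means dropping every token whose class is in the exclusion list before types are counted. A minimal base R sketch, assuming a token table with hypothetical columns token and wclass; the class labels are illustrative and not necessarily the ones koRpus uses:

```r
# Hypothetical tagged tokens; column names and class labels are assumptions
tokens <- data.frame(
  token  = c("The", "cat", "sat", ",", "the", "cat", "slept", "."),
  wclass = c("article", "noun", "verb", "comma",
             "article", "noun", "verb", "fullstop"),
  stringsAsFactors = FALSE
)

# Classes to ignore, analogous in spirit to what "nonpunct" expands to
rm.class <- c("comma", "fullstop")

# Keep only tokens whose word class is not excluded
kept <- tokens[!tokens$wclass %in% rm.class, ]

# Case-insensitive type frequencies, most frequent first
freq <- sort(table(tolower(kept$token)), decreasing = TRUE)
print(freq)
```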

corp.rm.tag

A character vector with POS tags which should be ignored for frequency analysis.

tfidf

Logical, whether the term frequency–inverse document frequency statistic (tf-idf) should be computed. Requires corp.freq to provide appropriate idf values for the types in txt.file. Missing idf values will result in NA.
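The base R sketch below shows the arithmetic under stated assumptions: relative term frequency (count divided by the total number of tokens) multiplied by an idf value looked up per type. The counts and idf values are made up, and the exact term frequency definition koRpus uses is not stated here; the sketch mainly shows why a type absent from the reference corpus ends up as NA.

```r
# Hypothetical token counts for the analysed text
counts <- c(the = 4, cat = 2, sat = 1, zygote = 1)

# Relative term frequency: count divided by total tokens
tf <- counts / sum(counts)

# Hypothetical idf values from a reference corpus; "zygote" is missing
idf <- c(the = 0.01, cat = 2.3, sat = 1.9)

# Look up idf by type name; absent types yield NA, so their tf-idf is NA
tfidf <- tf * idf[names(tf)]
names(tfidf) <- names(tf)  # keep the text's type names
print(tfidf)
```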

Value

An object of class kRp.txt.freq-class.

Details

The easiest way to see which analyses are performed is to look at the slot description of kRp.txt.freq-class.

By default, if the text has yet to be tagged, the language is determined internally by calling get.kRp.env(lang=TRUE). If txt.file has already been tagged, the language stored in that tagged object is used instead. Set force.lang to get.kRp.env(lang=TRUE) or any other valid language value only if you want to override this default behaviour. See kRp.POS.tags for all supported languages.
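The precedence described above can be sketched as a small base R function. resolve_lang() is hypothetical and not part of koRpus; tagged.lang stands for the language stored in an already tagged object (NULL for untagged text), and env.lang stands in for the value get.kRp.env(lang=TRUE) would return.

```r
# Hypothetical helper mirroring the documented precedence:
# force.lang > language of a tagged object > session default
resolve_lang <- function(force.lang = NULL, tagged.lang = NULL, env.lang = "en") {
  if (!is.null(force.lang)) {
    return(force.lang)   # explicit override always wins
  }
  if (!is.null(tagged.lang)) {
    return(tagged.lang)  # already tagged text keeps its own language
  }
  env.lang               # untagged text falls back to the session default
}
```

For example, resolve_lang(tagged.lang = "de") returns "de", while resolve_lang(force.lang = "fr", tagged.lang = "de") returns "fr".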

See Also

get.kRp.env, kRp.tagged-class, kRp.corp.freq-class

Examples

# NOT RUN {
library(koRpus)
# my.LCC.data must be a kRp.corp.freq object, e.g. imported with read.corp.LCC()
freq.analysis("~/some/text.txt", corp.freq=my.LCC.data)
# }
