The function freq.analysis analyzes a text with regard to the frequencies of tokens,
word classes, etc.
freq.analysis(txt.file, ...)
# S4 method for kRp.taggedText
freq.analysis(txt.file, corp.freq = NULL,
desc.stat = TRUE, force.lang = NULL, tagger = "kRp.env",
corp.rm.class = "nonpunct", corp.rm.tag = c(), tfidf = TRUE, ...)
# S4 method for character
freq.analysis(txt.file, corp.freq = NULL,
desc.stat = TRUE, force.lang = NULL, tagger = "kRp.env",
corp.rm.class = "nonpunct", corp.rm.tag = c(), tfidf = TRUE, ...)
txt.file: Either an object of class kRp.tagged, kRp.txt.freq, kRp.analysis or kRp.txt.trans, or a character vector which must be a valid path to a file containing the text to be analyzed.
...: Additional options to be passed through to the function defined with tagger.
corp.freq: An object of class kRp.corp.freq.
desc.stat: Logical, whether a descriptive statistical analysis should be performed.
force.lang: A character string defining the language to be assumed for the text, by force.
tagger: A character string defining the tokenizer/tagger command you want to use for basic text analysis. Can be omitted if txt.file is already of class kRp.tagged. Defaults to "kRp.env" to take the settings from get.kRp.env. Set to "tokenize" to use tokenize.
corp.rm.class: A character vector with word classes which should be ignored for frequency analysis. The default value "nonpunct" has special meaning and will cause the result of kRp.POS.tags(lang, c("punct","sentc"), list.classes=TRUE) to be used.
corp.rm.tag: A character vector with POS tags which should be ignored for frequency analysis.
tfidf: Logical, whether the term frequency–inverse document frequency statistic (tf-idf) should be computed. Requires corp.freq to provide appropriate idf values for the types in txt.file. Missing idf values will result in NA.
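To illustrate why a missing idf value leads to NA, here is a minimal base R sketch of a generic tf-idf computation. It does not call koRpus, and the exact weighting koRpus applies is not stated here; the sketch assumes the common form of relative term frequency multiplied by idf. The token vector and the idf values are made up for illustration.

```r
# Hypothetical tokens standing in for an already tokenized text
tokens <- c("the", "cat", "sat", "on", "the", "mat")

# Hypothetical idf values, as a corpus frequency object might supply them;
# "mat" is deliberately left out to mimic a type unknown to the corpus
idf <- c(the = 0.1, cat = 2.3, sat = 2.9, on = 0.4)

# Relative term frequency per type
counts <- table(tokens)
tf <- as.numeric(counts) / length(tokens)
names(tf) <- names(counts)

# Looking up idf by name yields NA for types without a corpus entry,
# and NA propagates through the multiplication
tfidf <- tf * idf[names(tf)]
names(tfidf) <- names(tf)

print(tfidf)
```

In this toy case only "mat" ends up as NA, because it is the only type without an idf entry.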
Returns an object of class kRp.txt.freq.
The easiest way to see which analyses are performed is probably to look at the slot description of kRp.txt.freq.
By default, if the text has yet to be tagged, the language definition is queried by calling get.kRp.env(lang=TRUE) internally.
If txt.file has already been tagged, the language definition of that tagged object is read and used instead.
Set force.lang (to get.kRp.env(lang=TRUE) or any other valid value) only if you want to override this default behaviour.
See kRp.POS.tags for all supported languages.
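The order of precedence described above can be sketched in a few lines of base R. This is only an illustration of the documented behaviour, not the package's internal code: resolve_lang and its arguments tagged.lang and env.lang are hypothetical names, with env.lang standing in for what get.kRp.env(lang=TRUE) would return.

```r
# Hypothetical helper mirroring the documented precedence:
# 1. force.lang, if given, always wins
# 2. otherwise the language stored in an already tagged object
# 3. otherwise the language from the environment settings
resolve_lang <- function(force.lang = NULL, tagged.lang = NULL, env.lang = NULL) {
  if (!is.null(force.lang)) {
    return(force.lang)
  }
  if (!is.null(tagged.lang)) {
    return(tagged.lang)
  }
  env.lang
}

# An untagged text falls back to the environment setting
resolve_lang(env.lang = "en")

# A tagged object brings its own language
resolve_lang(tagged.lang = "de", env.lang = "en")

# force.lang overrides both
resolve_lang(force.lang = "fr", tagged.lang = "de", env.lang = "en")
```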
# NOT RUN {
freq.analysis("~/some/text.txt", corp.freq=my.LCC.data)
# }