The function freq.analysis analyzes texts regarding frequencies of tokens, word classes etc.
freq.analysis(txt.file, ...)
# S4 method for kRp.taggedText
freq.analysis(txt.file, corp.freq = NULL,
  desc.stat = TRUE, force.lang = NULL, tagger = "kRp.env",
  corp.rm.class = "nonpunct", corp.rm.tag = c(), tfidf = TRUE, ...)
# S4 method for character
freq.analysis(txt.file, corp.freq = NULL,
  desc.stat = TRUE, force.lang = NULL, tagger = "kRp.env",
  corp.rm.class = "nonpunct", corp.rm.tag = c(), tfidf = TRUE, ...)
txt.file: Either an object of class kRp.tagged-class, kRp.txt.freq-class, kRp.analysis-class or kRp.txt.trans-class, or a character vector which must be a valid path to a file containing the text to be analyzed.
...: Additional options to be passed through to the function defined with tagger.
corp.freq: An object of class kRp.corp.freq-class.
desc.stat: Logical, whether a descriptive statistical analysis should be performed.
force.lang: A character string defining the language to be assumed for the text, by force.
tagger: A character string defining the tokenizer/tagger command you want to use for basic text analysis. Can be omitted if txt.file is already of class kRp.tagged-class. Defaults to "kRp.env" to get the settings by get.kRp.env. Set to "tokenize" to use tokenize.
corp.rm.class: A character vector with word classes which should be ignored for frequency analysis. The default value "nonpunct" has special meaning and will cause the result of kRp.POS.tags(lang, c("punct","sentc"), list.classes=TRUE) to be used.
corp.rm.tag: A character vector with POS tags which should be ignored for frequency analysis.
tfidf: Logical, whether the term frequency-inverse document frequency statistic (tf-idf) should be computed. Requires corp.freq to provide appropriate idf values for the types in txt.file. Missing idf values will result in NA.
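To illustrate why a missing idf value leads to NA, here is a minimal base R sketch of a textbook tf-idf computation (relative term frequency multiplied by idf). It does not use koRpus, and the exact weighting koRpus applies may differ. The helper name toy_tfidf, the token vector and the idf values are all hypothetical and chosen only for illustration.

```r
# Hypothetical helper, NOT part of koRpus: textbook tf-idf for a single text.
# tokens: character vector of tokens
# idf:    named numeric vector of idf values (assumed to come from some corpus)
toy_tfidf <- function(tokens, idf) {
  counts <- table(tokens)                    # absolute frequency per type
  tf <- as.numeric(counts) / length(tokens)  # relative term frequency
  # looking up a type that has no idf entry yields NA, which propagates
  result <- tf * unname(idf[names(counts)])
  names(result) <- names(counts)
  result
}

# made-up example data; "mat" deliberately has no idf value
tokens <- c("the", "cat", "sat", "on", "the", "mat")
idf <- c(the = 0.1, cat = 2.3, sat = 2.0, on = 0.4)

res <- toy_tfidf(tokens, idf)
print(res)
```

In this sketch the entry for "mat" is NA because the idf vector has no value for that type, which mirrors the behaviour described for tfidf when corp.freq lacks a type.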
An object of class kRp.txt.freq-class.
The easiest way to see what kinds of analyses are done is probably to look at the slot description of kRp.txt.freq-class.
By default, if the text has yet to be tagged, the language definition is queried by calling get.kRp.env(lang=TRUE) internally. If txt.file has already been tagged, the language definition of that tagged object is read and used instead. Set force.lang=get.kRp.env(lang=TRUE), or any other valid value, only if you want to forcibly override this default behaviour. See kRp.POS.tags for all supported languages.
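The precedence described above can be sketched in a few lines of base R. This is only an illustration of the documented order (force.lang first, then the language of an already tagged object, then the environment setting). The helper resolve_lang and its arguments tagged.lang and env.lang are hypothetical and are not koRpus internals.

```r
# Hypothetical helper, NOT part of koRpus: mimics the documented precedence.
# force.lang:  explicit override, or NULL
# tagged.lang: language stored in an already tagged object, or NULL if untagged
# env.lang:    stand-in for what get.kRp.env(lang=TRUE) would return
resolve_lang <- function(force.lang = NULL, tagged.lang = NULL, env.lang = NULL) {
  if (!is.null(force.lang)) {
    return(force.lang)    # explicit override always wins
  }
  if (!is.null(tagged.lang)) {
    return(tagged.lang)   # text was already tagged: use its own language
  }
  if (!is.null(env.lang)) {
    return(env.lang)      # untagged text: fall back to the environment setting
  }
  stop("no language could be determined")
}

untagged <- resolve_lang(env.lang = "en")
tagged   <- resolve_lang(tagged.lang = "de", env.lang = "en")
forced   <- resolve_lang(force.lang = "fr", tagged.lang = "de", env.lang = "en")
print(c(untagged = untagged, tagged = tagged, forced = forced))
```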
# NOT RUN {
freq.analysis("~/some/text.txt", corp.freq=my.LCC.data)
# }