The function freq.analysis analyzes texts regarding frequencies of tokens, word classes, etc.
freq.analysis(txt.file, ...)

# S4 method for kRp.text
freq.analysis(
  txt.file,
  corp.freq = NULL,
  desc.stat = TRUE,
  corp.rm.class = "nonpunct",
  corp.rm.tag = c()
)
txt.file: An object of class kRp.text.
...: Additional options for the generic.
corp.freq: An object of class kRp.corp.freq.
desc.stat: Logical, whether an updated descriptive statistical analysis should be conducted.
corp.rm.class: A character vector with word classes which should be ignored for frequency analysis. The default value "nonpunct" has special meaning and will cause the result of kRp.POS.tags(lang, tags=c("punct","sentc"), list.classes=TRUE) to be used.
corp.rm.tag: A character vector with POS tags which should be ignored for frequency analysis.
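The effect of the two exclusion arguments can be pictured with a small base R sketch. It does not call the package. The token table, its column names (token, tag, wclass), and the variables rm.class and rm.tag are hypothetical stand-ins chosen for illustration, and the filtering shown is only an assumption about what "ignored for frequency analysis" amounts to.

```r
# Hypothetical token table; the column names are assumptions for illustration
tokens <- data.frame(
  token  = c("The", "cat", "sat", ".", "The", "dog", "barked", "!"),
  tag    = c("DT", "NN", "VBD", "SENT", "DT", "NN", "VBD", "SENT"),
  wclass = c("determiner", "noun", "verb", "fullstop",
             "determiner", "noun", "verb", "fullstop"),
  stringsAsFactors = FALSE
)

# Stand-ins for corp.rm.class and corp.rm.tag (hypothetical values)
rm.class <- c("fullstop")
rm.tag   <- c("DT")

# Keep only tokens whose word class and POS tag are both not excluded
keep    <- !(tokens$wclass %in% rm.class) & !(tokens$tag %in% rm.tag)
counted <- tokens[keep, ]

# Count how often each remaining token occurs
token.freq <- table(counted$token)
```

Here an exclusion by class and an exclusion by tag are combined, so a token is dropped if either one matches.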
An updated object of class kRp.text with the added feature freq, which is a list with information on the word frequencies of the analyzed text. Use corpusFreq to get that slot.
The function adds new columns with frequency information to the tokens data frame of the input data, describing how often each token occurs in the additionally provided corpus frequency object. To get the results, use taggedText to get the tokens slot, describe to get the raw descriptive statistics (only updated if desc.stat=TRUE), and corpusFreq to get the data from the added freq feature.
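As a rough picture of what adding frequency columns means, the following base R sketch looks up each token in a small corpus table and attaches the matching value. It does not use the package. Both data frames and the column names word and pmio are hypothetical, and leaving unmatched tokens as NA is an assumption of this sketch, not a statement about how freq.analysis treats them.

```r
# Hypothetical tokens of the analyzed text
tokens <- data.frame(
  token = c("the", "cat", "sat", "zyzzyva"),
  stringsAsFactors = FALSE
)

# Hypothetical corpus frequency table (occurrences per million words)
corpus <- data.frame(
  word = c("the", "cat", "sat"),
  pmio = c(50000, 120, 80),
  stringsAsFactors = FALSE
)

# Position of each token in the corpus table, NA if it is not listed
idx <- match(tokens$token, corpus$word)

# Attach the corpus frequency as a new column of the token table
tokens$pmio <- corpus$pmio[idx]
```

Using match keeps the token table in its original order, which matters when the rows represent running text.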
If corp.freq provides appropriate idf values for the types in txt.file, the term frequency-inverse document frequency statistic (tf-idf) will also be computed. Missing idf values will result in NA.
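The NA behaviour for missing idf values can be illustrated with a minimal base R sketch. It assumes that tf is the raw count of a type in the text and that tf-idf is the plain product tf * idf; the weighting actually used by the package may differ. The token vector and the idf values are invented for illustration.

```r
# Hypothetical tokens of one text
text.tokens <- c("cat", "sat", "cat", "mat", "cat")

# Term frequency as raw counts per type (an assumption of this sketch)
tf <- table(text.tokens)

# Hypothetical idf values; there is deliberately no entry for "mat"
idf <- c(cat = 1.2, sat = 2.5)

# Look up idf by type name; types without an idf value yield NA
tfidf <- as.numeric(tf) * idf[names(tf)]
names(tfidf) <- names(tf)
```

Because the lookup for "mat" finds no idf value, its tf-idf is NA, while the other types get an ordinary product.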
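# NOT RUN {
# tagged.text is a kRp.text object, my.LCC.data a kRp.corp.freq object
freq.analysis.res <- freq.analysis(tagged.text, corp.freq=my.LCC.data)
# retrieve the data from the added freq feature
corpusFreq(freq.analysis.res)
# }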