kRp.corp.freq,-class: S4 Class kRp.corp.freq
Slots
meta
- Metadata on the corpora (see Details).
words
- Absolute word frequencies. It has at least the following columns:
num
:- A word ID from the DB (integer)
word
:- The word itself
lemma
:- The lemma of the word
tag
:- A part-of-speech tag
wclass
:- The word class
lttr
:- The number of characters in the word
freq
:- The frequency of that word in the corpus DB
pct
:- Percentage of appearance in the DB
pmio
:- Appearances per million words in the DB
log10
:- Base 10 logarithm of the word frequency
rank.avg
:- Rank in corpus data, using rank() with ties method "average"
rank.min
:- Rank in corpus data, using rank() with ties method "min"
rank.rel.avg
:- Relative rank, i.e. the percentile of "rank.avg"
rank.rel.min
:- Relative rank, i.e. the percentile of "rank.min"
inDocs
:- The absolute number of documents in the corpus containing the word
idf
:- The inverse document frequency
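Most of the columns listed above can be derived from the absolute frequencies. The following is a minimal Python sketch of those derivations, not koRpus's own R code. It assumes that pct and pmio are relative to the total token count, that ranks are assigned in ascending order of frequency (as R's rank() does by default), that relative ranks are rank / number of types × 100, and that idf is the natural logarithm of (number of documents / inDocs). The function names rank_ties and derive_columns are hypothetical.

```python
import math


def rank_ties(values, method):
    """Rank values in ascending order (lowest value gets rank 1),
    mimicking R's rank() for the ties methods "average" and "min"."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Find the run of tied values that starts at this position.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Sorted positions pos..end would occupy ranks pos+1..end+1.
        if method == "average":
            r = (pos + 1 + end + 1) / 2
        elif method == "min":
            r = pos + 1
        else:
            raise ValueError(f"unsupported ties method: {method}")
        for k in range(pos, end + 1):
            ranks[order[k]] = float(r)
        pos = end + 1
    return ranks


def derive_columns(freq, in_docs, n_docs):
    """Hypothetical helper.
    freq:    dict word -> absolute frequency in the corpus
    in_docs: dict word -> number of documents containing the word
    n_docs:  total number of documents in the corpus"""
    words = list(freq)
    counts = [freq[w] for w in words]
    tokens = sum(counts)   # assumed basis for pct and pmio
    n_types = len(words)   # assumed basis for relative ranks
    r_avg = rank_ties(counts, "average")
    r_min = rank_ties(counts, "min")
    rows = {}
    for i, w in enumerate(words):
        rows[w] = {
            "freq": counts[i],
            "pct": counts[i] / tokens * 100,
            "pmio": counts[i] / tokens * 1_000_000,
            "log10": math.log10(counts[i]),
            "rank.avg": r_avg[i],
            "rank.min": r_min[i],
            # Assumed definition: rank as a percentage of all types.
            "rank.rel.avg": r_avg[i] / n_types * 100,
            "rank.rel.min": r_min[i] / n_types * 100,
            "inDocs": in_docs[w],
            # Assumed definition: natural log, no smoothing.
            "idf": math.log(n_docs / in_docs[w]),
        }
    return rows
```

For example, with frequencies {"the": 6, "cat": 2, "sat": 2}, the tied words "cat" and "sat" both get rank.avg 1.5 and rank.min 1, while "the" gets rank 3 under both methods.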
The slot might have additional columns, depending on the input material.
desc
- Descriptive information. It contains six numbers from the meta information, for convenient access:
tokens
:- Number of running word forms
types
:- Number of distinct word forms
words.p.sntc
:- Average sentence length in words
chars.p.sntc
:- Average sentence length in characters
chars.p.wform
:- Average word form length
chars.p.word
:- Average running word length
The slot might have additional columns, depending on the input material.
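The desc values are taken from the meta information rather than computed by the class itself. To make the six measures concrete, here is a minimal Python sketch of how they could be computed from raw sentences. It assumes naive whitespace tokenization, counts sentence length in characters over the raw sentence string (spaces included), and treats a "word form" as a distinct type, so chars.p.wform averages over types while chars.p.word averages over running tokens. The function name describe is hypothetical.

```python
def describe(sentences):
    """Hypothetical helper computing the six desc measures.
    sentences: list of raw sentence strings.
    Tokenization is naive whitespace splitting (an assumption)."""
    tokenized = [s.split() for s in sentences]
    running = [w for sent in tokenized for w in sent]  # running word forms
    forms = set(running)                               # distinct word forms
    return {
        "tokens": len(running),
        "types": len(forms),
        "words.p.sntc": len(running) / len(sentences),
        # Assumed: characters of the raw sentence, spaces included.
        "chars.p.sntc": sum(len(s) for s in sentences) / len(sentences),
        # Average over distinct word forms (types).
        "chars.p.wform": sum(len(w) for w in forms) / len(forms),
        # Average over running words (tokens).
        "chars.p.word": sum(len(w) for w in running) / len(running),
    }


stats = describe(["the cat sat", "a cat"])
```

For these two sentences, the repeated "cat" is counted twice in tokens (5) but once in types (4), which is why chars.p.wform (2.5) and chars.p.word (2.6) differ.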
Details
The slot meta simply contains all information from the "meta.txt" of the LCC[1] data and remains empty for data from a Celex[2] DB.