kRp.corp.freq,-class: S4 Class kRp.corp.freq

Description

This class is used for objects that are returned by read.corp.LCC and read.corp.celex.

words

Absolute word frequencies. It has at least the following columns:

num:: An integer word ID from the DB
word:: The word itself
lemma:: The lemma of the word
tag:: A part-of-speech tag
wclass:: The word class
lttr:: The number of characters
freq:: The frequency of that word in the corpus DB
pct:: Percentage of appearance in DB
pmio:: Appearance per million words in DB
log10:: Base 10 logarithm of word frequency
rank.avg:: Rank in corpus data, rank ties method "average"
rank.min:: Rank in corpus data, rank ties method "min"
rank.rel.avg:: Relative rank, i.e. percentile of "rank.avg"
rank.rel.min:: Relative rank, i.e. percentile of "rank.min"
inDocs:: The absolute number of documents in the corpus containing the word
idf:: The inverse document frequency

The slot might have additional columns, depending on the input material.
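
As a rough illustration of how several of these columns relate to the raw frequencies, the following base R sketch derives them from a small made-up table. The toy data, the object name `words`, and the assumption that ranks are taken directly from `freq` with `rank()` are illustrative only; the actual computation in the package may differ.

```r
# Toy data: hypothetical frequencies, not taken from any real corpus DB
words <- data.frame(
  word = c("the", "of", "cat", "dog"),
  freq = c(50, 30, 10, 10),
  stringsAsFactors = FALSE
)

total <- sum(words$freq)                    # number of running words in the toy corpus

words$pct   <- 100 * words$freq / total     # percentage of appearance
words$pmio  <- 1e6 * words$freq / total     # appearance per million words
words$log10 <- log10(words$freq)            # base 10 logarithm of the frequency

# Assumption: ranks are computed on freq; only the ties method differs
words$rank.avg <- rank(words$freq, ties.method = "average")
words$rank.min <- rank(words$freq, ties.method = "min")

print(words)
```

The two tied words ("cat" and "dog") show the difference between the ties methods: "average" assigns both the mean of the ranks they share, while "min" assigns both the lowest of those ranks.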

desc

Descriptive information. It contains six numbers taken from the meta information, for convenient access.

The slot might have additional columns, depending on the input material.

bigrams

A data.frame listing all pairs of tokens that occurred directly next to each other in the corpus.
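
To make the idea of adjacent co-occurrence concrete, here is a minimal base R sketch that counts neighbouring token pairs in a short token vector. The column names `token1`, `token2` and `freq` are hypothetical and chosen for illustration; they are not confirmed to match the columns of this slot.

```r
# Toy token sequence; hypothetical data
tokens <- c("the", "cat", "saw", "the", "cat")

# Pair every token with its direct successor
pairs <- data.frame(
  token1 = head(tokens, -1),
  token2 = tail(tokens, -1),
  stringsAsFactors = FALSE
)
pairs$freq <- 1L

# Sum the occurrences of each distinct pair
bigrams <- aggregate(freq ~ token1 + token2, data = pairs, FUN = sum)

print(bigrams)
```

In this toy sequence the pair "the" followed by "cat" occurs twice, while every other pair occurs once.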

cooccur

Similar to bigrams, but listing co-occurrences of tokens anywhere within the same sentence.

The slot meta contains all information from the "meta.txt" file of the LCC[1] data. It remains empty for data from a Celex[2] DB.
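
The slot layout described above can be explored with the standard S4 tools from the methods package. The sketch below uses a hypothetical stand-in class, `toyCorpFreq`, that only mimics the slot names listed here; it is not the package's actual class definition, and real objects would come from read.corp.LCC or read.corp.celex.

```r
library(methods)

# Hypothetical stand-in class with the slot names described above;
# NOT the actual kRp.corp.freq definition
setClass(
  "toyCorpFreq",
  slots = c(
    meta    = "data.frame",
    words   = "data.frame",
    desc    = "data.frame",
    bigrams = "data.frame",
    cooccur = "data.frame"
  )
)

# Toy object: an empty meta slot, as described for Celex data
obj <- new(
  "toyCorpFreq",
  meta  = data.frame(),
  words = data.frame(
    word = c("the", "cat"),
    freq = c(50, 10),
    stringsAsFactors = FALSE
  )
)

slotNames(obj)            # list all slot names of the object
obj@words                 # direct slot access with the @ operator
slot(obj, "words")$freq   # equivalent access via slot()
nrow(obj@meta)            # number of rows in the empty meta slot
```

Both `@` and `slot()` are generic S4 accessors, so the same calls apply to any S4 object, whatever accessor functions a package may additionally provide.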