This class is used for objects that are returned by treetag or tokenize.
lang
A character string, naming the language that is assumed for the tokenized text in this object.
desc
Descriptive statistics of the tagged text.
tokens
Results of the called tokenizer and POS tagger. The data.frame usually has eleven columns:
doc_id: Factor, optional document identifier.
token: Character, the tokenized text.
tag: Factor, POS tags for each token.
lemma: Character, lemma for each token.
lttr: Integer, number of letters.
wclass: Factor, word class.
desc: Factor, a short description of the POS tag.
stop: Logical, TRUE if the token is a stopword.
stem: Character, stemmed token.
idx: Integer, index number of the token in this document.
sntc: Integer, number of the sentence in this document.
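The column layout described above can be sketched in base R. This is only a hand-built stand-in, not output of treetag or tokenize; the tag, word class, and description values are illustrative placeholders and are not taken from any real tag set.

```r
# Hypothetical stand-in for the tokens slot, built by hand in base R.
# All values are made up for illustration; only the column names and
# column types follow the description above.
words <- c("Cats", "sleep", ".")
tokens <- data.frame(
  doc_id = factor(rep("doc1", 3)),                 # optional document identifier
  token  = words,                                  # the tokenized text
  tag    = factor(c("TAG_A", "TAG_B", "TAG_C")),   # placeholder POS tags
  lemma  = c("cat", "sleep", "."),
  lttr   = nchar(words),                           # integer character counts
  wclass = factor(c("noun", "verb", "punctuation")),
  desc   = factor(c("placeholder A", "placeholder B", "placeholder C")),
  stop   = c(FALSE, FALSE, FALSE),                 # TRUE would mark a stopword
  stem   = c("cat", "sleep", "."),
  idx    = 1:3,                                    # position of the token in the document
  sntc   = c(1L, 1L, 1L),                          # sentence number in the document
  stringsAsFactors = FALSE                         # keep character columns as character
)
str(tokens)
```

Calling str(tokens) is a quick way to confirm that each column has the expected type.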
features
A named logical vector, indicating which features are available in this object's feat_list slot. Common features are listed in the description of the feat_list slot.
feat_list
A named list with optional analysis results or other content as used by the defined features:
hyphen
A named list of objects of class kRp.hyphen.
readability
A named list of objects of class kRp.readability.
lex_div
A named list of objects of class kRp.TTR.
freq
A list with additional results of freq.analysis.
corp_freq
An object of class kRp.corp.freq, e.g., results of a call to read.corp.custom.
diff
Additional results of calls to a method like textTransform.
doc_term_matrix
A sparse document-term matrix, as produced by docTermMatrix.
See the getter and setter methods for easy access to these sub-slots.
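The relationship between the features and feat_list slots can be sketched in base R as follows. This is an illustration of the idea only, using plain vectors and lists rather than a real object of this class; the list content is a placeholder, not an actual kRp.readability object.

```r
# Hypothetical stand-ins for the two slots, not taken from a real object.
# features: named logical vector flagging which features are available.
features <- c(hyphen = FALSE, readability = TRUE, lex_div = FALSE)

# feat_list: named list holding the content of the available features.
# The inner list is a placeholder for what would be analysis results.
feat_list <- list(
  readability = list(note = "placeholder for readability results")
)

# Names of the features that are flagged as available.
available <- names(features)[features]

# Each available feature should have a matching entry in feat_list.
consistent <- all(available %in% names(feat_list))
print(available)
print(consistent)
```

In practice the getter and setter methods are the intended way to reach these sub-slots; the sketch only shows why the two slots are kept in step.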
There can actually be any number of additional features; the above is just a list of those already defined by this package.
Should you need to manually generate objects of this class (which should rarely be the case), the constructor function kRp_text(...) can be used instead of new("kRp.text", ...).
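A minimal sketch of the constructor call, assuming the koRpus package is installed and assuming that a call without arguments yields a valid, empty object (neither is stated above). The call is guarded so the snippet does nothing if the package is missing.

```r
# Assumption: koRpus is installed and kRp_text() without arguments
# returns a valid, empty object of class kRp.text.
if (requireNamespace("koRpus", quietly = TRUE)) {
  obj <- koRpus::kRp_text()
  # List the slots of the new object; the slots documented above
  # (lang, desc, tokens, features, feat_list) should be among them.
  print(methods::slotNames(obj))
}
```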
[1] Text Interchange Formats (https://github.com/ropensci/tif)