This class is used for objects that are returned by treetag or tokenize.
lang
A character string, naming the language that is assumed for the tokenized text in this object.
desc
Descriptive statistics of the tagged text.
TT.res
Results of the called tokenizer and POS tagger. The data.frame has eleven columns:
doc_id: Optional document identifier.
token: The tokenized text.
tag: POS tags for each token.
lemma: Lemma for each token.
lttr: Number of letters.
wclass: Word class.
desc: A short description of the POS tag.
stop: Logical, TRUE if token is a stopword.
stem: Stemmed token.
idx: Index number of token in this document.
sntc: Number of sentence in this document.
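To make the column layout concrete, the following base R sketch builds a small data.frame with the same column names. It does not call koRpus. All values are invented for illustration, and the column types (character, integer, logical) are assumptions rather than something stated in this documentation.

```r
# Hypothetical illustration of the TT.res column layout.
# Values are invented; column types are assumptions.
tokens <- c("Dogs", "bark", ".")

tagged_df <- data.frame(
  doc_id = rep("doc1", 3),                 # optional document identifier
  token  = tokens,                         # the tokenized text
  tag    = c("NNS", "VVP", "SENT"),        # POS tag per token (example tags)
  lemma  = c("dog", "bark", "."),          # lemma per token
  lttr   = nchar(tokens),                  # character count as a stand-in for letters
  wclass = c("noun", "verb", "fullstop"),  # word class (example labels)
  desc   = c("Noun, plural", "Verb, present", "Sentence ending punctuation"),
  stop   = c(FALSE, FALSE, FALSE),         # TRUE if token is a stopword
  stem   = c("dog", "bark", "."),          # stemmed token
  idx    = seq_along(tokens),              # index of token in this document
  sntc   = rep(1L, 3),                     # sentence number in this document
  stringsAsFactors = FALSE
)

str(tagged_df)
```

A table shaped like this can be filtered with ordinary data.frame operations, for example `tagged_df[tagged_df$wclass == "noun", "token"]`.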
Should you need to manually generate objects of this class (which should rarely be the case), the constructor function kRp_tagged(...) can be used instead of new("kRp.tagged", ...).
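The relationship between a constructor function and new() can be sketched with a stand-in S4 class. The class name demo.tagged and the generator demo_tagged below are hypothetical and are not part of koRpus. The slot types are assumptions modelled on the slot names listed above. The sketch relies only on the fact that setClass() returns a generator function. Whether kRp_tagged is defined this way is not confirmed here.

```r
library(methods)

# Hypothetical stand-in class; NOT the real kRp.tagged definition.
# setClass() returns a generator function that wraps new().
demo_tagged <- setClass(
  "demo.tagged",
  representation(
    lang   = "character",   # language of the tokenized text
    desc   = "list",        # descriptive statistics
    TT.res = "data.frame"   # tokenizer and tagger results
  )
)

# Both calls create an object of the same class.
via_constructor <- demo_tagged(lang = "en")
via_new         <- new("demo.tagged", lang = "en")

# Slots are read with the @ operator.
via_constructor@lang
class(via_new)
```

The generator form is shorter and avoids spelling the class name as a string, which is presumably why a constructor is offered alongside new().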
[1] Text Interchange Formats (https://github.com/ropensci/tif)