These classes are no longer used by the koRpus package and will be removed in a later version.
They are kept here for the time being so you can still load old objects and convert them into new objects using the fixObject method.
These functions will be removed soon and should no longer be used.
kRp.filter.wclass(...)
kRp.text.paste(...)
read.tagged(...)
kRp.text.transform(...)
Parameters to be passed to the function's replacement.
lang
A character string, naming the language that is assumed for the tokenized text in this object.
desc
Descriptive statistics of the tagged text.
TT.res
Results of the called tokenizer and POS tagger. The data.frame usually has eleven columns:
doc_id
:Factor, optional document identifier.
token
:Character, the tokenized text.
tag
:Factor, POS tags for each token.
lemma
:Character, lemma for each token.
lttr
:Integer, number of letters.
wclass
:Factor, word class.
desc
:Factor, a short description of the POS tag.
stop
:Logical, TRUE if token is a stopword.
stem
:Character, stemmed token.
idx
:Integer, index number of token in this document.
sntc
:Integer, number of sentence in this document.
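For orientation, the layout of a TT.res data frame can be mimicked in base R without loading koRpus. The two tokens, their tags, and every other value below are invented for illustration and were not produced by treetag or tokenize; only the column names and types follow the list above.

```r
# Toy TT.res-style data frame in base R (no koRpus needed).
# All values are invented; only column names and types follow the
# documented layout.
tt_res <- data.frame(
  doc_id = factor(c("doc1", "doc1")),
  token  = c("Dogs", "bark"),
  tag    = factor(c("NNS", "VBP")),
  lemma  = c("dog", "bark"),
  lttr   = c(4L, 4L),
  wclass = factor(c("noun", "verb")),
  desc   = factor(c("Noun, plural", "Verb, non-3rd person singular present")),
  stop   = c(FALSE, FALSE),
  stem   = c("dog", "bark"),
  idx    = c(1L, 2L),
  sntc   = c(1L, 1L),
  stringsAsFactors = FALSE  # keep token, lemma and stem as character
)

str(tt_res)

# Select all tokens whose word class is "noun"
nouns <- tt_res$token[tt_res$wclass == "noun"]
print(nouns)
```

Because word classes live in their own factor column, subsets such as all nouns can be taken with ordinary data frame indexing.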
freq.analysis
A list with information on the word frequencies of the analyzed text.
diff
A list with mostly atomic vectors, describing the amount of differences between the two text variants (percentage):
all.tokens
:Percentage of all tokens, including punctuation, that were altered.
words
:Percentage of altered words only.
all.chars
:Percentage of all characters, including punctuation, that were altered.
letters
:Percentage of altered letters in words only.
transfmt
:Character vector documenting the transformation(s) done to the tokens.
transfmt.equal
:Data frame documenting which token was changed in which transformational step. Only available if more than one transformation was done.
transfmt.normalize
:A list documenting steps of normalization that were done to the object, one element per transformation. Each entry holds the name of the method, the query parameters, and the effective replacement value.
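The idea behind the all.tokens and words percentages can be sketched in base R. This is not the koRpus implementation: it assumes the original and transformed token vectors are aligned token by token, and it uses a crude regular expression (an assumption) to tell words from punctuation.

```r
# Hypothetical sketch, not koRpus's internal code: share of altered
# tokens between an original and a transformed token vector.
# Assumes both vectors are aligned token by token.
orig  <- c("The", "dog", "barked", ".")
trans <- c("the", "dog", "barked", ".")

changed <- orig != trans                 # TRUE where a token was altered
is_word <- grepl("[[:alpha:]]", orig)    # crude word vs. punctuation split

all_tokens_pct <- 100 * mean(changed)            # all tokens, incl. punctuation
words_pct      <- 100 * mean(changed[is_word])   # words only

print(c(all.tokens = all_tokens_pct, words = words_pct))
```

Here one of four tokens was lowercased, so 25 percent of all tokens but one third of the three words were altered; excluding punctuation raises the share because the unchanged "." no longer counts.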
lex.div
Information on lexical diversity.
This was used for objects returned by treetag or tokenize.
It was replaced by kRp.text.
This was used for objects returned by freq.analysis.
It was replaced by kRp.text.
This was used for objects returned by textTransform, clozeDelete, cTest, and jumbleWords.
It was replaced by kRp.text.
This was used for objects returned by kRp.text.analysis.
The function is also deprecated; its functionality can be replicated by combining treetag, freq.analysis, and lex.div.