koRpus (version 0.13-8)

koRpus-deprecated: Deprecated object classes

Description

These classes are no longer used by the koRpus package and will be removed in a later version. They are kept here for the time being so you can still load old objects and convert them into new objects using the fixObject method.

These functions will be removed soon and should no longer be used.
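The conversion of old objects described above could be wrapped as in the following minimal sketch. The helper name convert_old_object, the file name, and the object name are hypothetical; the sketch only assumes that the old object was stored with save() and that koRpus is installed, so that fixObject can be called on it.

```r
# Hypothetical helper, not part of koRpus: load one object that was saved
# with an older koRpus version and convert it with fixObject().
convert_old_object <- function(rdata_file, object_name) {
  # loading the namespace also makes the deprecated class definitions available
  if (!requireNamespace("koRpus", quietly = TRUE)) {
    stop("the koRpus package is required for the conversion")
  }
  # load into a separate environment to avoid overwriting workspace objects
  env <- new.env()
  load(rdata_file, envir = env)
  old_obj <- get(object_name, envir = env)
  # fixObject() returns the object converted to the current class
  koRpus::fixObject(old_obj)
}

# Usage (file and object names are placeholders):
# new_obj <- convert_old_object("old_analysis.RData", "tagged.text")
```

Loading into a separate environment keeps the old object from silently replacing something of the same name in the current workspace.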

Usage

kRp.filter.wclass(...)

kRp.text.paste(...)

read.tagged(...)

kRp.text.transform(...)

Arguments

...

Parameters to be passed to the replacement of the function

Slots

lang

A character string, naming the language that is assumed for the tokenized text in this object.

desc

Descriptive statistics of the tagged text.

TT.res

Results of the called tokenizer and POS tagger. The data.frame usually has eleven columns:

doc_id:

Factor, optional document identifier.

token:

Character, the tokenized text.

tag:

Factor, POS tags for each token.

lemma:

Character, lemma for each token.

lttr:

Integer, number of letters.

wclass:

Factor, word class.

desc:

Factor, a short description of the POS tag.

stop:

Logical, TRUE if token is a stopword.

stem:

Character, stemmed token.

idx:

Integer, index number of token in this document.

sntc:

Integer, number of sentence in this document.

This data.frame structure adheres to the "Text Interchange Formats" guidelines set out by rOpenSci[1].
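To make the column layout listed above concrete, here is a small base R sketch that builds a data.frame with the same eleven columns and types. All values (tags, word classes, descriptions, stems) are made up for illustration and were not produced by a tokenizer or POS tagger; nchar() is only a stand-in for the letter count.

```r
# Illustrative only: a hand-made data.frame mirroring the TT.res columns.
tokens <- c("This", "is", "a", "test", ".")

tt_res_sketch <- data.frame(
  doc_id = factor(rep("sample", length(tokens))),  # optional document identifier
  token  = tokens,                                 # the tokenized text
  tag    = factor(c("DT", "VBZ", "DT", "NN", "SENT")),
  lemma  = c("this", "be", "a", "test", "."),
  lttr   = nchar(tokens),                          # stand-in for the number of letters
  wclass = factor(c("determiner", "verb", "determiner", "noun", "fullstop")),
  desc   = factor(c("Determiner", "Verb", "Determiner", "Noun", "Sentence end")),
  stop   = c(TRUE, TRUE, TRUE, FALSE, FALSE),      # made-up stopword flags
  stem   = c("this", "is", "a", "test", "."),      # made-up stems
  idx    = seq_along(tokens),                      # index of token in the document
  sntc   = rep(1L, length(tokens)),                # sentence number
  stringsAsFactors = FALSE
)

str(tt_res_sketch)
```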

freq.analysis

A list with information on the word frequencies of the analyzed text.

diff

A list with mostly atomic vectors, describing the amount of differences between both text variants (percentage):

all.tokens:

Percentage of all tokens, including punctuation, that were altered.

words:

Percentage of altered words only.

all.chars:

Percentage of all characters, including punctuation, that were altered.

letters:

Percentage of altered letters in words only.

transfmt:

Character vector documenting the transformation(s) done to the tokens.

transfmt.equal:

Data frame documenting which token was changed in which transformational step. Only available if more than one transformation was done.

transfmt.normalize:

A list documenting steps of normalization that were done to the object, one element per transformation. Each entry holds the name of the method, the query parameters, and the effective replacement value.
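As a rough illustration of what the percentage entries all.tokens and words express, the following base R sketch compares an original and a transformed token vector. This is not how koRpus computes the slot; the helper pct_changed, the example tokens, and the punctuation test are assumptions made for this example only.

```r
# Hypothetical helper, not part of koRpus: percentage of positions that differ
# between two token vectors of equal length.
pct_changed <- function(old, new) {
  stopifnot(length(old) == length(new))
  if (length(old) == 0) {
    return(0)
  }
  100 * mean(old != new)
}

orig_tokens  <- c("The", "quick", "fox", "jumps", ".")
trans_tokens <- c("The", "QUICK", "fox", "JUMPS", ".")

# assumption: tokens consisting only of punctuation are not counted as words
is_word <- !grepl("^[[:punct:]]+$", orig_tokens)

diff_sketch <- list(
  all.tokens = pct_changed(orig_tokens, trans_tokens),                  # all tokens
  words      = pct_changed(orig_tokens[is_word], trans_tokens[is_word]) # words only
)

print(diff_sketch)
```

In this example two of five tokens were altered, and two of the four tokens that count as words.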

lex.div

Information on lexical diversity.

S4 Class kRp.tagged

This was used for objects returned by treetag or tokenize. It was replaced by kRp.text.

S4 Class kRp.txt.freq

This was used for objects returned by freq.analysis. It was replaced by kRp.text.

S4 Class kRp.txt.trans

This was used for objects returned by textTransform, clozeDelete, cTest, and jumbleWords. It was replaced by kRp.text.

S4 Class kRp.analysis

This was used for objects returned by kRp.text.analysis. That function is also deprecated; its functionality can be replicated by combining treetag, freq.analysis, and lex.div.
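The combination of treetag, freq.analysis, and lex.div mentioned above might look like the following sketch. The wrapper analyze_text_sketch and its arguments are hypothetical. It assumes a local TreeTagger installation under tt_path, an installed and loaded koRpus language package matching lang (for example koRpus.lang.en), and default settings for all three functions; it is not a drop-in replacement for kRp.text.analysis.

```r
# Hypothetical wrapper, not part of koRpus: chain the three analysis steps.
analyze_text_sketch <- function(file, tt_path, lang = "en", preset = "en") {
  if (!requireNamespace("koRpus", quietly = TRUE)) {
    stop("the koRpus package is required")
  }
  # 1. tokenize and POS-tag the text with a manually configured TreeTagger
  tagged <- koRpus::treetag(
    file,
    treetagger = "manual",
    lang = lang,
    TT.options = list(path = tt_path, preset = preset)
  )
  # 2. add word frequency information to the tagged text
  with_freq <- koRpus::freq.analysis(tagged)
  # 3. compute lexical diversity measures for the same text
  diversity <- koRpus::lex.div(with_freq)
  # return both results; the list layout is a choice made for this sketch
  list(text = with_freq, lex.div = diversity)
}

# Usage (paths are placeholders):
# library(koRpus.lang.en)
# res <- analyze_text_sketch("sample_text.txt", tt_path = "~/bin/treetagger/")
```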

References

[1] Text Interchange Formats (https://github.com/ropensci/tif)