koRpus (version 0.13-8)

taggedText: Getter/setter methods for koRpus objects


These methods should be used to get or set values of tagged text objects generated by koRpus functions like treetag or tokenize.


taggedText(obj, add.desc = FALSE, doc_id = FALSE)

# S4 method for kRp.text
taggedText(obj, add.desc = FALSE, doc_id = FALSE)

taggedText(obj) <- value

# S4 method for kRp.text
taggedText(obj) <- value

doc_id(obj, ...)

# S4 method for kRp.text
doc_id(obj, has_id = NULL)

hasFeature(obj, feature = NULL, ...)

# S4 method for kRp.text
hasFeature(obj, feature = NULL)

hasFeature(obj, feature) <- value

# S4 method for kRp.text
hasFeature(obj, feature) <- value

feature(obj, feature, ...)

# S4 method for kRp.text
feature(obj, feature, doc_id = NULL)

feature(obj, feature) <- value

# S4 method for kRp.text
feature(obj, feature) <- value

corpusReadability(obj, ...)

# S4 method for kRp.text
corpusReadability(obj, doc_id = NULL)

corpusReadability(obj) <- value

# S4 method for kRp.text
corpusReadability(obj) <- value

corpusHyphen(obj, ...)

# S4 method for kRp.text
corpusHyphen(obj, doc_id = NULL)

corpusHyphen(obj) <- value

# S4 method for kRp.text
corpusHyphen(obj) <- value

corpusLexDiv(obj, ...)

# S4 method for kRp.text
corpusLexDiv(obj, doc_id = NULL)

corpusLexDiv(obj) <- value

# S4 method for kRp.text
corpusLexDiv(obj) <- value

corpusFreq(obj, ...)

# S4 method for kRp.text
corpusFreq(obj)

corpusFreq(obj) <- value

# S4 method for kRp.text
corpusFreq(obj) <- value

corpusCorpFreq(obj, ...)

# S4 method for kRp.text
corpusCorpFreq(obj)

corpusCorpFreq(obj) <- value

# S4 method for kRp.text
corpusCorpFreq(obj) <- value

corpusStopwords(obj, ...)

# S4 method for kRp.text
corpusStopwords(obj)

corpusStopwords(obj) <- value

# S4 method for kRp.text
corpusStopwords(obj) <- value

# S4 method for kRp.text,ANY,ANY,ANY
x[i, j, ..., drop = TRUE]

# S4 method for kRp.text,ANY,ANY,ANY
x[i, j, ...] <- value

# S4 method for kRp.text
x[[i, doc_id = NULL, ...]]

# S4 method for kRp.text
x[[i, doc_id = NULL, ...]] <- value

# S4 method for kRp.text
describe(obj, doc_id = NULL, simplify = TRUE, ...)

# S4 method for kRp.text
describe(obj, doc_id = NULL, ...) <- value

# S4 method for kRp.text
language(obj)

# S4 method for kRp.text
language(obj) <- value

diffText(obj, doc_id = NULL)

# S4 method for kRp.text
diffText(obj, doc_id = NULL)

diffText(obj) <- value

# S4 method for kRp.text
diffText(obj) <- value

# S4 method for kRp.text
originalText(obj)

fixObject(obj, doc_id = NA)

# S4 method for kRp.text
fixObject(obj, doc_id = NA)

# S4 method for kRp.text
tif_as_tokens_df(tokens)

# S4 method for kRp.tagged
fixObject(obj, doc_id = NA)

# S4 method for kRp.txt.freq
fixObject(obj, doc_id = NA)

# S4 method for kRp.txt.trans
fixObject(obj, doc_id = NA)

# S4 method for kRp.analysis
fixObject(obj, doc_id = NA)



obj: An arbitrary R object.

add.desc: Logical, determines whether the desc column should be re-written with descriptions for all POS tags.

doc_id: Logical (except for fixObject, feature, and [[/[[<-); if TRUE, the doc_id column will be a factor with the respective value of the desc slot, i.e., the document ID will be preserved in the data.frame. If used with fixObject, can be a character string to set the document ID manually (the default NA will preserve existing values and not overwrite them). If used with feature or [[/[[<-, a character vector to limit the scope to one or more particular document IDs.

value: The new value to replace the current with.

...: Additional arguments for the generics.

has_id: A character vector with doc_ids to look for in the object. The return value is then a logical vector of the same length, indicating whether the respective ID was found.

feature: Character string naming the feature to look for. The return value is logical if a single feature name is given. If feature=NULL, a character vector is returned, naming all features found in the object.

x: An object of class kRp.text or kRp.hyphen.

i: Defines the row selector ([) or the name to match ([[).

j: Defines the column selector.

drop: Logical, whether the result should be coerced to the lowest possible dimension. See [ for more details.

simplify: Logical, if TRUE and the result is a list of length one (i.e., just a single doc_id), returns the contents of the single list entry.

tokens: An object of class kRp.text.


  • taggedText() returns the tokens slot.

  • doc_id() returns a character vector of all doc_id values in the object.

  • describe() returns the desc slot.

  • language() returns the lang slot.

  • [/[[ can be used as a shortcut to index the results of taggedText().

  • fixObject returns the same object upgraded to the object structure of this package version (e.g., new columns, changed names, etc.).

  • hasFeature() returns TRUE or FALSE, depending on whether the requested feature is present or not.

  • feature() returns the list entry of the feat_list slot for the requested feature.

  • corpusReadability() returns the list of kRp.readability objects, see readability.

  • corpusHyphen() returns the list of kRp.hyphen objects, see hyphen.

  • corpusLexDiv() returns the list of kRp.TTR objects, see lex.div.

  • corpusFreq() returns the frequency analysis data from the feat_list slot, see freq.analysis.

  • corpusCorpFreq() returns the kRp.corp.freq object of the feat_list slot, see for example read.corp.custom.

  • corpusStopwords() returns the number of stopwords found in each text (if analyzed) from the feat_list slot.

  • tif_as_tokens_df returns the tokens slot in a TIF[1] compliant format, i.e., doc_id is not a factor but a character vector.

  • originalText() is similar to taggedText(), but reverts any transformations back to the original text before returning the tokens slot. It only works if the object has the feature diff, see examples.

  • diffText() returns the diff slot, if present.
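
The lookup behaviour of doc_id() and hasFeature() described above can be sketched as follows. This is a minimal illustration, not part of the package's own examples: it assumes koRpus.lang.en is installed, it assumes tokenize() accepts the file path via txt together with lang = "en", and the ID "no_such_document" is made up so that one lookup should come back negative.

```r
# Sketch only: runs when the English language package can be loaded
if (require("koRpus.lang.en", quietly = TRUE)) {
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )
  # assumption: tokenize() takes the file path as txt and the language as lang
  tokenized.obj <- tokenize(txt = sample_file, lang = "en")

  # all document IDs stored in the object
  known_ids <- doc_id(tokenized.obj)

  # look up one existing ID and one hypothetical ID; the result is a
  # logical vector with one entry per requested ID
  ids_found <- doc_id(
    tokenized.obj,
    has_id = c(known_ids[1], "no_such_document")
  )

  # with feature = NULL (the default), the names of all features are returned
  all_features <- hasFeature(tokenized.obj)

  # with a single feature name, the result is a single logical value
  has_diff <- hasFeature(tokenized.obj, "diff")
} else {}
```

Checking hasFeature(obj, "diff") first is a cheap way to find out whether originalText() or diffText() can be used on a given object.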


[1] Text Interchange Formats (https://github.com/ropensci/tif)
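
To make the factor versus character distinction for doc_id concrete, here is a small base R sketch. It does not call koRpus and is not how tif_as_tokens_df is implemented; mock_tokens is a hypothetical stand-in for a tokens data frame whose doc_id column is a factor.

```r
# Hypothetical stand-in for a tokens table with a factor doc_id column
mock_tokens <- data.frame(
  doc_id = factor(c("doc_a", "doc_a", "doc_b")),
  token  = c("Hello", "world", "Goodbye"),
  stringsAsFactors = FALSE
)

# TIF expects doc_id to be a character vector, so convert the factor
tif_tokens <- mock_tokens
tif_tokens[["doc_id"]] <- as.character(tif_tokens[["doc_id"]])

# compare the column types before and after the conversion
is.factor(mock_tokens[["doc_id"]])
is.character(tif_tokens[["doc_id"]])
```

The values themselves are unchanged by the conversion; only the column type differs, which matters for tools that expect plain character IDs.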


# code is only run when the english language package can be loaded
if(require("koRpus.lang.en", quietly = TRUE)){
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )
  # tokenize the sample text as English
  tokenized.obj <- tokenize(
    txt = sample_file,
    lang = "en"
  )

  # first three rows of the token column
  tokenized.obj[1:3, "token"]

  # example for originalText()
  tokenized.obj <- jumbleWords(tokenized.obj)
  # now compare the jumbled words to the original
  tokenized.obj[["token"]]
  originalText(tokenized.obj)[["token"]]
} else {}

