These methods should be used to get or set values of tagged text objects
generated by koRpus functions like treetag or tokenize.
taggedText(obj, add.desc = FALSE, doc_id = FALSE)
# S4 method for kRp.text
taggedText(obj, add.desc = FALSE, doc_id = FALSE)
taggedText(obj) <- value
# S4 method for kRp.text
taggedText(obj) <- value
doc_id(obj, ...)
# S4 method for kRp.text
doc_id(obj, has_id = NULL)
hasFeature(obj, feature = NULL, ...)
# S4 method for kRp.text
hasFeature(obj, feature = NULL)
hasFeature(obj, feature) <- value
# S4 method for kRp.text
hasFeature(obj, feature) <- value
feature(obj, feature, ...)
# S4 method for kRp.text
feature(obj, feature, doc_id = NULL)
feature(obj, feature) <- value
# S4 method for kRp.text
feature(obj, feature) <- value
corpusReadability(obj, ...)
# S4 method for kRp.text
corpusReadability(obj, doc_id = NULL)
corpusReadability(obj) <- value
# S4 method for kRp.text
corpusReadability(obj) <- value
corpusHyphen(obj, ...)
# S4 method for kRp.text
corpusHyphen(obj, doc_id = NULL)
corpusHyphen(obj) <- value
# S4 method for kRp.text
corpusHyphen(obj) <- value
corpusLexDiv(obj, ...)
# S4 method for kRp.text
corpusLexDiv(obj, doc_id = NULL)
corpusLexDiv(obj) <- value
# S4 method for kRp.text
corpusLexDiv(obj) <- value
corpusFreq(obj, ...)
# S4 method for kRp.text
corpusFreq(obj)
corpusFreq(obj) <- value
# S4 method for kRp.text
corpusFreq(obj) <- value
corpusCorpFreq(obj, ...)
# S4 method for kRp.text
corpusCorpFreq(obj)
corpusCorpFreq(obj) <- value
# S4 method for kRp.text
corpusCorpFreq(obj) <- value
corpusStopwords(obj, ...)
# S4 method for kRp.text
corpusStopwords(obj)
corpusStopwords(obj) <- value
# S4 method for kRp.text
corpusStopwords(obj) <- value
# S4 method for kRp.text,ANY,ANY,ANY
[(x, i, j, ..., drop = TRUE)
# S4 method for kRp.text,ANY,ANY,ANY
[(x, i, j, ...) <- value
# S4 method for kRp.text
[[(x, i, doc_id = NULL, ...)
# S4 method for kRp.text
[[(x, i, doc_id = NULL, ...) <- value
# S4 method for kRp.text
describe(obj, doc_id = NULL, simplify = TRUE, ...)
# S4 method for kRp.text
describe(obj, doc_id = NULL, ...) <- value
# S4 method for kRp.text
language(obj)
# S4 method for kRp.text
language(obj) <- value
diffText(obj, doc_id = NULL)
# S4 method for kRp.text
diffText(obj, doc_id = NULL)
diffText(obj) <- value
# S4 method for kRp.text
diffText(obj) <- value
originalText(obj)
# S4 method for kRp.text
originalText(obj)
is.taggedText(obj)
is.kRp.text(obj)
fixObject(obj, doc_id = NA)
# S4 method for kRp.text
fixObject(obj, doc_id = NA)
tif_as_tokens_df(tokens)
# S4 method for kRp.text
tif_as_tokens_df(tokens)
# S4 method for kRp.tagged
fixObject(obj, doc_id = NA)
# S4 method for kRp.txt.freq
fixObject(obj, doc_id = NA)
# S4 method for kRp.txt.trans
fixObject(obj, doc_id = NA)
# S4 method for kRp.analysis
fixObject(obj, doc_id = NA)
obj: An arbitrary R object.
add.desc: Logical, determines whether the desc column should be re-written with descriptions for all POS tags.
doc_id: Logical (except for fixObject, feature, and [[/[[<-), if TRUE the doc_id column will be a factor with the respective value of the desc slot, i.e., the document ID will be preserved in the data.frame. If used with fixObject, can be a character string to set the document ID manually (the default NA will preserve existing values and not overwrite them). If used with feature or [[/[[<-, a character vector to limit the scope to one or more particular document IDs.
value: The new value to replace the current one with.
...: Additional arguments for the generics.
has_id: A character vector with doc_ids to look for in the object. The return value is then a logical vector of the same length, indicating whether the respective ID was found or not.
feature: Character string naming the feature to look for. The return value is logical if a single feature name is given. If feature=NULL, a character vector is returned, naming all features found in the object.
x: An object of class kRp.text or kRp.hyphen.
i: Defines the row selector ([) or the name to match ([[).
j: Defines the column selector.
drop: Logical, whether the result should be coerced to the lowest possible dimension. See [ for more details.
simplify: Logical, if TRUE and the result is a list of length one (i.e., just a single doc_id), returns the contents of the single list entry.
tokens: An object of class kRp.text.
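As a rough sketch of how the has_id and feature arguments behave, the following reuses the sample file from the examples at the end of this page. The ID "no_such_document" is a made-up placeholder, and the code assumes that the koRpus.lang.en package is installed; nothing is run otherwise.

```r
# Only run when the English language package can be loaded
# (loading it also attaches koRpus, as in the page's own examples).
if (require("koRpus.lang.en", quietly = TRUE)) {
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )
  tokenized.obj <- tokenize(txt = sample_file, lang = "en")

  # All document IDs present in the object
  ids <- doc_id(tokenized.obj)

  # has_id: one logical value per requested ID.
  # "no_such_document" is a hypothetical ID used for illustration only.
  found <- doc_id(tokenized.obj, has_id = c(ids[1], "no_such_document"))

  # feature = NULL (the default) lists the names of all features found
  features <- hasFeature(tokenized.obj)

  # A single feature name yields a single logical value
  has_diff <- hasFeature(tokenized.obj, "diff")
}
```

The guard around the whole block mirrors the convention used in the examples section, so the sketch degrades to a no-op on systems without the language package.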
taggedText() returns the tokens slot.
doc_id() returns a character vector of all doc_id values in the object.
describe() returns the desc slot.
language() returns the lang slot.
[/[[ can be used as a shortcut to index the results of taggedText().
fixObject() returns the same object upgraded to the object structure of this package version (e.g., new columns, changed names, etc.).
hasFeature() returns TRUE or FALSE, depending on whether the requested feature is present or not.
feature() returns the list entry of the feat_list slot for the requested feature.
corpusReadability() returns the list of kRp.readability objects, see readability.
corpusHyphen() returns the list of kRp.hyphen objects, see hyphen.
corpusLexDiv() returns the list of kRp.TTR objects, see lex.div.
corpusFreq() returns the frequency analysis data from the feat_list slot, see freq.analysis.
corpusCorpFreq() returns the kRp.corp.freq object of the feat_list slot, see for example read.corp.custom.
corpusStopwords() returns the number of stopwords found in each text (if analyzed) from the feat_list slot.
tif_as_tokens_df() returns the tokens slot in a TIF[1] compliant format, i.e., doc_id is not a factor but a character vector.
originalText() is similar to taggedText(), but reverts any transformations back to the original text before returning the tokens slot. Only works if the object has the feature diff, see examples.
diffText() returns the diff slot, if present.
[1] Text Interchange Formats (https://github.com/ropensci/tif)
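The return values of tif_as_tokens_df(), diffText() and originalText() can be inspected with a short sketch like the one below. It assumes that koRpus.lang.en is installed and that jumbleWords() leaves the feature diff on the object, as the examples on this page suggest; the check with hasFeature() is there so the last two calls are skipped if that assumption does not hold.

```r
# Only run when the English language package can be loaded
if (require("koRpus.lang.en", quietly = TRUE)) {
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )
  tokenized.obj <- tokenize(txt = sample_file, lang = "en")

  # TIF compliant tokens: doc_id is documented to be a character vector
  tif_tokens <- tif_as_tokens_df(tokenized.obj)
  tif_id_is_character <- is.character(tif_tokens[["doc_id"]])

  # Transform the text, then check for the "diff" feature before using it
  jumbled.obj <- jumbleWords(tokenized.obj)
  jumbled_has_diff <- hasFeature(jumbled.obj, "diff")
  if (isTRUE(jumbled_has_diff)) {
    text_diff <- diffText(jumbled.obj)
    restored_tokens <- originalText(jumbled.obj)[["token"]]
  }
}
```

Testing for the feature first avoids calling originalText() on an object that has no recorded transformations.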
# NOT RUN {
# code is only run when the english language package can be loaded
if(require("koRpus.lang.en", quietly = TRUE)){
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )
  tokenized.obj <- tokenize(
    txt=sample_file,
    lang="en"
  )

  doc_id(tokenized.obj)
  describe(tokenized.obj)
  language(tokenized.obj)
  taggedText(tokenized.obj)
  tokenized.obj[["token"]]
  tokenized.obj[1:3, "token"]

  tif_as_tokens_df(tokenized.obj)

  # example for originalText()
  tokenized.obj <- jumbleWords(tokenized.obj)
  # now compare the jumbled words to the original
  tokenized.obj[["token"]]
  originalText(tokenized.obj)[["token"]]
} else {}
# }
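The examples above only read values. A minimal sketch of the replacement form taggedText(obj) <- value might look as follows; it assumes koRpus.lang.en is installed, and the lower-casing step and the name lowered.obj are purely illustrative, not a recommended way to transform text.

```r
# Only run when the English language package can be loaded
if (require("koRpus.lang.en", quietly = TRUE)) {
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )
  tokenized.obj <- tokenize(txt = sample_file, lang = "en")

  # Get the tokens data.frame, modify one column ...
  tokens_df <- taggedText(tokenized.obj)
  tokens_df[["token"]] <- tolower(tokens_df[["token"]])

  # ... and write it back into a copy of the object
  lowered.obj <- tokenized.obj
  taggedText(lowered.obj) <- tokens_df

  # Compare the stored tokens with the lower-cased originals
  all_lower <- identical(
    lowered.obj[["token"]],
    tolower(tokenized.obj[["token"]])
  )
}
```

Working on a copy keeps tokenized.obj untouched, which makes it easy to compare the two objects afterwards.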