These methods return character vectors containing all types or tokens of a given text,
where the text can be a character vector itself, a previously tokenized/tagged koRpus object,
or an object of class kRp.TTR.
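The difference between tokens and types can be illustrated with a minimal base R sketch. It is not how koRpus tokenizes text: the sample sentence, the regular expression, and the variable names are assumptions made for illustration only.

```r
# Hypothetical sample text, not taken from koRpus.
txt <- "The cat sat on the mat. The dog sat too."

# Tokens: every running word form. As a rough stand-in for a real
# tokenizer, split on any run of non-letter characters.
all_tokens <- unlist(strsplit(txt, "[^[:alpha:]]+"))
all_tokens <- all_tokens[nzchar(all_tokens)]

# Types: the distinct word forms, compared without regard to case.
all_types <- unique(tolower(all_tokens))

length(all_tokens)  # number of tokens
length(all_types)   # number of types
```

Every repeated word adds to the token count but not to the type count, which is why the two vectors differ in length.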
types(txt, ...)
tokens(txt, ...)
# S4 method for kRp.TTR
types(txt, stats = FALSE)
# S4 method for kRp.TTR
tokens(txt)
# S4 method for kRp.text
types(
  txt,
  case.sens = FALSE,
  lemmatize = FALSE,
  corp.rm.class = "nonpunct",
  corp.rm.tag = c(),
  stats = FALSE
)
# S4 method for kRp.text
tokens(
  txt,
  case.sens = FALSE,
  lemmatize = FALSE,
  corp.rm.class = "nonpunct",
  corp.rm.tag = c()
)
# S4 method for character
types(
  txt,
  case.sens = FALSE,
  lemmatize = FALSE,
  corp.rm.class = "nonpunct",
  corp.rm.tag = c(),
  stats = FALSE,
  lang = NULL
)
# S4 method for character
tokens(
  txt,
  case.sens = FALSE,
  lemmatize = FALSE,
  corp.rm.class = "nonpunct",
  corp.rm.tag = c(),
  lang = NULL
)
...: Only used for the method generic.
stats: Logical, whether statistics on the length in characters and frequency of types in the text should also be returned.
case.sens: Logical, whether types should be counted case-sensitively. This option is available for tagged text and character input only.
lemmatize: Logical, whether analysis should be carried out on the lemmatized tokens rather than all running word forms. This option is available for tagged text and character input only.
corp.rm.class: A character vector of word classes that should be dropped. The default value "nonpunct" has a special meaning and causes the result of kRp.POS.tags(lang, tags=c("punct","sentc"), list.classes=TRUE) to be used. This option is available for tagged text and character input only.
corp.rm.tag: A character vector of POS tags that should be dropped. This option is available for tagged text and character input only.
lang: Sets the language of a text; see the force.lang option of lex.div. This option is available for character input only.
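As a rough illustration of how dropping word classes and case sensitivity change the result, the following base R sketch filters a small hand-made table of tagged tokens. The data frame, its column names, and the class label "fullstop" are hypothetical stand-ins; they are not read from a koRpus object or from kRp.POS.tags().

```r
# Hypothetical tagged tokens: one row per token with an assumed word class.
tagged <- data.frame(
  token  = c("The", "cat", "sat", ".", "The", "Cat", "slept", "."),
  wclass = c("determiner", "noun", "verb", "fullstop",
             "determiner", "noun", "verb", "fullstop"),
  stringsAsFactors = FALSE
)

# Word classes to drop, playing the role of corp.rm.class (assumed labels).
drop_classes <- c("fullstop")
kept <- tagged$token[!tagged$wclass %in% drop_classes]

# Case-sensitive counting treats "cat" and "Cat" as different types.
types_case_sensitive <- unique(kept)

# Case-insensitive counting merges them.
types_case_insensitive <- unique(tolower(kept))
```

In this sketch the punctuation rows are removed before types are counted, so they never appear in either result.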
A character vector. For types with stats=TRUE, a data.frame containing all types,
their length (in characters), and their frequency. The types result is always sorted by frequency,
with more frequent types coming first.
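The shape of such a frequency table can be sketched in base R as follows. This is only an approximation of the idea, not the koRpus implementation; the input vector and the column names type, lttr, and freq are assumptions.

```r
# Hypothetical lower-cased tokens.
tokens_lc <- c("the", "cat", "sat", "on", "the", "mat",
               "the", "dog", "sat", "too")

# Count how often each distinct type occurs.
counts <- table(tokens_lc)

# One row per type: the type itself, its length in characters, its frequency.
type_stats <- data.frame(
  type = names(counts),
  lttr = nchar(names(counts)),
  freq = as.integer(counts),
  stringsAsFactors = FALSE
)

# Sort by frequency, most frequent types first.
type_stats <- type_stats[order(type_stats$freq, decreasing = TRUE), ]
rownames(type_stats) <- NULL
type_stats
```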
# NOT RUN {
# tagged.text is assumed to be an existing tokenized/tagged koRpus object
types(tagged.text)
tokens(tagged.text)
# }