lex.div(txt, segment = 100, factor.size = 0.72,
rand.sample = 42, window = 100, case.sens = FALSE,
lemmatize = FALSE,
measure = c("TTR", "MSTTR", "MATTR", "C", "R", "CTTR", "U", "S", "K", "Maas", "HD-D", "MTLD"),
char = c("TTR", "MATTR", "C", "R", "CTTR", "U", "S", "K", "Maas", "HD-D", "MTLD"),
char.steps = 5, force.lang = NULL, keep.tokens = FALSE,
corp.rm.class = "nonpunct", corp.rm.tag = c(),
quiet = FALSE)

Accepted classes for txt include kRp.tagged-class and
kRp.txt.freq-class.

If keep.tokens is TRUE, all raw tokens
and types will be preserved in the resulting object, in a
slot called tt. For the types, their
frequency in the analyzed text will also be listed.

For corp.rm.class, the value "nonpunct" has special meaning and will cause the
result of kRp.POS.tags(lang, c("punct","sentc"),
list.classes=TRUE) to be used.

If quiet is FALSE, short status
messages will be shown.

The result is an object of class kRp.TTR-class.

lex.div calculates a variety of proposed indices
for lexical diversity. In the following formulae, $N$
refers to the total number of tokens, and $V$ to the
number of types. The measures and, where listed, their wrapper functions are:
TTR
MSTTR
MATTR
C (wrapper function: C.ld)
R (wrapper function: R.ld)
CTTR (wrapper function: CTTR)
U (wrapper function: U.ld)
S (wrapper function: S.ld)
K
Maas
HD-D
MTLD

By default, if the text has yet to be tagged, the
language definition is queried by calling
get.kRp.env(lang=TRUE) internally. Or, if
txt has already been tagged, by default the
language definition of that tagged object is read and
used. Set force.lang=get.kRp.env(lang=TRUE), or
any other valid value, only if you want to override
this default behaviour. See
kRp.POS.tags for all
supported languages.
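
As an illustration of how $N$ and $V$ enter the simpler indices, the following base R sketch computes TTR, C, R and CTTR from a plain token vector. It does not use koRpus: simple.lex.div is a hypothetical helper, the formulas are the commonly cited definitions from the literature (an assumption about what lex.div computes internally), and case folding is assumed to be what case.sens = FALSE implies.

```r
# Hypothetical helper, NOT part of koRpus: a sketch of indices that
# depend only on N (number of tokens) and V (number of types).
simple.lex.div <- function(tokens, case.sens = FALSE) {
  # Assumption: with case.sens = FALSE, "The" and "the" count as one type
  if (!case.sens) {
    tokens <- tolower(tokens)
  }
  N <- length(tokens)          # total number of tokens
  V <- length(unique(tokens))  # number of distinct types
  c(
    TTR  = V / N,              # type-token ratio
    C    = log(V) / log(N),    # Herdan's C (independent of the log base)
    R    = V / sqrt(N),        # Guiraud's root TTR
    CTTR = V / sqrt(2 * N)     # Carroll's corrected TTR
  )
}

tokens <- c("The", "cat", "saw", "the", "other", "cat", "run", "away")
res <- simple.lex.div(tokens)
print(res)
```

Indices such as U and S also depend only on $N$ and $V$, but their values change with the base of the logarithm, so they are left out of this sketch.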
Maas, H.-D. (1972). Über den Zusammenhang zwischen Wortschatzumfang und Länge eines Textes. Zeitschrift für Literaturwissenschaft und Linguistik, 2(8), 73–96.
McCarthy, P.M. & Jarvis, S. (2007). vocd: A theoretical and empirical evaluation. Language Testing, 24(4), 459–488.
McCarthy, P.M. & Jarvis, S. (2010). MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment. Behavior Research Methods, 42(2), 381–392.
Tweedie, F.J. & Baayen, R.H. (1998). How Variable May a Constant Be? Measures of Lexical Richness in Perspective. Computers and the Humanities, 32(5), 323–352.
kRp.POS.tags,
kRp.tagged-class,
kRp.TTR-class