lex.div(txt, segment = 100, factor.size = 0.72,
rand.sample = 42, window = 100, case.sens = FALSE,
lemmatize = FALSE,
measure = c("TTR", "MSTTR", "MATTR", "C", "R", "CTTR", "U", "S", "K", "Maas", "HD-D", "MTLD"),
char = c("TTR", "MATTR", "C", "R", "CTTR", "U", "S", "K", "Maas", "HD-D", "MTLD"),
char.steps = 5, force.lang = NULL, keep.tokens = FALSE,
corp.rm.class = "nonpunct", corp.rm.tag = c(),
quiet = FALSE)
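The `segment`, `window` and `case.sens` defaults in the signature above are easier to read with a small worked example. The following is a minimal base-R sketch, not the koRpus implementation: the helper names `ttr`, `msttr` and `mattr` are hypothetical, the input is assumed to be a plain character vector of tokens, and the handling of an incomplete trailing segment (dropped here) is an assumption.

```r
# Sketch only: base-R illustrations, not koRpus internals.
# `tokens` is assumed to be a character vector of already tokenized words.

# Type-token ratio: number of distinct tokens divided by number of tokens.
ttr <- function(tokens, case.sens = FALSE) {
  if (!case.sens) tokens <- tolower(tokens)  # fold case unless case-sensitive
  length(unique(tokens)) / length(tokens)
}

# MSTTR: mean TTR over consecutive, non-overlapping segments of `segment`
# tokens. Assumption: an incomplete trailing segment is dropped.
msttr <- function(tokens, segment = 100, case.sens = FALSE) {
  n.seg <- length(tokens) %/% segment
  if (n.seg < 1) return(NA_real_)
  starts <- (seq_len(n.seg) - 1) * segment + 1
  mean(vapply(starts,
              function(s) ttr(tokens[s:(s + segment - 1)], case.sens),
              numeric(1)))
}

# MATTR: mean TTR over every window of `window` tokens, moved one token
# at a time through the text.
mattr <- function(tokens, window = 100, case.sens = FALSE) {
  n.win <- length(tokens) - window + 1
  if (n.win < 1) return(NA_real_)
  mean(vapply(seq_len(n.win),
              function(s) ttr(tokens[s:(s + window - 1)], case.sens),
              numeric(1)))
}

tokens <- c("The", "cat", "saw", "the", "dog", "and",
            "the", "dog", "saw", "the", "cat")

ttr(tokens)                    # 5 types / 11 tokens
ttr(tokens, case.sens = TRUE)  # "The" and "the" now count as two types
msttr(tokens, segment = 4)     # two full segments, the last 3 tokens dropped
mattr(tokens, window = 4)      # eight overlapping windows
```

With real data, `segment` and `window` would normally stay at values like the defaults of 100; the tiny sizes here only keep the example checkable by hand.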
txt: an object of class kRp.tagged-class or kRp.txt.freq-class.
keep.tokens: if TRUE, all raw tokens and types will be preserved in the resulting object, in a slot called tt. For the types, their frequency in the analyzed text will also be listed.
corp.rm.class: "nonpunct" has special meaning and will cause the result of kRp.POS.tags(lang, c("punct","sentc"), list.classes=TRUE) to be used.
quiet: if FALSE, short status messages will be shown.

Returns an object of class kRp.TTR-class.

lex.div
calculates a variety of proposed indices
for lexical diversity. In the following formulae, $N$
refers to the total number of tokens, and $V$ to the
number of types:

TTR
MSTTR
MATTR
C (wrapper function: C.ld)
R (wrapper function: R.ld)
CTTR (wrapper function: CTTR)
U (wrapper function: U.ld)
S (wrapper function: S.ld)
K
Maas
HD-D
MTLD

By default, if the text has yet to be tagged, the
language definition is queried by calling get.kRp.env(lang=TRUE) internally. If txt has already been tagged, the language definition of that tagged object is read and used instead. Set force.lang (to get.kRp.env(lang=TRUE) or to any other valid value) only if you want to forcibly override this default behaviour. See kRp.POS.tags for all supported languages.
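For the simple ratio-based indices that lex.div reports (TTR, C, R, CTTR, U and S), the definitions as they are commonly given in the literature can be written in terms of $N$ and $V$ as below. This is a sketch under stated assumptions: the formulae are the standard textbook forms, not text recovered from this page, and $\lg$ is assumed to mean the base-10 logarithm. The choice of base cancels out in C but does affect U and S.

```latex
\begin{align*}
\mathrm{TTR}  &= \frac{V}{N} \\
C             &= \frac{\lg V}{\lg N} \\
R             &= \frac{V}{\sqrt{N}} \\
\mathrm{CTTR} &= \frac{V}{\sqrt{2N}} \\
U             &= \frac{(\lg N)^2}{\lg N - \lg V} \\
S             &= \frac{\lg(\lg V)}{\lg(\lg N)}
\end{align*}
```

As a quick check, a text with $N = 100$ tokens and $V = 50$ types gives $\mathrm{TTR} = 0.5$, $R = 5$, $\mathrm{CTTR} \approx 3.54$, $C \approx 0.85$ and, with base-10 logarithms, $U \approx 13.29$.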
Maas, H.-D. (1972). Über den Zusammenhang zwischen Wortschatzumfang und Länge eines Textes. Zeitschrift für Literaturwissenschaft und Linguistik, 2(8), 73–96.
McCarthy, P.M. & Jarvis, S. (2007). vocd: A theoretical and empirical evaluation. Language Testing, 24(4), 459–488.
McCarthy, P.M. & Jarvis, S. (2010). MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment. Behavior Research Methods, 42(2), 381–392.
Tweedie, F.J. & Baayen, R.H. (1998). How Variable May a Constant Be? Measures of Lexical Richness in Perspective. Computers and the Humanities, 32(5), 323–352.
See also: kRp.POS.tags, kRp.tagged-class, kRp.TTR-class