textstat_lexdiv(x, measure = c("all", "TTR", "C", "R", "CTTR", "U", "S",
"Maas"), log.base = 10, drop = TRUE, ...)
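drop: if TRUE, the result is returned as a numeric vector when only a single measure is requested; otherwise, a data.frame is returned with one column per requested measure.
textstat_lexdiv calculates a variety of proposed indices for lexical diversity. In the following formulae, \(N\) refers to the total number of tokens and \(V\) to the number of types; logarithms are taken to base log.base.
"TTR": the ordinary Type-Token Ratio, \(TTR = V / N\)
"C": Herdan's C, \(C = \log V / \log N\)
"R": Guiraud's Root TTR, \(R = V / \sqrt{N}\)
"CTTR": Carroll's Corrected TTR, \(CTTR = V / \sqrt{2N}\)
"U": Dugast's Uber Index, \(U = (\log N)^2 / (\log N - \log V)\)
"S": Summer's index, \(S = \log \log V / \log \log N\)
"K": Yule's K, \(K = 10^4 \times \left( \sum_{i} i^2 V_i - N \right) / N^2\), where \(V_i\) is the number of types that occur exactly \(i\) times
"Maas": Maas' index, \(a^2 = (\log N - \log V) / (\log N)^2\)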
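The formulae above can be checked by hand. The following is a minimal base-R sketch, not quanteda's implementation: the function name lexdiv_sketch is hypothetical, it assumes the input is a character vector of already-tokenized words, and it assumes every logarithm uses the same base (log.base, 10 by default). It reports Maas' \(a^2\) as written above rather than \(a\).

```r
# Hypothetical helper (not part of quanteda): computes the lexical
# diversity formulae above for one tokenized text.
# Assumption: all logs use the same base, `log.base`.
lexdiv_sketch <- function(tokens, log.base = 10) {
  lg <- function(x) log(x, base = log.base)

  N <- length(tokens)        # total number of tokens
  freqs <- table(tokens)     # frequency of each type
  V <- length(freqs)         # number of types

  # Frequency spectrum for Yule's K: V_i = number of types occurring i times
  spectrum <- table(as.vector(freqs))
  i <- as.numeric(names(spectrum))
  V_i <- as.vector(spectrum)

  c(
    TTR     = V / N,
    C       = lg(V) / lg(N),
    R       = V / sqrt(N),
    CTTR    = V / sqrt(2 * N),
    U       = lg(N)^2 / (lg(N) - lg(V)),   # infinite when V == N
    S       = lg(lg(V)) / lg(lg(N)),
    K       = 1e4 * (sum(i^2 * V_i) - N) / N^2,
    Maas_a2 = (lg(N) - lg(V)) / lg(N)^2    # take sqrt() to obtain a
  )
}

lexdiv_sketch(c("the", "cat", "sat", "on", "the", "mat"))
```

For the six tokens "the cat sat on the mat", \(N = 6\) and \(V = 5\), so TTR is 5/6; only "the" occurs twice, giving \(\sum_i i^2 V_i = 8\) and \(K = 10^4 \times (8 - 6)/36 \approx 555.6\).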
Maas, H.-D. (1972). Über den Zusammenhang zwischen Wortschatzumfang und Länge eines Textes. Zeitschrift für Literaturwissenschaft und Linguistik, 2(8), 73–96.
McCarthy, P.M. & Jarvis, S. (2007). vocd: A theoretical and empirical evaluation. Language Testing, 24(4), 459–488.
McCarthy, P.M. & Jarvis, S. (2010). MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment. Behavior Research Methods, 42(2), 381–392.
Michalke, M. (2014). koRpus: An R Package for Text Analysis. Version 0.05-5. http://reaktanz.de/?c=hacking&s=koRpus
Tweedie, F.J. & Baayen, R.H. (1998). How Variable May a Constant Be? Measures of Lexical Richness in Perspective. Computers and the Humanities, 32(5), 323–352.
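library(quanteda)
# document-feature matrix of the inaugural addresses delivered after 1980
mydfm <- dfm(corpus_subset(data_corpus_inaugural, Year > 1980), verbose = FALSE)
# compute three measures per document and print them
(results <- textstat_lexdiv(mydfm, c("CTTR", "TTR", "U")))
# pairwise correlations among all measures
cor(textstat_lexdiv(mydfm, "all"))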
# with different settings of drop
textstat_lexdiv(mydfm, "TTR", drop = TRUE)
textstat_lexdiv(mydfm, "TTR", drop = FALSE)