quanteda (version 0.9.8.5)

lexdiv: calculate lexical diversity

Description

Calculate the lexical diversity or complexity of text(s).

Usage

lexdiv(x, ...)
## S3 method for class 'dfm'
lexdiv(x, measure = c("all", "TTR", "C", "R", "CTTR", "U", "S", "Maas"), log.base = 10, drop = TRUE, ...)

Arguments

x
an input object, such as a document-feature matrix object
...
not used
measure
a character vector naming the measure(s) to calculate: one or more of "TTR", "C", "R", "CTTR", "U", "S" and "Maas", or "all" to calculate every measure.
log.base
a numeric value defining the base of the logarithm (for measures using logs)
drop
if TRUE and only a single measure is requested, the result is returned as a numeric vector; otherwise, a data.frame is returned with one column per requested measure.

Value

a data.frame or vector of lexical diversity statistics, each row or vector element corresponding to an input document

Details

lexdiv calculates a variety of proposed indices for lexical diversity. In the formulae for these indices, $N$ refers to the total number of tokens and $V$ to the number of types.
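
For orientation, the measures named in the `measure` argument are usually defined in the literature as shown below: the type-token ratio (TTR), Herdan's C, Guiraud's R, Carroll's corrected TTR (CTTR), Dugast's U, Summer's S, and Maas' index. This is a sketch of the standard textbook forms, with logarithms taken to base `log.base`; that this version of lexdiv implements exactly these forms is an assumption, not something stated on this page.

```latex
\begin{aligned}
\mathrm{TTR}  &= \frac{V}{N} \\
C             &= \frac{\log V}{\log N} \\
R             &= \frac{V}{\sqrt{N}} \\
\mathrm{CTTR} &= \frac{V}{\sqrt{2N}} \\
U             &= \frac{(\log N)^2}{\log N - \log V} \\
S             &= \frac{\log \log V}{\log \log N} \\
a^2           &= \frac{\log N - \log V}{(\log N)^2} \quad \text{(Maas)}
\end{aligned}
```

In these forms Maas' $a^2$ is the reciprocal of $U$, and $U$ is undefined when $V = N$, that is, when every token in a document is a distinct type.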

References

Covington, M.A. & McFall, J.D. (2010). Cutting the Gordian Knot: The Moving-Average Type-Token Ratio (MATTR). Journal of Quantitative Linguistics, 17(2), 94–100.

Maas, H.-D. (1972). Über den Zusammenhang zwischen Wortschatzumfang und Länge eines Textes. Zeitschrift für Literaturwissenschaft und Linguistik, 2(8), 73–96.

McCarthy, P.M. & Jarvis, S. (2007). vocd: A theoretical and empirical evaluation. Language Testing, 24(4), 459–488.

McCarthy, P.M. & Jarvis, S. (2010). MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment. Behavior Research Methods, 42(2), 381–392.

Michalke, M. (2014). koRpus: An R Package for Text Analysis. Version 0.05-5. http://reaktanz.de/?c=hacking&s=koRpus

Tweedie, F.J. & Baayen, R.H. (1998). How Variable May a Constant Be? Measures of Lexical Richness in Perspective. Computers and the Humanities, 32(5), 323–352.

Examples

mydfm <- dfm(subset(inaugCorpus, Year > 1980), verbose = FALSE)
(results <- lexdiv(mydfm, c("CTTR", "TTR", "U")))
cor(lexdiv(mydfm, "all"))

# with different settings of drop
lexdiv(mydfm, "TTR", drop = TRUE)
lexdiv(mydfm, "TTR", drop = FALSE)
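
To see what the token and type counts mean in practice, a few of the simpler measures can be computed by hand in base R. The following is a minimal sketch, not quanteda code: the `counts` matrix and the `lexdiv_by_hand()` helper are hypothetical, and the formulas used are the standard textbook definitions, which are assumed rather than confirmed to match what lexdiv returns.

```r
# Hypothetical toy data: rows are documents, columns are word types,
# cells are the number of times each type occurs in each document.
counts <- matrix(
  c(2, 1, 0, 3,
    4, 0, 0, 4),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("doc1", "doc2"), c("a", "b", "c", "d"))
)

# Hypothetical helper (not part of quanteda): computes a few indices
# from a documents-by-types count matrix using textbook definitions.
lexdiv_by_hand <- function(m, log.base = 10) {
  N <- rowSums(m)       # total number of tokens per document
  V <- rowSums(m > 0)   # number of distinct types per document
  logN <- log(N, base = log.base)
  logV <- log(V, base = log.base)
  data.frame(
    TTR  = V / N,                   # type-token ratio
    C    = logV / logN,             # Herdan's C
    R    = V / sqrt(N),             # Guiraud's R
    CTTR = V / sqrt(2 * N),         # Carroll's corrected TTR
    U    = logN^2 / (logN - logV),  # Dugast's U (Inf when V == N)
    row.names = rownames(m)
  )
}

res <- lexdiv_by_hand(counts)
print(res)
```

Here doc1 has 6 tokens spread over 3 types and doc2 has 8 tokens over 2 types, so their TTR values are 0.5 and 0.25 respectively. Because C and U are ratios involving logarithms in both numerator and denominator, changing `log.base` rescales U but leaves C unchanged.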
