Data sets used mainly for internal purposes by the quanteda package.
data_int_syllables
data_char_wordlists
An object of class integer of length 133245.
data_int_syllables provides an English-language syllables dictionary; it is an integer vector of syllable counts whose element names correspond to English words. It was built from the freely available CMU pronunciation dictionary at http://www.speech.cs.cmu.edu/cgi-bin/cmudict.
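As a minimal sketch of how a named integer vector of this shape can be used for lookups, the example below builds a small stand-in called toy_syllables. That name and its four entries are hypothetical and chosen for illustration; they are not taken from the packaged data.

```r
# Toy stand-in for data_int_syllables: a named integer vector
# (word -> syllable count). The entries are illustrative only.
toy_syllables <- c(data = 2L, set = 1L, readability = 5L, index = 2L)

# Look up syllable counts by element name.
# Words that are not in the vector come back as NA.
words <- c("readability", "index", "quanteda")
counts <- unname(toy_syllables[words])

# Total syllables over the words that were found.
total_known <- sum(counts, na.rm = TRUE)
```

Indexing by name keeps the lookup vectorised, and the NA results make it easy to see which words would need a fallback rule.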
data_char_wordlists provides word lists used in some readability indexes; it is a named list of character vectors where each list element corresponds to a different readability index. These are:
DaleChall
The long Dale-Chall list of 3,000 familiar (English) words needed to compute the Dale-Chall Readability Formula.
Spache
The revised Spache word list (see Klare 1975, 73) needed to compute the Spache Revised Formula of readability (Spache 1974).
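The following sketch shows one way a named list of character vectors like this can be used to flag words that are not on a given list. The object toy_wordlists and the words in it are hypothetical placeholders, not the actual list contents, and the share computed here is only an ingredient of a readability score, not a full formula.

```r
# Toy stand-in for data_char_wordlists: a named list of character vectors,
# one element per readability index. The words are illustrative only.
toy_wordlists <- list(
  DaleChall = c("a", "able", "about"),
  Spache    = c("a", "about", "again")
)

# Flag tokens that do not appear on the chosen word list.
tokens <- c("About", "able", "readability")
unfamiliar <- !(tolower(tokens) %in% toy_wordlists[["DaleChall"]])

# Share of tokens that are not on the list.
share_unfamiliar <- mean(unfamiliar)
```

Selecting the list element by name, for example toy_wordlists[["Spache"]], switches the check to a different index without changing the rest of the code.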
Chall, J. S., & Dale, E. 1995. Readability Revisited: The New Dale-Chall Readability Formula. Brookline Books.
Klare, G. R. 1975. "Assessing readability." Reading Research Quarterly 10(1): 62-102.
Spache, G. 1953. "A new readability formula for primary-grade reading materials." The Elementary School Journal 53: 410-413.