Data sets used mainly for internal purposes by the quanteda package.
data_int_syllables
data_char_stopwords
data_char_wordlists
An object of class integer of length 133245.
data_int_syllables
provides an English-language syllable dictionary; it is
an integer vector whose element names correspond to English words and whose
values are the corresponding syllable counts. It was built from
the freely available CMU pronunciation dictionary at
http://www.speech.cs.cmu.edu/cgi-bin/cmudict.
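As a rough illustration of the named-integer-vector structure described above, the sketch below builds a small stand-in vector and looks up counts by word. The name `toy_syllables` and its entries are hypothetical and are not taken from the package data.

```r
# Toy stand-in for data_int_syllables: a named integer vector.
# The words and counts here are illustrative only, not the package data.
toy_syllables <- c(cat = 1L, table = 2L, dictionary = 4L)

# Look up syllable counts by element name; words not in the vector yield NA.
words <- c("table", "dictionary", "zyzzyva")
counts <- unname(toy_syllables[words])
print(counts)
```

Indexing by name keeps the lookup vectorized, so a whole vector of words can be resolved in one step, with missing entries surfacing as NA rather than raising an error.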
data_char_stopwords
provides stopword lists in multiple
languages; it is a named list of character vectors, with the lowercase language
name (in English) as the name of each list element.
Supported languages are Arabic, Danish, Dutch, English, Finnish, French,
German, Greek, Hungarian, Italian, Norwegian, Portuguese, Russian, Spanish,
and Swedish.
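A minimal sketch of how a list shaped like this might be used, assuming a toy object in place of the real data. The name `toy_stopwords` and its contents are hypothetical; only the structure (a named list keyed by lowercase language name) follows the description above.

```r
# Toy stand-in for data_char_stopwords: a named list of character vectors,
# keyed by lowercase English language name. Entries are illustrative only.
toy_stopwords <- list(
  english = c("the", "and", "of"),
  german  = c("der", "und", "von")
)

# Drop English stopwords from an already tokenized sentence.
tokens <- c("the", "history", "of", "science")
kept <- tokens[!tokens %in% toy_stopwords[["english"]]]
print(kept)
```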
data_char_wordlists
provides word lists used in some readability indexes;
it is a named list of character vectors where each list element
corresponds to a different readability index.
These are:
DaleChall
The long Dale-Chall list of 3,000 familiar (English) words needed to compute the Dale-Chall Readability Formula.
Spache
The revised Spache word list (see Klare 1975, 73) needed to compute the Spache Revised Formula of readability (Spache 1974).
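Word lists of this kind are typically used to flag tokens that fall outside the list. The sketch below shows that idea on a toy object; `toy_wordlists` and its entries are hypothetical, and the calculation is only the share of unlisted tokens, not a full readability formula.

```r
# Toy stand-in for data_char_wordlists: a named list of character vectors,
# one element per readability index. Entries are illustrative only.
toy_wordlists <- list(
  DaleChall = c("the", "dog", "ran", "home"),
  Spache    = c("the", "dog", "ran")
)

# Flag tokens that are absent from the chosen word list.
tokens <- c("the", "dog", "ran", "to", "the", "observatory")
unfamiliar <- !tolower(tokens) %in% toy_wordlists[["DaleChall"]]

# Proportion of tokens not found in the list.
share_unfamiliar <- mean(unfamiliar)
print(share_unfamiliar)
```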
Chall, J. S., & Dale, E. 1995. Readability Revisited: The New Dale-Chall Readability Formula. Brookline Books.
Klare, G. R. 1975. "Assessing readability." Reading Research Quarterly 10(1): 62-102.
Spache, G. 1953. "A new readability formula for primary-grade reading materials." The Elementary School Journal 53: 410-413.