quanteda (version 0.99.22)

data-internal: internal data sets

Description

Data sets used mainly for internal purposes by the quanteda package.

Usage

data_int_syllables

data_char_stopwords

data_char_wordlists

Format

An object of class integer of length 133245.

Details

data_int_syllables provides an English-language syllables dictionary; it is an integer vector whose element names are English words and whose values are their syllable counts. It was built from the freely available CMU pronunciation dictionary at http://www.speech.cs.cmu.edu/cgi-bin/cmudict.
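As a rough illustration of how a named integer vector of this shape can be queried, the sketch below uses a small stand-in vector. The name toy_syllables and its entries are hypothetical and are not taken from the package data.

```r
# Toy stand-in for data_int_syllables: a named integer vector.
# The words and counts here are illustrative assumptions only.
toy_syllables <- c(cat = 1L, table = 2L, banana = 3L)

# Look up syllable counts by word; words absent from the vector yield NA.
words <- c("banana", "cat", "zyzzyva")
counts <- unname(toy_syllables[words])
counts
```

Indexing by name keeps the lookup vectorized, so a whole vector of tokens can be resolved in one step, with NA marking words that need a fallback rule.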

data_char_stopwords provides stopword lists in multiple languages; it is a named list of character vectors, with the lowercase language name (in English) as the name of each list element. Supported languages are Arabic, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian, Italian, Norwegian, Portuguese, Russian, Spanish, and Swedish.
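The structure described above (a list keyed by lowercase language name) might be used as in the following minimal sketch. The object toy_stopwords and its contents are hypothetical stand-ins, not the package's actual lists.

```r
# Toy stand-in for data_char_stopwords: a named list of character vectors,
# keyed by lowercase language name. Contents are illustrative assumptions.
toy_stopwords <- list(
  english = c("the", "a", "of"),
  german  = c("der", "die", "das")
)

# Drop English stopwords from a vector of tokens.
tokens <- c("the", "history", "of", "a", "language")
kept <- tokens[!tokens %in% toy_stopwords[["english"]]]
kept
```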

data_char_wordlists provides word lists used in some readability indexes; it is a named list of character vectors where each list element corresponds to a different readability index.

These are:

DaleChall

The long Dale-Chall list of 3,000 familiar (English) words needed to compute the Dale-Chall Readability Formula.

Spache

The revised Spache word list (see Klare 1975, 73) needed to compute the Spache Revised Formula of readability (Spache 1974).
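Readability indexes of this kind typically depend on the share of words that do not appear in a familiar-word list. The sketch below computes only that share from a toy list; toy_wordlists and its entries are hypothetical, and this is not the full Dale-Chall or Spache formula.

```r
# Toy stand-in for data_char_wordlists: one character vector per index.
# The entries are illustrative assumptions, not the real word lists.
toy_wordlists <- list(
  DaleChall = c("the", "dog", "ran", "home"),
  Spache    = c("the", "dog", "ran")
)

# Share of tokens that are absent from the chosen familiar-word list.
tokens <- c("the", "dog", "ran", "to", "the", "laboratory")
unfamiliar <- !tolower(tokens) %in% toy_wordlists[["DaleChall"]]
share_unfamiliar <- mean(unfamiliar)
share_unfamiliar
```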

References

Chall, J. S., & Dale, E. 1995. Readability Revisited: The New Dale-Chall Readability Formula. Brookline Books.

Klare, G. R. 1975. "Assessing readability." Reading Research Quarterly 10(1): 62-102.

Spache, G. 1953. "A new readability formula for primary-grade reading materials." The Elementary School Journal 53: 410-413.

See Also

stopwords