syllables: count syllables in a text

Description

This function takes a text and returns a count of the number of syllables it contains. For English words, the syllable count is exact and is looked up in the default syllable dictionary englishSyllables, which is built from the CMU pronunciation dictionary. For any word not in the dictionary, the syllable count is estimated by counting vowel clusters.

englishSyllables is a quanteda-supplied data object consisting of a named numeric vector of syllable counts, with the words as names. This is the default object used to count English syllables. The object can be accessed directly, but we strongly encourage you to access it only through the syllables() wrapper function.
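
The lookup-then-estimate behaviour described above can be sketched in base R. This is a minimal illustration, not quanteda's implementation: the helper names estimate_syllables and count_syllables, the toy dictionary toy_dict, and the choice to treat "y" as a vowel are all assumptions made for the example.

```r
# Hypothetical helper, not part of quanteda: estimate the syllables in one
# word by counting runs of consecutive vowels (vowel clusters).
# Treating "y" as a vowel is an assumption of this sketch.
estimate_syllables <- function(word) {
  matches <- gregexpr("[aeiouy]+", tolower(word))[[1]]
  # gregexpr() returns -1 when the pattern does not match at all
  if (matches[1] == -1) 0L else length(matches)
}

# Toy stand-in for englishSyllables: a named numeric vector of counts.
toy_dict <- c(this = 1, is = 1, an = 1, example = 3, sentence = 2)

# Look each word up by name; fall back to the estimate when it is missing.
count_syllables <- function(words, dict = toy_dict) {
  words <- tolower(words)
  counts <- unname(dict[words])   # NA for words not in the dictionary
  missing <- is.na(counts)
  counts[missing] <- vapply(words[missing], estimate_syllables, integer(1))
  setNames(counts, words)
}

count_syllables(c("This", "is", "an", "example", "quanteda"))
```

Here "quanteda" is absent from the toy dictionary, so its count comes from the vowel clusters "ua", "e" and "a".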

Usage

syllables(x, ...)
## S3 method for class 'character':
syllables(x, syllableDict = englishSyllables, ...)

Arguments

x

character vector or list of character vectors whose syllables will be counted

...

additional arguments passed to clean

syllableDict

a named numeric vector of syllable counts where the names are lower case tokens. The default is englishSyllables, an English pronunciation dictionary from CMU.
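
A custom syllableDict has the same shape as the default: a named numeric vector with lower-case names. The sketch below builds and checks one in base R; the object name myDict and the words and counts in it are illustrative assumptions, and the final call is left commented out because it requires quanteda to be loaded.

```r
# Hypothetical custom counts; the words and values are illustrative only.
myDict <- c(Quanteda = 3, corpus = 2, tokenize = 3)

# Names must be lower-case tokens, so normalise them before use.
names(myDict) <- tolower(names(myDict))

# Basic checks on the expected shape: numeric, named, no duplicate names.
stopifnot(is.numeric(myDict),
          !is.null(names(myDict)),
          !anyDuplicated(names(myDict)))

# With quanteda loaded, the vector is supplied through the documented argument:
# syllables("quanteda corpus", syllableDict = myDict)
```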

Value

A named numeric vector or list of counts of the number of syllables for each element of x. When a word is not available in the lookup table, its syllables are estimated by counting the (English) vowel clusters in the word.

Source

englishSyllables is built from the freely available CMU pronunciation dictionary at http://www.speech.cs.cmu.edu/cgi-bin/cmudict.

Examples

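# count the syllables in a single text
syllables("This is an example sentence.")
# count the syllables of each token
syllables(tokenize("This is an example sentence.", simplify = TRUE))
# count the syllables of each element of a named character vector
myTexts <- c(text1 = "Text one.",
             text2 = "Superduper text number two.",
             text3 = "One more for the road.")
syllables(myTexts)
# a single long word
syllables("supercalifragilisticexpialidocious")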
