This data set, from Efron and Thisted (1976),
gives the number of distinct word types (Freq)
that appeared exactly once, twice, etc., up to 100 times (count)
in the complete works of Shakespeare. In these works, Shakespeare
used 31,534 distinct words (types), comprising 884,647 words in total.
Efron & Thisted used this data to ask the question, "How many
words did Shakespeare know?" Put another way, suppose a
new corpus of works by Shakespeare were discovered, also with
884,647 words. How many new word types would appear?
The answer to the main question involves contemplating
an infinite number of such new corpora.
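
The narrower question (how many new types would appear in one further corpus of the same size) can be illustrated with a Good-Toulmin style alternating sum over the frequency-of-frequencies table. The sketch below is only an illustration of that idea, not Efron and Thisted's full method. The function name `expected_new_types`, the parameter `t` (size of the new corpus relative to the original), and the toy counts are assumptions made up for this example; they are not the Shakespeare data.

```python
def expected_new_types(freq_of_freq, t=1.0):
    """Good-Toulmin style estimate of unseen types in a further sample.

    freq_of_freq maps a count x to the number of types seen exactly x times
    (the roles played by `count` and `Freq` in the data set).
    t is the size of the new sample relative to the original (t=1: same size).
    """
    # Alternating series: sum over x of (-1)^(x+1) * n_x * t^x
    return sum((-1) ** (x + 1) * n_x * t ** x for x, n_x in freq_of_freq.items())


# Invented toy counts for illustration only, NOT the Shakespeare data:
# 10 types seen once, 4 seen twice, 2 seen three times, 1 seen four times.
toy = {1: 10, 2: 4, 3: 2, 4: 1}
print(expected_new_types(toy, t=1.0))  # 10 - 4 + 2 - 1 = 7.0
```

The alternating sum is only well behaved for t up to about 1; for larger t the powers of t grow quickly and the raw series becomes unstable, which is one reason the "infinite number of new corpora" version of the question needs more careful treatment.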