This data set compares the frequencies of 60 selected nouns in the written and spoken parts of the British National Corpus, World Edition (BNC). Nouns were chosen from three frequency bands, namely the 20 most frequent nouns in the corpus, 20 nouns with approximately 1000 occurrences, and 20 nouns with approximately 100 occurrences.
See Aston & Burnard (1998) for more information about the BNC, or go to http://www.natcorp.ox.ac.uk/.
BNCcomparison
A data frame with 61 rows and the following columns:
noun
:lemmatised noun (aka stem form)
written
:frequency in the written part of the BNC
spoken
:frequency in the spoken part of the BNC
Stephanie Evert (https://purl.org/stephanie.evert)
In addition to the 60 nouns, the data set contains a row labelled
OTHER
, which represents the total frequency of all other nouns
in the BNC. This value is needed in order to calculate the sample
sizes of the written and spoken part for frequency comparison tests.
Aston, Guy and Burnard, Lou (1998). The BNC Handbook. Edinburgh University Press, Edinburgh. See also the BNC homepage at http://www.natcorp.ox.ac.uk/.