corpora (version 0.6)

BNCcomparison: Comparison of written and spoken noun frequencies in the British National Corpus

Description

This data set compares the frequencies of 60 selected nouns in the written and spoken parts of the British National Corpus, World Edition (BNC). Nouns were chosen from three frequency bands, namely the 20 most frequent nouns in the corpus, 20 nouns with approximately 1000 occurrences, and 20 nouns with approximately 100 occurrences.

See Aston & Burnard (1998) for more information about the BNC, or go to http://www.natcorp.ox.ac.uk/.

Usage

BNCcomparison

Format

A data frame with 61 rows and the following columns:

noun:

lemmatised noun (i.e. its base form)

written:

frequency in the written part of the BNC

spoken:

frequency in the spoken part of the BNC

Author

Stephanie Evert (https://purl.org/stephanie.evert)

Details

In addition to the 60 nouns, the data set contains a row labelled OTHER, which gives the total frequency of all other nouns in the BNC. This row is needed to calculate the sample sizes (total noun counts) of the written and spoken parts, which frequency comparison tests require.
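Because the OTHER row accounts for every remaining noun token, the column sums of written and spoken give the two sample sizes. The sketch below shows one way to use this for a frequency comparison test. It assumes a data frame with the columns noun, written and spoken described above, and uses base R's chisq.test() on a 2x2 contingency table. The helper name compare_noun() is hypothetical and is not part of the corpora package.

```r
# Hypothetical helper (not part of corpora): compare the written and spoken
# frequency of one noun with a chi-squared test on a 2x2 contingency table.
compare_noun <- function(df, target) {
  # Sample sizes: sum over all rows, i.e. the 60 nouns plus the OTHER row
  n_written <- sum(df$written)
  n_spoken  <- sum(df$spoken)
  row <- df[df$noun == target, ]
  f_w <- row$written
  f_s <- row$spoken
  # Columns = corpus parts; rows = target noun vs. all other noun tokens
  tab <- matrix(c(f_w, n_written - f_w,
                  f_s, n_spoken  - f_s),
                nrow = 2,
                dimnames = list(c(target, "rest"), c("written", "spoken")))
  chisq.test(tab)
}

# Usage with the real data (assumes the corpora package is installed):
# library(corpora)
# compare_noun(BNCcomparison, as.character(BNCcomparison$noun[1]))
```

Omitting the OTHER row would make the sums cover only the 60 selected nouns, which would understate the sample sizes and distort the test.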

References

Aston, Guy and Burnard, Lou (1998). The BNC Handbook. Edinburgh University Press, Edinburgh. See also the BNC homepage at http://www.natcorp.ox.ac.uk/.