This data set gives the number of documents and tokens in each of the 18 domains represented in the British National Corpus, World Edition (BNC). See Aston & Burnard (1998) for more information about the BNC and the domain classification, or go to http://www.natcorp.ox.ac.uk/.
BNCdomains
A data frame with 19 rows (the 18 domains plus one row for the document whose domain classification is missing) and the following columns:
domain
:name of the respective domain in the BNC
documents
:number of documents from this domain
tokens
:total number of tokens in all documents from this domain
Marco Baroni <baroni@sslmit.unibo.it>
For one document in the BNC, the domain classification is missing. This document is represented by the code Unlabeled in the data set.
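The following is a minimal sketch of how the data set might be explored. It assumes that BNCdomains is loaded from the corpora package (the package name is not stated on this page); the helper columns tokens.per.doc and token.share are introduced here purely for illustration.

```r
# Assumption: the data set is provided by the 'corpora' package
library(corpora)
data(BNCdomains)

# Keep the 18 labelled domains; drop the row coded "Unlabeled"
labelled <- subset(BNCdomains, domain != "Unlabeled")

# Average document length (tokens per document) in each domain
# (tokens.per.doc is an illustrative column name, not part of the data set)
labelled$tokens.per.doc <- labelled$tokens / labelled$documents

# Share of all labelled tokens contributed by each domain
# (token.share is likewise illustrative)
labelled$token.share <- labelled$tokens / sum(labelled$tokens)

# Show the domains ordered from largest to smallest token share
print(labelled[order(-labelled$token.share), ])
```

Dropping the Unlabeled row first keeps the per-domain shares from being computed against a document that belongs to no known domain.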
Aston, Guy and Burnard, Lou (1998). The BNC Handbook. Edinburgh University Press, Edinburgh. See also the BNC homepage at http://www.natcorp.ox.ac.uk/.