corpora (version 0.6)

BNCdomains: Distribution of domains in the British National Corpus (BNC)

Description

This data set gives the number of documents and tokens in each of the 18 domains represented in the British National Corpus, World Edition (BNC). See Aston & Burnard (1998) for more information about the BNC and the domain classification, or go to http://www.natcorp.ox.ac.uk/.

Usage

BNCdomains

Format

A data frame with 19 rows (one for each of the 18 domains, plus one row for the document without a domain classification) and the following columns:

domain: name of the respective domain in the BNC

documents: number of documents from this domain

tokens: total number of tokens in all documents from this domain

Author

Marco Baroni <baroni@sslmit.unibo.it>

Details

For one document in the BNC, the domain classification is missing. This document is represented by the code Unlabeled in the data set.
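
As an illustration of how the three columns might be combined, the following sketch derives the mean document length and the share of all tokens for each domain, after dropping the Unlabeled row. The helper summarise_domains and the column names tokens_per_doc and token_share are hypothetical and not part of the corpora package; the sketch assumes the package is installed and that the missing classification is coded exactly as Unlabeled, as described above.

```r
# Hypothetical helper (not part of the corpora package): adds per-domain
# summary columns to a data frame shaped like BNCdomains.
summarise_domains <- function(df) {
  stopifnot(all(c("domain", "documents", "tokens") %in% names(df)))
  # Mean document length in tokens for each domain
  df$tokens_per_doc <- df$tokens / df$documents
  # Proportion of all tokens contributed by each domain
  df$token_share <- df$tokens / sum(df$tokens)
  # Largest domains (by token count) first
  df[order(df$tokens, decreasing = TRUE), ]
}

# Only runs when the corpora package is installed
if (requireNamespace("corpora", quietly = TRUE)) {
  data(BNCdomains, package = "corpora")
  # Drop the row for the document without a domain classification
  labelled <- BNCdomains[BNCdomains$domain != "Unlabeled", ]
  print(summarise_domains(labelled))
}
```

Keeping the computation in a small function means it can be checked on any data frame with the same three columns, independently of the packaged data.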

References

Aston, Guy and Burnard, Lou (1998). The BNC Handbook. Edinburgh University Press, Edinburgh. See also the BNC homepage at http://www.natcorp.ox.ac.uk/.