corpora (version 0.6)

BNCmeta: Metadata for the British National Corpus (XML edition)

Description

This data set provides complete metadata for all 4048 texts of the British National Corpus (XML edition). See Aston & Burnard (1998) for more information about the BNC, or go to http://www.natcorp.ox.ac.uk/.

The data were extracted automatically from the original BNC source files. Attribute names and their values were transformed so that they are given in a human-readable form. The Perl scripts used in the extraction procedure are available from https://cwb.sourceforge.io/install.php#other.

Usage

BNCmeta
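
As a minimal sketch of how the data set might be accessed, the snippet below attaches the package and looks at the first few rows. It assumes that the corpora package is installed and that `BNCmeta` is available once the package is attached, as the usage line suggests; the choice of columns to display is arbitrary.

```r
# Assumes the corpora package is installed, e.g. via install.packages("corpora").
if (requireNamespace("corpora", quietly = TRUE)) {
  library(corpora)

  # Rows and columns of the metadata table (one row per BNC text)
  print(dim(BNCmeta))

  # A few identifying columns for the first texts
  print(head(BNCmeta[, c("id", "title", "n_words")]))
}
```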

Format

A data frame with 4048 rows and the columns listed below. Unless specified otherwise, columns are coded as factors.

id:

BNC document ID; character vector

title:

Title of the document; character vector

n_words:

Number of words in the document; integer vector

n_tokens:

Total number of tokens (including punctuation and deleted material); integer vector

n_w:

Number of w-units (words); integer vector

n_c:

Number of c-units (punctuation); integer vector

n_s:

Number of s-units (sentences); integer vector

publication_date:

Publication date

text_type:

Text type

context:

Spoken context

respondent_age:

Age-group of respondent

respondent_class:

Social class of respondent (NRS social grades)

respondent_sex:

Sex of respondent

interaction_type:

Interaction type

region:

Region

author_age:

Author age-group

author_domicile:

Domicile of author

author_sex:

Sex of author

author_type:

Author type

audience_age:

Audience age

domain:

Written domain

difficulty:

Written difficulty

medium:

Written medium

publication_place:

Publication place

sampling_type:

Sampling type

circulation:

Estimated circulation size

audience_sex:

Audience sex

availability:

Availability

mode:

Text mode (written/spoken)

derived_type:

Text class

genre:

David Lee's genre classification
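
Because most columns are factors, the table lends itself to simple grouped summaries. The sketch below counts texts and sums `n_words` per level of one factor column. The helper `summarise_texts()` is a hypothetical name introduced only for illustration and is not part of the corpora package; the final step assumes the package is installed.

```r
# Hypothetical helper (not part of corpora): count texts and total words
# per level of a factor column such as "mode" or "genre".
summarise_texts <- function(meta, by = "mode") {
  stopifnot(is.data.frame(meta),
            by %in% names(meta),
            "n_words" %in% names(meta))
  groups <- droplevels(as.factor(meta[[by]]))
  data.frame(
    level   = levels(groups),
    texts   = as.vector(table(groups)),
    # convert to double before summing to avoid integer overflow on large totals
    n_words = as.vector(tapply(as.numeric(meta$n_words), groups, sum)),
    stringsAsFactors = FALSE
  )
}

# Apply the helper to the real data only if the package is available.
if (requireNamespace("corpora", quietly = TRUE)) {
  data("BNCmeta", package = "corpora")
  print(summarise_texts(BNCmeta, by = "mode"))
}
```

Passing `by = "genre"` or `by = "domain"` gives the same breakdown for the other classification columns; texts with a missing value in the chosen column are left out of the summary.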

Author

Stephanie Evert (https://purl.org/stephanie.evert)

References

Aston, Guy and Burnard, Lou (1998). The BNC Handbook. Edinburgh University Press, Edinburgh. See also the BNC homepage at http://www.natcorp.ox.ac.uk/.