This data set contains a table of frequency counts obtained with a selection of BNCweb (Hoffmann et al. 2008) queries for each text document in the British National Corpus (Aston & Burnard 1998).
BNCqueries
A data frame with 4048 rows and 12 columns. The first column (id) contains a character vector of text IDs; the remaining columns contain integer vectors of the corresponding per-text frequency counts for various BNCweb queries. Column names ending in .S indicate sentence counts rather than token counts.
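As a minimal sketch, the .S naming convention could be used to separate sentence counts from token counts programmatically. The example runs on a small made-up data frame (toy) that only mimics the column layout described above; its values are invented for illustration and are not BNC data.

```r
# Hypothetical toy data with the same column layout as described above
# (values are invented for illustration, not taken from the BNC).
toy <- data.frame(
  id = c("T01", "T02", "T03"),
  split.inf.S = c(0L, 2L, 1L),
  past.S = c(120L, 310L, 45L),
  time = c(15L, 40L, 3L),
  noun = c(2100L, 5300L, 800L),
  stringsAsFactors = FALSE
)

# Columns whose names end in ".S" hold sentence counts
is.sentence <- grepl("\\.S$", names(toy))
sentence.cols <- names(toy)[is.sentence]

# All remaining columns except the text ID hold token counts
token.cols <- setdiff(names(toy)[!is.sentence], "id")

print(sentence.cols)
print(token.cols)
```

The same two steps should carry over to the real data frame, since they rely only on the column names.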
The list below shows the BNCweb query used for each feature in CEQL syntax (Hoffmann et al. 2008, Ch. 6).
id
:text ID
split.inf.S
:number of sentences containing a split infinitive with -ly adverb; query: _TO0 +ly_AV0 _V?I
adv.inf.S
:number of sentences containing a non-split infinitive with -ly adverb; query: +ly_AV0 _TO0 _V?I
superlative.S
:number of sentences containing a superlative adjective; query: the (_AJS | most _AJ0)
past.S
:number of sentences containing a past tense verb; query: _V?D
wh.question.S
:number of wh-questions; query: <s> _[PNQ,AVQ] _{V}
stop.to
:frequency of the expression stop to + verb; query: {stop/V} to _{V}
time
:frequency of the noun time; query: {time/N}
click
:frequency of the verb to click; query: {click/V}
noun
:frequency of common nouns; query: _NN?
nominalization
:frequency of nominalizations; query: +[tion,tions,ment,ments,ity,ities]_NN?
downtoner
:frequency of downtoners; query: [almost,barely,hardly,merely,mildly,nearly,only,partially,partly,practically,scarcely,slightly,somewhat]
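One way the paired features split.inf.S and adv.inf.S might be combined is to compute, for each text, the share of split infinitives among all infinitives with an -ly adverb. The sketch below uses a made-up data frame (toy) with invented values, and the derived column split.share is a hypothetical name that is not part of the data set.

```r
# Hypothetical toy data (invented values, not taken from the BNC)
toy <- data.frame(
  id = c("T01", "T02", "T03"),
  split.inf.S = c(0L, 2L, 1L),
  adv.inf.S = c(0L, 6L, 3L),
  stringsAsFactors = FALSE
)

# Sentences matching either query; a sentence matching both queries
# would be counted twice, which this sketch ignores
total <- toy$split.inf.S + toy$adv.inf.S

# Share of split infinitives per text; NA where a text has no matches
toy$split.share <- ifelse(total > 0, toy$split.inf.S / total, NA_real_)

print(toy)
```

Texts without any matching sentence get NA rather than a misleading share of zero.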
Stephanie Evert (https://purl.org/stephanie.evert)
Aston, Guy and Burnard, Lou (1998). The BNC Handbook. Edinburgh University Press, Edinburgh. See also the BNC homepage at http://www.natcorp.ox.ac.uk/.
Hoffmann, Sebastian; Evert, Stefan; Smith, Nicholas; Lee, David; Berglund Prytz, Ylva (2008). Corpus Linguistics with BNCweb -- a Practical Guide, volume 6 of English Corpus Linguistics. Peter Lang, Frankfurt am Main. See also http://corpora.lancs.ac.uk/BNCweb/.
BNCmeta