corpora (version 0.6)

BNCbiber: Biber's (1988) register features for the British National Corpus

Description

This data set contains a table of the relative frequencies (per 1000 words) of 65 linguistic features (Biber 1988, 1995) for each text document in the British National Corpus (Aston & Burnard 1998).

Biber (1988) introduced these features for the purpose of a multidimensional register analysis. Variables in the data set are numbered according to Biber's list (see e.g. Biber 1995, 95f).
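
Biber's multidimensional analysis is based on factor analysis of such feature frequencies. The following minimal sketch illustrates the general idea only: it substitutes a principal component analysis (prcomp) for the factor analysis and runs on a small simulated matrix as a stand-in for BNCbiber. The simulated values, the choice of four columns and the variable names are assumptions made for illustration.

```r
set.seed(1)

# Simulated stand-in for BNCbiber: 50 "texts" x 4 features
# (random values, NOT real BNC frequencies)
X <- matrix(rpois(200, lambda = 20), nrow = 50,
            dimnames = list(sprintf("T%02d", 1:50),
                            c("f_01_past_tense", "f_03_present_tense",
                              "f_06_first_person_pronouns", "f_39_prepositions")))

# Standardise each feature (mean 0, sd 1) so that frequent features
# do not dominate the analysis
Z <- scale(X)

# PCA used here as a simple substitute for a factor analysis
pca <- prcomp(Z)

text_scores <- pca$x[, 1:2]         # position of each text on the first two components
feature_load <- pca$rotation[, 1:2] # contribution of each feature to those components
```

With the corpora package loaded, the same steps could be applied to BNCbiber itself in place of X.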

Feature frequencies were automatically extracted from the British National Corpus using query patterns based on part-of-speech tags (Gasthaus 2007). Note that features 60 and 65 had to be omitted because they cannot be identified with sufficient accuracy by the automatic methods. For further information on the extraction methodology, see Gasthaus (2007, 20-21). The original data set and the Python scripts used for feature extraction are available from https://portal.ikw.uni-osnabrueck.de/~CL/download/BSc_Gasthaus2007/; the version included here contains some bug fixes.
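
Because each column name carries Biber's feature number (f_01 to f_67, as listed under Format), the gaps left by the omitted features can be recovered from the names alone. A minimal sketch, using a hand-copied subset of column names as a stand-in for colnames(BNCbiber); the helper feature_number is hypothetical and not part of the corpora package.

```r
# Stand-in for colnames(BNCbiber): names copied from the Format section
cols <- c("f_58_verb_seem", "f_59_contractions", "f_61_stranded_preposition",
          "f_62_split_infinitve", "f_63_split_auxiliary",
          "f_64_phrasal_coordination", "f_66_neg_synthetic", "f_67_neg_analytic")

# Hypothetical helper: extract Biber's feature number from a column name
feature_number <- function(x) as.integer(sub("^f_([0-9]+)_.*$", "\\1", x))

nums <- feature_number(cols)

# Feature numbers within the covered range that have no column
omitted <- setdiff(seq(min(nums), max(nums)), nums)
print(omitted)
```

For this subset, the only gaps are features 60 and 65.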

Usage

BNCbiber

Format

A numeric matrix with 4048 rows and 65 columns, specifying the relative frequencies (per 1000 words) of 65 linguistic features. Documents are listed in the same order as the metadata in BNCmeta and rows are labelled with text IDs, so it is straightforward to combine the two data sets.

A. Tense and aspect markers
f_01_past_tense               Past tense
f_02_perfect_aspect           Perfect aspect
f_03_present_tense            Present tense

B. Place and time adverbials
f_04_place_adverbials         Place adverbials (e.g., above, beside, outdoors)
f_05_time_adverbials          Time adverbials (e.g., early, instantly, soon)

C. Pronouns and pro-verbs
f_06_first_person_pronouns    First-person pronouns
f_07_second_person_pronouns   Second-person pronouns
f_08_third_person_pronouns    Third-person personal pronouns (excluding it)
f_09_pronoun_it               Pronoun it
f_10_demonstrative_pronoun    Demonstrative pronouns (that, this, these, those as pronouns)
f_11_indefinite_pronoun       Indefinite pronouns (e.g., anybody, nothing, someone)
f_12_proverb_do               Pro-verb do

D. Questions
f_13_wh_question              Direct wh-questions

E. Nominal forms
f_14_nominalization           Nominalizations (ending in -tion, -ment, -ness, -ity)
f_15_gerunds                  Gerunds (participial forms functioning as nouns)
f_16_other_nouns              Total other nouns

F. Passives
f_17_agentless_passives       Agentless passives
f_18_by_passives              by-passives

G. Stative forms
f_19_be_main_verb             be as main verb
f_20_existential_there        Existential there

H. Subordination features
f_21_that_verb_comp           that verb complements (e.g., I said that he went.)
f_22_that_adj_comp            that adjective complements (e.g., I'm glad that you like it.)
f_23_wh_clause                wh-clauses (e.g., I believed what he told me.)
f_24_infinitives              Infinitives
f_25_present_participle       Present participial adverbial clauses (e.g., Stuffing his mouth with cookies, Joe ran out the door.)
f_26_past_participle          Past participial adverbial clauses (e.g., Built in a single week, the house would stand for fifty years.)
f_27_past_participle_whiz     Past participial postnominal (reduced relative) clauses (e.g., the solution produced by this process)
f_28_present_participle_whiz  Present participial postnominal (reduced relative) clauses (e.g., the event causing this decline)
f_29_that_subj                that relative clauses on subject position (e.g., the dog that bit me)
f_30_that_obj                 that relative clauses on object position (e.g., the dog that I saw)
f_31_wh_subj                  wh relatives on subject position (e.g., the man who likes popcorn)
f_32_wh_obj                   wh relatives on object position (e.g., the man who Sally likes)
f_33_pied_piping              Pied-piping relative clauses (e.g., the manner in which he was told)
f_34_sentence_relatives       Sentence relatives (e.g., Bob likes fried mangoes, which is the most disgusting thing I've ever heard of.)
f_35_because                  Causative adverbial subordinator (because)
f_36_though                   Concessive adverbial subordinators (although, though)
f_37_if                       Conditional adverbial subordinators (if, unless)
f_38_other_adv_sub            Other adverbial subordinators (e.g., since, while, whereas)

I. Prepositional phrases, adjectives and adverbs
f_39_prepositions             Total prepositional phrases
f_40_adj_attr                 Attributive adjectives (e.g., the big horse)
f_41_adj_pred                 Predicative adjectives (e.g., The horse is big.)
f_42_adverbs                  Total adverbs

J. Lexical specificity
f_43_type_token               Type-token ratio (including punctuation)
f_44_mean_word_length         Average word length (across tokens, excluding punctuation)

K. Lexical classes
f_45_conjuncts                Conjuncts (e.g., consequently, furthermore, however)
f_46_downtoners               Downtoners (e.g., barely, nearly, slightly)
f_47_hedges                   Hedges (e.g., at about, something like, almost)
f_48_amplifiers               Amplifiers (e.g., absolutely, extremely, perfectly)
f_49_emphatics                Emphatics (e.g., a lot, for sure, really)
f_50_discourse_particles      Discourse particles (e.g., sentence-initial well, now, anyway)
f_51_demonstratives           Demonstratives

L. Modals
f_52_modal_possibility        Possibility modals (can, may, might, could)
f_53_modal_necessity          Necessity modals (ought, should, must)
f_54_modal_predictive         Predictive modals (will, would, shall)

M. Specialized verb classes
f_55_verb_public              Public verbs (e.g., assert, declare, mention)
f_56_verb_private             Private verbs (e.g., assume, believe, doubt, know)
f_57_verb_suasive             Suasive verbs (e.g., command, insist, propose)
f_58_verb_seem                seem and appear

N. Reduced forms and dispreferred structures
f_59_contractions             Contractions
n/a                           Subordinator that deletion (e.g., I think [that] he went.); feature 60, omitted from the data set
f_61_stranded_preposition     Stranded prepositions (e.g., the candidate that I was thinking of)
f_62_split_infinitve          Split infinitives (e.g., He wants to convincingly prove that ...)
f_63_split_auxiliary          Split auxiliaries (e.g., They were apparently shown to ...)

O. Co-ordination
f_64_phrasal_coordination     Phrasal co-ordination (N and N; Adj and Adj; V and V; Adv and Adv)
n/a                           Independent clause co-ordination (clause-initial and); feature 65, omitted from the data set

P. Negation
f_66_neg_synthetic            Synthetic negation (e.g., No answer is good enough for Jones.)
f_67_neg_analytic             Analytic negation (e.g., That's not likely.)
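
Since rows are labelled with text IDs and follow the order of BNCmeta, the feature matrix can be attached to the metadata with a simple column bind. The sketch below uses toy stand-ins rather than the real data sets: the frequency values are made up, and the metadata columns id and mode are assumptions about the layout of BNCmeta that should be checked against its own documentation.

```r
# Toy stand-in for BNCbiber: text IDs as row names, two feature columns
# (made-up values, NOT real BNC frequencies)
feats <- matrix(c(45.2, 12.1, 30.5,
                  60.3, 80.0, 55.7), nrow = 3,
                dimnames = list(c("A00", "A01", "A02"),
                                c("f_01_past_tense", "f_03_present_tense")))

# Toy stand-in for BNCmeta, listed in the same order (column names assumed)
meta <- data.frame(id = c("A00", "A01", "A02"),
                   mode = c("written", "written", "spoken"),
                   stringsAsFactors = FALSE)

# Verify that both tables are aligned before combining them
stopifnot(identical(rownames(feats), meta$id))
combined <- cbind(meta, as.data.frame(feats))

# Example use: mean past-tense frequency per metadata category
by_mode <- tapply(combined$f_01_past_tense, combined$mode, mean)
```

Checking the row names against the ID column first guards against silently combining misaligned tables, for instance after one of them has been subsetted or reordered.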

Author

Stephanie Evert (https://purl.org/stephanie.evert); feature extractor by Jan Gasthaus (2007).

References

Aston, Guy and Burnard, Lou (1998). The BNC Handbook. Edinburgh University Press, Edinburgh. See also the BNC homepage at http://www.natcorp.ox.ac.uk/.

Biber, Douglas (1988). Variation across Speech and Writing. Cambridge University Press, Cambridge.

Biber, Douglas (1995). Dimensions of Register Variation: A cross-linguistic comparison. Cambridge University Press, Cambridge.

Gasthaus, Jan (2007). Prototype-Based Relevance Learning for Genre Classification. B.Sc. thesis, Institute of Cognitive Science, University of Osnabrück. Data sets and software available from https://portal.ikw.uni-osnabrueck.de/~CL/download/BSc_Gasthaus2007/.

See Also

BNCmeta