A multilingual text corpus of speeches from a European Parliament debate on coal subsidies in 2010, with individual crowd codings as the unit of observation. The sentences are drawn from officially translated speeches from a European Parliament debate concerning a Commission report proposing an extension to a regulation permitting state aid to uncompetitive coal mines.
Each speech is available in six languages: English, German, Greek, Italian, Polish and Spanish. The unit of observation is the individual crowd coding of each natural sentence. For more information on the coding approach see Benoit et al. (2016).
The corpus consists of 16,806 documents (i.e. codings of a sentence) and includes the following document-level variables:
character; a unique identifier for each sentence
factor; whether a coder labelled the sentence as "Pro-Subsidy", "Anti-Subsidy" or "Neutral or inapplicable"
factor; the language (translation) of the speech
character; speaker's last name
character; speaker's first name
factor; abbreviation of the EP party group of the speaker
factor; the speaker's country of origin
factor; the speaker's vote on the proposal (For/Against/Abstain/NA)
character; a unique identifier for each crowd coder
numeric; the "trust score" from the CrowdFlower platform used to code the sentences, which can theoretically range between 0 and 1. Only coders with trust scores above 0.8 are included in the corpus.
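
Because each document is a single crowd coding rather than a sentence, a common first step is to aggregate the codings back to the sentence level. The following is a minimal sketch in base R, not the method used by Benoit et al. (2016). It runs on a small made-up data frame standing in for the document-level variables; the column names (sentence_id, label, trust) and all values are hypothetical and chosen only for illustration.

```r
# Toy stand-in for the document-level variables. The column names are
# hypothetical and the values are invented for illustration.
codings <- data.frame(
  sentence_id = c("s1", "s1", "s1", "s2", "s2", "s2"),
  label = c("Pro-Subsidy", "Pro-Subsidy", "Neutral or inapplicable",
            "Anti-Subsidy", "Anti-Subsidy", "Pro-Subsidy"),
  trust = c(0.95, 0.85, 0.90, 0.88, 0.92, 0.81),
  stringsAsFactors = FALSE
)

# Keep only codings from coders above the 0.8 trust threshold
# (a no-op here, since the corpus already applies this filter).
codings <- codings[codings$trust > 0.8, ]

# Cross-tabulate labels by sentence: one row per sentence,
# one column per label.
counts <- table(codings$sentence_id, codings$label)

# Pick the most frequent label for each sentence.
# which.max() returns the first maximum, so ties go to the
# alphabetically first label.
majority <- colnames(counts)[apply(counts, 1, which.max)]
names(majority) <- rownames(counts)

print(majority)
```

A simple majority vote ignores differences in coder reliability; weighting each coding by the coder's trust score is one possible refinement.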
A corpus object.
Benoit, K., Conway, D., Lauderdale, B.E., Laver, M., & Mikhaylov, S. (2016). Crowd-sourced Text Analysis: Reproducible and Agile Production of Political Data. American Political Science Review, 110(2), 278–295. doi:10.1017/S0003055416000058