corpora (version 0.6)

KrennPPV: German PP-Verb collocation candidates annotated by Brigitte Krenn (2000)

Description

This data set lists 5102 frequent combinations of verbs and prepositional phrases (PP) extracted from a German newspaper corpus. The collocational status of each PP-verb combination was manually annotated by Brigitte Krenn (2000). In addition, pre-computed scores of several standard association measures are provided.

The KrennPPV candidate set forms part of the data used in the evaluation study of Evert & Krenn (2005).

Usage

KrennPPV

Format

A data frame with 5102 rows and the following columns:

PP:

the prepositional phrase, represented by preposition and lemma of the nominal head (character). Preposition-article fusion is indicated by a + sign. For example, the prepositional phrase im letzten Jahr would appear as in:Jahr in the data set.

verb:

the verb lemma (character). Separated particle verbs have been recombined.

is.colloc:

whether the PP-verb combination is a lexical collocation (logical)

is.SVC:

whether a PP-verb collocation is a support verb construction (logical)

is.figur:

whether a PP-verb collocation is a figurative expression (logical)

freq:

co-occurrence frequency of the PP-verb combination within clauses (integer)

MI:

Mutual Information association measure

Dice:

Dice coefficient association measure

z.score:

z-score association measure

t.score:

t-score association measure

chisq:

chi-squared association measure (without Yates' continuity correction)

chisq.corr:

chi-squared association measure (with Yates' continuity correction)

log.like:

log-likelihood association measure

Fisher:

Fisher's exact test as an association measure (negative logarithm of one-sided p-value)

See Evert (2008) and http://www.collocations.de/AM/ for details on these association measures.
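One way to work with the manual annotation and the pre-computed scores is to rank the candidates by a measure and check how many of the top n are annotated as collocations. The following is a minimal sketch of that idea, not the procedure of any cited study. It runs on a small toy data frame that borrows the column names listed above; the rows, scores, and the helper name nbest_precision are invented for illustration. It also assumes that a higher score means stronger association.

```r
# Toy stand-in for KrennPPV: same column names, but these rows and scores
# are INVENTED for illustration and are not taken from the real data set.
toy <- data.frame(
  PP        = c("in:Jahr", "in:Gang", "in:Frage", "auf:Tisch", "an:Tag", "in:Kraft"),
  verb      = c("sagen", "setzen", "kommen", "legen", "gehen", "treten"),
  is.colloc = c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE),
  freq      = c(40L, 35L, 30L, 12L, 10L, 25L),
  t.score   = c(1.2, 5.8, 5.1, 2.0, 0.7, 4.9),
  stringsAsFactors = FALSE
)

# Hypothetical helper: proportion of true collocations among the n
# highest-scoring candidates (assumes higher score = stronger association).
nbest_precision <- function(df, measure, n) {
  stopifnot(measure %in% names(df), n >= 1, n <= nrow(df))
  ranked <- df[order(df[[measure]], decreasing = TRUE), ]
  mean(ranked$is.colloc[seq_len(n)])
}

# Baseline: proportion of collocations in the whole candidate set.
baseline <- mean(toy$is.colloc)

# Compare a frequency ranking with a t-score ranking on the toy data.
p_freq <- nbest_precision(toy, "freq", 3)
p_t    <- nbest_precision(toy, "t.score", 3)

# With the corpora package attached, the same call could be applied to
# the real data, e.g. nbest_precision(KrennPPV, "t.score", 500).
```

Comparing the n-best precision with the baseline proportion shows whether a ranking concentrates collocations at the top of the list. The same helper works for any of the score columns as long as the assumed ordering holds.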

Author

Stephanie Evert (https://purl.org/stephanie.evert)

References

Evert, Stefan (2008). Corpora and collocations. In A. Lüdeling and M. Kytö (eds.), Corpus Linguistics. An International Handbook, chapter 58, pages 1212–1248. Mouton de Gruyter, Berlin, New York.

Evert, Stefan and Krenn, Brigitte (2005). Using small random samples for the manual evaluation of statistical association measures. Computer Speech and Language, 19(4), 450–466.

Krenn, Brigitte (2000). The Usual Suspects: Data-Oriented Models for the Identification and Representation of Lexical Collocations, volume 7 of Saarbrücken Dissertations in Computational Linguistics and Language Technology. DFKI & Universität des Saarlandes, Saarbrücken, Germany.