cnlp_get_dependency: Access dependencies from an annotation object

Description

This function extracts the table of dependencies from an annotation object. Dependencies are binary relationships between pairs of tokens within a sentence. Common examples include the nominal subject (linking the subject of a clause to its verb) and the adjectival modifier (linking an adjective to the noun it modifies). Words and lemmas are not stored in the underlying dependency data, but the function can optionally join each dependency to the raw words and lemmas in the table of tokens. Both language-agnostic and language-specific universal dependency types are included in the output.
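The get_token option amounts to a join between the dependency table and the token table. The sketch below reproduces that idea in base R on a hypothetical one-sentence example. The data frames `tokens` and `deps` are made up for illustration, and the sketch assumes that `tid` is the head and `tid_target` the dependent, which is consistent with the nsubj example in the Examples section reading the subject from `lemma_target`. It is not the package's own implementation.

```r
# Hypothetical toy data: a single sentence, "The president signed the bill."
tokens <- data.frame(
  id    = 1L,
  sid   = 1L,
  tid   = 1:6,
  word  = c("The", "president", "signed", "the", "bill", "."),
  lemma = c("the", "president", "sign", "the", "bill", "."),
  stringsAsFactors = FALSE
)

# Hypothetical dependencies, assuming tid is the head and tid_target the dependent
deps <- data.frame(
  id         = 1L,
  sid        = 1L,
  tid        = c(2L, 3L, 5L, 3L, 3L),
  tid_target = c(1L, 2L, 4L, 5L, 6L),
  relation   = c("det", "nsubj", "det", "obj", "punct"),
  stringsAsFactors = FALSE
)

# Left join the source token's word and lemma on (id, sid, tid)
with_source <- merge(deps, tokens, by = c("id", "sid", "tid"), all.x = TRUE)

# Rename the token columns to match the target side, then join again
target_tokens <- setNames(
  tokens,
  c("id", "sid", "tid_target", "word_target", "lemma_target")
)
joined <- merge(with_source, target_tokens,
                by = c("id", "sid", "tid_target"), all.x = TRUE)

# The nsubj row now links the verb lemma "sign" to its subject "president"
joined[joined$relation == "nsubj", c("relation", "lemma", "lemma_target")]
```

In practice, calling cnlp_get_dependency(annotation, get_token = TRUE) returns the word and lemma columns directly.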

Usage

cnlp_get_dependency(annotation, get_token = FALSE)

Arguments

annotation

an annotation object

get_token

logical. Should words and lemmas be attached to the returned dependency table?

Value

Returns an object of class c("tbl_df", "tbl", "data.frame") containing one row for every dependency pair in the corpus.

The returned data frame includes at a minimum the following columns:

"id" - integer. Id of the source document.
"sid" - integer. Sentence id of the source token.
"tid" - integer. Id of the source token.
"tid_target" - integer. Id of the target token.
"relation" - character. Language-agnostic universal dependency type.
"relation_full" - character. Language-specific universal dependency type.

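To see how the two relation columns relate: under the Universal Dependencies convention, a language-specific label is written as `type:subtype` (for example `nmod:poss`), and the language-agnostic type is the part before the colon. The sketch below assumes the labels follow that convention and uses a hypothetical vector of labels; it is an illustration, not necessarily how the package derives `relation`.

```r
# Hypothetical language-specific labels in UD "type:subtype" form
relation_full <- c("nsubj", "nmod:poss", "compound:prt", "obl:tmod", "det")

# Drop everything from the first colon onward to recover the universal type
relation <- sub(":.*$", "", relation_full)

data.frame(relation, relation_full)
```
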
If get_token is set to TRUE, the following columns will also be included:

"word" - character. The source word in the raw text.
"lemma" - character. Lemmatized form of the source word.
"word_target" - character. The target word in the raw text.
"lemma_target" - character. Lemmatized form of the target word.

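With get_token set to TRUE, the extra columns make it easy to read relations as word pairs. The sketch below uses a small hypothetical data frame containing only the columns it needs, and assumes that for `amod` the noun is the source token and the adjective the target token. It counts adjective-noun pairs in base R.

```r
# Hypothetical rows shaped like the get_token = TRUE output (only the
# columns needed here); for amod, the noun is assumed to be the source
# token and the adjective the target token
res <- data.frame(
  relation     = c("amod", "amod", "nsubj", "amod"),
  lemma        = c("economy", "job", "build", "economy"),
  lemma_target = c("strong", "good", "we", "strong"),
  stringsAsFactors = FALSE
)

# Keep adjectival modifiers and build "adjective noun" strings
amod <- res[res$relation == "amod", ]
pairs <- paste(amod$lemma_target, amod$lemma)

# Count each pair, most frequent first
pair_counts <- sort(table(pairs), decreasing = TRUE)
pair_counts
```
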
References

Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP Natural Language Processing Toolkit. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55-60.

Danqi Chen and Christopher D. Manning. 2014. A Fast and Accurate Dependency Parser using Neural Networks. In: Proceedings of EMNLP 2014.

Spence Green, Marie-Catherine de Marneffe, John Bauer, and Christopher D. Manning. 2011. Multiword Expression Identification with Tree Substitution Grammars: A Parsing tour de force with French. In: EMNLP 2011.

Spence Green and Christopher D. Manning. 2010. Better Arabic Parsing: Baselines, Evaluations, and Analysis. In: COLING 2010.

Pi-Chuan Chang, Huihsin Tseng, Dan Jurafsky, and Christopher D. Manning. 2009. Discriminative Reordering with Chinese Grammatical Relations Features. In: Proceedings of the Third Workshop on Syntax and Structure in Statistical Translation.

Anna Rafferty and Christopher D. Manning. 2008. Parsing Three German Treebanks: Lexicalized and Unlexicalized Baselines. In: ACL Workshop on Parsing German.

Examples


library(cleanNLP)
library(dplyr)

# load the example annotation object shipped with the package
data(obama)

# find the most common lemmas that serve as the nominal subject (nsubj) of a clause
res <- cnlp_get_dependency(obama, get_token = TRUE) %>%
  filter(relation == "nsubj")
res$lemma_target %>%
  table() %>%
  sort(decreasing = TRUE) %>%
  head(n = 40)
