This function extracts the table of dependencies from an annotation object. Dependencies are binary relationships between the tokens of a sentence. Common examples include the nominal subject (linking the subject of a clause to its verb) and adjectival modifiers (linking an adjective to a noun). Although they are not part of the underlying dependency data, the function has an option for attaching the raw words and lemmas from the table of tokens to each dependency. Both language-agnostic and language-specific universal dependency types are included in the output.
cnlp_get_dependency(annotation, get_token = FALSE)
"annotation" - an annotation object.
"get_token" - logical. Should words and lemmas be attached to the returned dependency table?
Returns an object of class c("tbl_df", "tbl", "data.frame") containing one row for every dependency pair in the corpus.
The returned data frame includes at a minimum the following columns:
"id" - integer. Id of the source document.
"sid" - integer. Sentence id of the source token.
"tid" - integer. Id of the source token.
"tid_target" - integer. Id of the target token.
"relation" - character. Language-agnostic universal dependency type.
"relation_full" - character. Language-specific universal dependency type.
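As a rough illustration of how the two relation columns can relate, the sketch below assumes that language-specific types follow the Universal Dependencies "universal:subtype" naming convention. The values are hypothetical and are not taken from any annotation object; the package may derive the two columns differently.

```r
# Hypothetical relation_full values written in the Universal Dependencies
# "universal:subtype" style (assumed for illustration only)
relation_full <- c("nsubj", "nmod:poss", "acl:relcl", "amod")

# Drop everything from the first colon onward to recover the universal label
relation <- sub(":.*$", "", relation_full)

print(data.frame(relation, relation_full, stringsAsFactors = FALSE))
```

Labels without a subtype are left unchanged, so the language-agnostic column is always at least as coarse as the language-specific one under this assumption.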
If get_token is set to TRUE, the following columns will also be included:
"word" - character. The source word in the raw text.
"lemma" - character. Lemmatized form of the source word.
"word_target" - character. The target word in the raw text.
"lemma_target" - character. Lemmatized form of the target word.
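The effect of attaching words and lemmas can be pictured as two joins against the table of tokens: one on the source token id and one on the target token id. The following is a minimal sketch on hypothetical toy data using base R's merge; the toy tables, the "dobj" label, and the join strategy are assumptions for illustration and are not the package's actual implementation.

```r
# Toy token table (hypothetical data with the key columns described above)
tokens <- data.frame(
  id = c(1L, 1L, 1L),
  sid = c(1L, 1L, 1L),
  tid = 1:3,
  word = c("Dogs", "chase", "cats"),
  lemma = c("dog", "chase", "cat"),
  stringsAsFactors = FALSE
)

# Toy dependency table: the verb (tid 2) is the source of both relations
deps <- data.frame(
  id = c(1L, 1L),
  sid = c(1L, 1L),
  tid = c(2L, 2L),
  tid_target = c(1L, 3L),
  relation = c("nsubj", "dobj"),
  stringsAsFactors = FALSE
)

# Attach the source word and lemma by matching on (id, sid, tid)
out <- merge(deps, tokens, by = c("id", "sid", "tid"), all.x = TRUE)

# Attach the target word and lemma by matching tid_target against tid
target <- tokens
names(target) <- c("id", "sid", "tid_target", "word_target", "lemma_target")
out <- merge(out, target, by = c("id", "sid", "tid_target"), all.x = TRUE)

# Show the source and target words for the nominal subject relation
print(out[out$relation == "nsubj", c("word", "word_target")])
```

In this toy sentence the nominal subject row pairs the source word "chase" with the target word "Dogs", which mirrors how the target columns hold the subject in the example further down.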
Manning, Christopher D., Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP Natural Language Processing Toolkit. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55-60.
Danqi Chen and Christopher D. Manning. 2014. A Fast and Accurate Dependency Parser using Neural Networks. In: Proceedings of EMNLP 2014.
Spence Green, Marie-Catherine de Marneffe, John Bauer, and Christopher D. Manning. 2011. Multiword Expression Identification with Tree Substitution Grammars: A Parsing tour de force with French. In: EMNLP 2011.
Spence Green and Christopher D. Manning. 2010. Better Arabic Parsing: Baselines, Evaluations, and Analysis. In: COLING 2010.
Pi-Chuan Chang, Huihsin Tseng, Dan Jurafsky, and Christopher D. Manning. 2009. Discriminative Reordering with Chinese Grammatical Relations Features. In: Proceedings of the Third Workshop on Syntax and Structure in Statistical Translation.
Anna Rafferty and Christopher D. Manning. 2008. Parsing Three German Treebanks: Lexicalized and Unlexicalized Baselines. In: ACL Workshop on Parsing German.
# NOT RUN {
data(obama)
# find the most common noun lemmas that are the syntactic subject of a
# clause
library(dplyr)
res <- cnlp_get_dependency(obama, get_token = TRUE) %>%
  filter(relation == "nsubj")
res$lemma_target %>%
  table() %>%
  sort(decreasing = TRUE) %>%
  head(n = 40)
# }