udpipe (version 0.8.3)

as_phrasemachine: Convert Parts of Speech tags to one-letter tags which can be used to identify phrases based on regular expressions

Description

Noun phrases are of common interest in natural language processing. They can be extracted from text by defining a sequence of Parts of Speech (POS) tags. For example, the sequence Adjective, Noun, Preposition, Noun can be seen as a noun phrase. This function recodes Universal POS tags (or Penn Treebank tags, see the type argument) to one of the following one-letter tags, which simplifies writing regular expressions that find Parts of Speech sequences:

  • A: adjective

  • C: coordinating conjunction

  • D: determiner

  • M: modifier of verb

  • N: noun or proper noun

  • P: preposition

  • O: other elements

After recoding, a simple noun phrase can be identified with the regular expression (A|N)*N(P+D*(A|N)*N)*. It reads: zero or more adjectives or nouns, followed by a noun, optionally followed by one or more groups consisting of one or more prepositions, zero or more determiners, zero or more adjectives or nouns, and a closing noun.
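The matching step described above can be sketched in base R. This is a minimal illustration, not part of udpipe: the tokens and one-letter tags are hypothetical and hand-written, and the variable names (tokens, tags, tag_string, noun_phrases) are assumptions made for the example. Because every tag is exactly one character, a match position in the collapsed tag string is also a token index.

```r
# Hypothetical, hand-tagged tokens; in practice the tags would come from
# as_phrasemachine() applied to the POS tags of an annotated text.
tokens <- c("the", "quick", "fox", "of", "the", "forest", ",", "dogs")
tags   <- c("D",   "A",     "N",   "P",  "D",   "N",      "O", "N")

# The simple noun phrase pattern from the description.
pattern <- "(A|N)*N(P+D*(A|N)*N)*"

# Collapse the one-letter tags into a single string: one character per token.
tag_string <- paste(tags, collapse = "")

# Locate all non-overlapping matches of the pattern in the tag string.
m      <- gregexpr(pattern, tag_string, perl = TRUE)[[1]]
starts <- as.integer(m)
ends   <- starts + attr(m, "match.length") - 1L

# Map each matched character span back to the corresponding tokens.
noun_phrases <- vapply(
  seq_along(starts),
  function(i) paste(tokens[starts[i]:ends[i]], collapse = " "),
  character(1)
)
print(noun_phrases)
```

With these inputs the pattern matches the tag spans "ANPDN" and "N", so noun_phrases holds "quick fox of the forest" and "dogs". The sketch assumes at least one match; when gregexpr finds nothing it returns -1, which would need to be handled separately.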

Usage

as_phrasemachine(x, type = c("upos", "penn-treebank"))

Arguments

x

a character vector of POS tags, for example as obtained with udpipe_annotate

type

either 'upos' or 'penn-treebank', indicating whether x contains Universal Parts of Speech tags or Penn Treebank Parts of Speech tags. In both cases the tags are recoded to the one-letter counterparts listed in the description

Value

the character vector x where the respective POS tags are replaced with one-letter tags

Details

For more information on extracting phrases see http://brenocon.com/handler2016phrases.pdf

See Also

phrases

Examples

# NOT RUN {
x <- c("PROPN", "SCONJ", "ADJ", "NOUN", "VERB", "INTJ", "DET", "VERB", 
       "PROPN", "AUX", "NUM", "NUM", "X", "SCONJ", "PRON", "PUNCT", "ADP", 
       "X", "PUNCT", "AUX", "PROPN", "ADP", "X", "PROPN", "ADP", "DET", 
       "CCONJ", "INTJ", "NOUN", "PROPN")
as_phrasemachine(x)
# }