
udpipe (version 0.8.11)

cbind_morphological: Add morphological features to an annotated dataset

Description

The result of udpipe_annotate, once converted to a data.frame, contains a field called feats that holds the morphological features defined at https://universaldependencies.org/u/feat/index.html. When a token has several of these features, they are concatenated with the | symbol. This function extracts each morphological feature separately and adds it as an extra column to the data.frame.
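To make the splitting concrete, here is a minimal base R sketch of the idea. It assumes the Universal Dependencies "Name=Value|Name=Value" layout. The feats values, the helper parse_feats, and the resulting column names are hypothetical and chosen for illustration only; this is not the package's implementation and does not reproduce its column naming.

```r
# Hypothetical feats values in the "Name=Value|Name=Value" layout (NA = no features)
feats <- c("Number=Sing|Person=3|Tense=Pres", "Definite=Def|PronType=Art", NA)

# Hypothetical helper: split one feats string into a named character vector
parse_feats <- function(s) {
  if (is.na(s)) return(character(0))
  pairs <- strsplit(strsplit(s, "|", fixed = TRUE)[[1]], "=", fixed = TRUE)
  values <- vapply(pairs, function(p) p[2], character(1))
  names(values) <- vapply(pairs, function(p) p[1], character(1))
  values
}

parsed <- lapply(feats, parse_feats)
feature_names <- sort(unique(unlist(lapply(parsed, names))))

# One column per feature, NA where a token lacks that feature
morph <- as.data.frame(
  lapply(setNames(feature_names, feature_names), function(f) {
    vapply(parsed,
           function(v) if (f %in% names(v)) v[[f]] else NA_character_,
           character(1))
  }),
  stringsAsFactors = FALSE
)

# Flag rows that carry at least one morphological feature
morph$has_morph <- lengths(parsed) > 0
print(morph)
```

In practice cbind_morphological does this work directly on the annotated data.frame, so a hand-rolled parser like this is only useful for understanding the format of the feats field.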

Usage

cbind_morphological(x, term = "feats", which)

Value

x, in the same row order, with extra columns added: at least the column has_morph, which indicates whether any morphological features are present, plus one extra column for each possible morphological feature in the data.

Arguments

x

a data.frame or data.table as returned by as.data.frame(udpipe_annotate(...))

term

the name of the field in x which contains the morphological features. Defaults to 'feats'.

which

a character vector with the names of the morphological features to parse out. These features are among the 24 lexical and grammatical properties of words defined at https://universaldependencies.org/u/feat/index.html. Possible values are the group names or the individual feature names listed below:

  • "lexical": "PronType", "NumType", "Poss", "Reflex", "Foreign", "Abbr", "Typo"

  • "inflectional_noun": "Gender", "Animacy", "NounClass", "Number", "Case", "Definite", "Degree"

  • "inflectional_verb": "VerbForm", "Mood", "Tense", "Aspect", "Voice", "Evident", "Polarity", "Person", "Polite", "Clusivity"

See the examples.
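The relationship between the three group names and the individual feature names can be sketched in base R as follows. The list feature_groups simply restates the groups above, and expand_which is a hypothetical helper written for illustration; how the package resolves the which argument internally is not shown in this documentation.

```r
# The three groups listed above, restated as a named list
feature_groups <- list(
  lexical = c("PronType", "NumType", "Poss", "Reflex", "Foreign", "Abbr", "Typo"),
  inflectional_noun = c("Gender", "Animacy", "NounClass", "Number", "Case",
                        "Definite", "Degree"),
  inflectional_verb = c("VerbForm", "Mood", "Tense", "Aspect", "Voice", "Evident",
                        "Polarity", "Person", "Polite", "Clusivity")
)

# Hypothetical helper: replace group names by their member features,
# keep individual feature names as they are, and drop duplicates
expand_which <- function(which) {
  unique(unlist(lapply(which, function(w) {
    if (w %in% names(feature_groups)) feature_groups[[w]] else w
  })))
}

# All three groups together cover the 24 features
length(expand_which(c("lexical", "inflectional_noun", "inflectional_verb")))

# Individual names and group names can be mixed
expand_which(c("Mood", "lexical"))
```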

Examples

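# Not run: this block downloads a language model
if (FALSE) {
# download and load the English model
udmodel <- udpipe_download_model(language = "english-ewt")
udmodel <- udpipe_load_model(file = udmodel$file_model)
# annotate a sentence and convert the result to a data.frame
x <- udpipe_annotate(udmodel, 
                     x = "The economy is weak but the outlook is bright")
x <- as.data.frame(x)
# add one column per morphological feature found in feats
x <- cbind_morphological(x, term = "feats")
}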

# extract the features found in the CoNLL-U example data shipped with the package
f <- system.file(package = "udpipe", "dummydata", "traindata.conllu")
x <- udpipe_read_conllu(f)
x <- cbind_morphological(x, term = "feats")

# extract only a selected set of features
f <- system.file(package = "udpipe", "dummydata", "traindata.conllu")
x <- udpipe_read_conllu(f)
x <- cbind_morphological(x, term = "feats", 
                         which = c("Mood", "Gender", "VerbForm", "Polarity", "Polite"))

# extract all features from the feats column even if not present in the data
f <- system.file(package = "udpipe", "dummydata", "traindata.conllu")
x <- udpipe_read_conllu(f)
x <- cbind_morphological(x, term = "feats", 
                         which = c("lexical", "inflectional_noun", "inflectional_verb"))
