NLP (version 0.2-0)

tagsets: NLP Tag Sets

Description

Tag sets frequently used in Natural Language Processing.

Usage

Penn_Treebank_POS_tags
Brown_POS_tags
Universal_POS_tags
Universal_POS_tags_map

Details

Penn_Treebank_POS_tags and Brown_POS_tags provide, respectively, the Penn Treebank POS tags (https://catalog.ldc.upenn.edu/docs/LDC95T7/cl93.html, Table 2) and the POS tags used for the Brown corpus (http://www.hit.uib.no/icame/brown/bcm.html), both as data frames with the following variables:

entry

a character vector with the POS tags

description

a character vector with short descriptions of the tags

examples

a character vector with examples for the tags
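Because both objects are ordinary data frames with these three variables, individual tags can be looked up by matching against entry. The following is a minimal sketch, assuming the NLP package is installed; the tags "NN", "VBD" and "JJ" are simply illustrative Penn Treebank tags chosen for this example.

```r
library(NLP)

## Illustrative Penn Treebank tags to look up (chosen for this sketch).
tags <- c("NN", "VBD", "JJ")

## Find the rows whose 'entry' matches each tag, in the order given.
hits <- Penn_Treebank_POS_tags[match(tags, Penn_Treebank_POS_tags$entry), ]

## Show the tag, its short description, and the examples.
hits[, c("entry", "description", "examples")]
```

The same pattern applies to Brown_POS_tags, since it has the same variables.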

Universal_POS_tags provides the universal POS tagset introduced by Slav Petrov, Dipanjan Das, and Ryan McDonald (https://arxiv.org/abs/1104.2086), as a data frame with character variables entry and description.

Universal_POS_tags_map is a named list of mappings from language- and treebank-specific POS tagsets to the universal POS tags. Its elements en-ptb and en-brown give the mappings for the Penn Treebank and Brown POS tags, respectively.
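As a rough illustration of how such a mapping might be used, the sketch below extracts the Penn Treebank mapping and applies it to a few tags. It assumes that each element of the list can be indexed by the source tag (for example, a character vector named by Penn Treebank tags); this page does not state that structure, so inspect the element with str() before relying on it. The example tags are illustrative only.

```r
library(NLP)

## Extract the mapping for Penn Treebank tags by its documented name.
ptb_map <- Universal_POS_tags_map[["en-ptb"]]

## Inspect the structure first; the indexing below is an assumption
## about that structure, not something this help page guarantees.
str(ptb_map)

## Illustrative Penn Treebank tags (chosen for this sketch).
tags <- c("NN", "VBD", "JJ")

## ASSUMPTION: elements are indexed by source tag name.
ptb_map[tags]
```

If the element turns out to have a different structure, the lookup in the last line needs to be adapted accordingly.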

Examples

# NOT RUN {
## Penn Treebank POS tags
dim(Penn_Treebank_POS_tags)
## Inspect first 20 entries:
write.dcf(head(Penn_Treebank_POS_tags, 20L))

## Brown POS tags
dim(Brown_POS_tags)
## Inspect first 20 entries:
write.dcf(head(Brown_POS_tags, 20L))

## Universal POS tags
Universal_POS_tags

## Available mappings to universal POS tags
names(Universal_POS_tags_map)
# }
