NLP (version 0.3-0)

tagsets: NLP Tag Sets

Description

Tag sets frequently used in Natural Language Processing.

Usage

Penn_Treebank_POS_tags
Brown_POS_tags
Universal_POS_tags
Universal_POS_tags_map

Details

Penn_Treebank_POS_tags and Brown_POS_tags provide, respectively, the Penn Treebank POS tags (https://catalog.ldc.upenn.edu/docs/LDC95T7/cl93.html, Table 2) and the POS tags used for the Brown corpus (https://en.wikipedia.org/wiki/Brown_Corpus), both as data frames with the following variables:

entry

a character vector with the POS tags

description

a character vector with short descriptions of the tags

examples

a character vector with examples for the tags
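As a small illustration of how these data frames can be queried, the following sketch looks up a few Penn Treebank tags by their entry value. The tags chosen and the names tags and hits are only illustrative; the code assumes the NLP package is installed.

```r
library(NLP)

## Illustrative selection of Penn Treebank tags to look up.
tags <- c("NN", "VBD", "JJ")

## Keep the rows whose 'entry' matches one of the selected tags.
hits <- Penn_Treebank_POS_tags[Penn_Treebank_POS_tags$entry %in% tags, ]

## Show the tag, its short description, and its examples.
write.dcf(hits)
```

The same subsetting should work for Brown_POS_tags, since both data frames are documented as having the same variables.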

Universal_POS_tags provides the universal POS tagset introduced by Slav Petrov, Dipanjan Das, and Ryan McDonald (doi:10.48550/arXiv.1104.2086), as a data frame with character variables entry and description.

Universal_POS_tags_map is a named list of mappings from language- and treebank-specific POS tagsets to the universal POS tags. Its elements en-ptb and en-brown give the mappings for the Penn Treebank and Brown POS tags, respectively.
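A minimal sketch of applying one of these mappings is shown below. It assumes that each element of Universal_POS_tags_map is a character vector of universal tags named by the source tags, which this page does not state explicitly; inspect the element with str() first if in doubt. The names ptb_map, ptb_tags and universal are illustrative.

```r
library(NLP)

## Select the mapping for the Penn Treebank tagset.
ptb_map <- Universal_POS_tags_map[["en-ptb"]]

## Check the structure of the mapping before relying on it.
str(ptb_map)

## Illustrative Penn Treebank tags to translate.
ptb_tags <- c("DT", "NN", "VBZ", "JJ")

## Assumption: 'ptb_map' is a character vector named by Penn Treebank tags,
## so indexing by name yields the corresponding universal tags.
universal <- unname(ptb_map[ptb_tags])
universal
```

Tags without an entry in the mapping would come back as NA under this assumption, so it may be worth checking the result with anyNA() before using it.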

Examples

## Penn Treebank POS tags
dim(Penn_Treebank_POS_tags)
## Inspect first 20 entries:
write.dcf(head(Penn_Treebank_POS_tags, 20L))

## Brown POS tags
dim(Brown_POS_tags)
## Inspect first 20 entries:
write.dcf(head(Brown_POS_tags, 20L))

## Universal POS tags
Universal_POS_tags

## Available mappings to universal POS tags
names(Universal_POS_tags_map)
