quanteda (version 4.0.1)

concat: Return the concatenator character from an object

Description

Get the concatenator character from a tokens object.

Usage

concat(x)

concatenator(x)

Value

A character vector of length 1.

Arguments

x

A tokens object.

Details

The concatenator character is a special delimiter used to link separate tokens in multi-token phrases. It is stored in the metadata of tokens objects and used in downstream operations such as tokens_compound() or tokens_lookup(). It can be extracted using concat() and set using tokens(x, concatenator = ...) when x is a tokens object.
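
As a rough illustration of the extract-and-set pattern described above, the sketch below changes the concatenator on an existing tokens object and then passes it explicitly to tokens_compound(). It assumes quanteda 4.x is installed; the sample sentence and the "+" concatenator are made up for illustration.

```r
library(quanteda)

# Tokenize a short, made-up sentence; the concatenator starts at its default
toks <- tokens("The United States of America")
concat(toks)

# Set a different concatenator by calling tokens() on the tokens object
toks <- tokens(toks, concatenator = "+")
concat(toks)

# Pass the stored concatenator explicitly when compounding a phrase,
# so the joined token uses the character recorded in the object
toks_comp <- tokens_compound(toks, phrase("United States"),
                             concatenator = concat(toks))
as.character(toks_comp)
```

Passing concatenator = concat(toks) explicitly keeps the example independent of whatever default tokens_compound() uses in a given release.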

The default _ is recommended because it is not removed during normal cleaning and tokenization, whereas nearly all other punctuation characters (at least those in the Unicode punctuation class [P]) are removed.

Examples

library(quanteda)

# tokenize the first five inaugural addresses, then query the concatenator
toks <- tokens(data_corpus_inaugural[1:5])
concat(toks)