The concatenator character is a special delimiter used to link
separate tokens in multi-token phrases. It is embedded in the metadata of
tokens objects and used in downstream operations, such as tokens_compound()
or tokens_lookup(). It can be extracted using concat() and set using
tokens(x, concatenator = ...) when x is a tokens object.
The default _ is recommended since it will not be removed during normal
cleaning and tokenization (whereas nearly all other punctuation characters, at
least those in the Unicode punctuation class [P], will be removed).
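
A minimal sketch of reading and setting the concatenator follows. It assumes a quanteda version in which concat() and the concatenator argument of tokens() are available; the example sentence and the object names (toks, toks_comp, toks_plus) are illustrative only.

```r
library(quanteda)

# Tokenize a short text; the tokens object carries the default concatenator
toks <- tokens("The United States is a federal republic.")

# Extract the concatenator stored in the object's metadata
concat(toks)

# Downstream functions such as tokens_compound() use it to join
# the tokens of a multi-token phrase
toks_comp <- tokens_compound(toks, pattern = phrase("United States"))
as.character(toks_comp)

# Set a different concatenator on an existing tokens object
toks_plus <- tokens(toks, concatenator = "+")
concat(toks_plus)
```

Setting the concatenator before compounding keeps later steps, such as tokens_lookup(), consistent with the character used to join the phrase.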