The output/input arguments are passed by expression and support
quasiquotation; you can unquote strings and symbols.
strip_punct
Should punctuation be stripped?
strip_url
Should URLs (starting with http(s)) be preserved intact, or
removed entirely?
format
Either "text", "man", "latex", "html", or "xml". If not "text",
this uses the hunspell tokenizer and can tokenize only by "word".
to_lower
Whether to convert tokens to lowercase. If tokens include
URLs (such as with token = "tweets"), the lowercased URLs may no
longer be correct.
drop
Whether the original input column should be dropped. Ignored
if the original input and new output columns have the same name.
collapse
Whether to combine text with newlines first in case tokens
(such as sentences or paragraphs) span multiple lines. If NULL, collapses
when the token method is "ngrams", "skip_ngrams", "sentences", "lines",
"paragraphs", or "regex".