The output/input arguments are passed by expression and support
quasiquotation; you can unquote strings and symbols.
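A minimal sketch of three equivalent ways to name the output and input columns. It assumes the tidytext package is installed. The data frame `d`, its column `txt`, and the variables `out_col`/`in_col` are hypothetical.

```r
library(tidytext)

# Hypothetical example data: one column of raw text
d <- data.frame(txt = c("Because I could not stop for Death",
                        "He kindly stopped for me"),
                stringsAsFactors = FALSE)

# Bare column names, passed by expression
by_symbol <- unnest_tokens(d, word, txt)

# Unquoting strings with !!
by_string <- unnest_tokens(d, !!"word", !!"txt")

# Unquoting symbols built from variables, useful inside your own functions
out_col <- "word"
in_col <- "txt"
by_sym <- unnest_tokens(d, !!rlang::sym(out_col), !!rlang::sym(in_col))
```

All three calls should produce the same `word` column. Unquoting is mainly useful when the column names are only known at run time.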
format
Either "text", "man", "latex", "html", or "xml". When the format is "text",
tokenization uses the tokenizers package. For any other format, this uses the
hunspell tokenizer and can tokenize only by words (token = "words").
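A short sketch of tokenizing markup, assuming both tidytext and hunspell are installed. The HTML snippet is hypothetical. The tags should be skipped by the hunspell parser, so only the words remain, lowercased by the default `to_lower = TRUE`.

```r
library(tidytext)

# Hypothetical HTML input; format = "html" requires the hunspell package
d <- data.frame(txt = "<p>Hello <b>World</b></p>", stringsAsFactors = FALSE)

# token must stay at its default ("words") when format is not "text"
html_words <- unnest_tokens(d, word, txt, format = "html")
print(html_words$word)
```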
to_lower
Whether to convert tokens to lowercase. If tokens include
URLs (such as with token = "tweets"), the lowercased URLs may no
longer be correct.
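A sketch of how lowercasing can alter a URL, assuming tidytext is installed. It uses `token = "regex"` with a whitespace pattern (an illustrative choice, not the only one) so the URL survives as a single token. The input text and URL are hypothetical.

```r
library(tidytext)

# Hypothetical text containing a URL with a case-sensitive path
d <- data.frame(txt = "Docs at https://Example.com/ReadMe today",
                stringsAsFactors = FALSE)

# Split on whitespace only, so the URL stays one token
lowered <- unnest_tokens(d, word, txt, token = "regex", pattern = "\\s+")

# to_lower = FALSE preserves the original case of every token
kept <- unnest_tokens(d, word, txt, token = "regex", pattern = "\\s+",
                      to_lower = FALSE)
```

Because URL paths are often case-sensitive, `to_lower = FALSE` is the safer choice when URLs must remain usable.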
drop
Whether the original input column should be dropped. Ignored
if the original input and new output columns have the same name.
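A minimal sketch of the three cases, assuming tidytext is installed. The data frame and its `id`/`txt` columns are hypothetical.

```r
library(tidytext)

d <- data.frame(id = 1:2,
                txt = c("one two", "three"),
                stringsAsFactors = FALSE)

# Default drop = TRUE: the input column txt is removed
dropped <- unnest_tokens(d, word, txt)

# drop = FALSE: txt is kept and repeated for every token
kept <- unnest_tokens(d, word, txt, drop = FALSE)

# Same input and output name: txt is replaced by its tokens
replaced <- unnest_tokens(d, txt, txt)
```

Other columns, such as `id`, are carried along in every case.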
collapse
Whether to combine text with newlines before tokenizing, in case tokens
(such as sentences or paragraphs) span multiple lines. If NULL, text is
collapsed when the token method is "ngrams", "skip_ngrams", "sentences",
"lines", "paragraphs", or "regex".
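The effect of collapsing can be illustrated in base R alone. This sketch does not call tidytext and does not use its sentence tokenizer. The `split_sentences()` helper is hypothetical: it applies a deliberately naive rule (split after a period followed by whitespace) to show why a sentence spanning a line break needs the lines joined first.

```r
# Hypothetical two-line input where one sentence spans a line break
lines <- c("The quick brown fox jumps",
           "over the lazy dog. It sleeps.")

# Naive, hypothetical sentence splitter: break after a period plus whitespace
split_sentences <- function(x) {
  unlist(strsplit(x, "(?<=\\.)\\s+", perl = TRUE))
}

# Without collapsing, each line is split on its own,
# so the first sentence is cut in two
per_line <- split_sentences(lines)

# Collapsing with newlines first keeps the first sentence whole
collapsed <- split_sentences(paste(lines, collapse = "\n"))
```

Here `per_line` holds three fragments, while `collapsed` holds the two actual sentences.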