tokenizers

word_tokenizer

char_tokenizer

space_tokenizer

strings

Other parameters (usually not used; see the source code for details).

<code>character</code>, <code>nchar(sep)</code> = 1; strings are split on this single character.

<code>logical</code>; tokenize at the C++ level, which could speed up tokenization by 15-50%.

xptr

A few simple tokenization functions. For a more comprehensive list, see the <code>tokenizers</code> package:
<a href="https://cran.r-project.org/package=tokenizers">https://cran.r-project.org/package=tokenizers</a>.
Also see <code>stringi::stri_split_*</code>.

Fast and memory-friendly tools for text vectorization, topic
modeling (LDA, LSA), word embeddings (GloVe), similarities. This package
provides a source-agnostic streaming API, which allows researchers to perform
analysis of collections of documents which are larger than available RAM. All
core functions are parallelized to benefit from multicore machines.

Dmitriy Selivanov

text2vec

Modern Text Mining Framework for R

Qing Wang

tokenizers function

tokenizers: Simple tokenization functions for string splitting

Description

Usage

Arguments

Value

Examples
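The following is a minimal sketch of how the three tokenizers listed above might be called. It assumes the text2vec package is installed; the sample strings and the variable names (`docs`, `by_word`, `by_char`, `by_space`, `by_comma`) are made up for illustration and do not come from the package.

```r
# Assumption: text2vec is installed and exports the three tokenizers.
library(text2vec)

# Hypothetical input: a small character vector of documents.
docs <- c("first second", "bla, bla, blaa")

# Split each string on word boundaries.
by_word <- word_tokenizer(docs)

# Split each string into individual characters.
by_char <- char_tokenizer(docs)

# Split each string on one fixed character; sep must satisfy nchar(sep) == 1.
by_space <- space_tokenizer(docs, sep = " ")
by_comma <- space_tokenizer(docs, sep = ",")

# Each result is a list with one character vector of tokens per input string.
str(by_word)
str(by_space)
```

Splitting on a fixed separator is less general than splitting on word boundaries: punctuation stays attached to the tokens, so it suits input that is already cleaned and delimited by a single character.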