tokens_split: Split tokens by a separator pattern

Replaces each token that contains a separator pattern with the multiple tokens
obtained by splitting it at that separator, with the option of retaining the
separator. This function effectively reverses the operation of <code>tokens_compound()</code>.
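A minimal round-trip sketch of that relationship, assuming quanteda is installed and that <code>tokens_compound()</code> joins the matched phrase with its default <code>"_"</code> concatenator: the phrase is first compounded into one token, then split back into its parts on that same character.

```r
library(quanteda)

# A single-document tokens object containing a multi-word expression
toks <- tokens("pork barrel is an idiomatic multi-word expression")

# Join the phrase "pork barrel" into one token, "pork_barrel"
toks_comp <- tokens_compound(toks, pattern = phrase("pork barrel"))

# Split on the "_" concatenator to recover the original tokens
toks_split <- tokens_split(toks_comp, separator = "_")

print(as.list(toks_comp))
print(as.list(toks_split))
```

Because the separator passed to <code>tokens_split()</code> matches the concatenator used by <code>tokens_compound()</code>, the split result should match the original tokenization.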

Package: quanteda (Quantitative Analysis of Textual Data)

A fast, flexible, and comprehensive framework for quantitative text analysis in R. Provides functionality for corpus management, creating and manipulating tokens and n-grams, exploring keywords in context, forming and manipulating sparse matrices of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and distances, applying content dictionaries, applying supervised and unsupervised machine learning, visually representing text and text analyses, and more.

Authors: Kenneth Benoit, Kohei Watanabe, Haiyan Wang, Paul Nulty, Adam Obeng, Stefan Müller, Akitaka Matsuo, William Lowe, Christian Müller, Olivier Delmarcelle

Funder: European Research Council

Arguments

<dl>

<dt>x</dt>
<dd>a tokens object</dd>


<dt>separator</dt>
<dd>a single-character pattern marking where tokens are split</dd>


<dt>valuetype</dt>
<dd>the type of pattern matching: <code>"glob"</code> for "glob"-style
wildcard expressions; <code>"regex"</code> for regular expressions; or <code>"fixed"</code> for
exact matching. See valuetype for details.</dd>


<dt>remove_separator</dt>
<dd>if <code>TRUE</code>, remove the separator from the new tokens; if <code>FALSE</code>, retain it</dd>


<dt>apply_if</dt>
<dd>logical vector of length <code>ndoc(x)</code>; documents are modified
only where the corresponding value is <code>TRUE</code>; others are left unchanged.</dd>

</dl>
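A short sketch contrasting the two settings of <code>remove_separator</code>, assuming quanteda's default <code>tokens()</code> keeps the hyphenated word "UK-EU" as a single token. The test below only checks whether a hyphen survives, not whether it becomes a token of its own.

```r
library(quanteda)

# "UK-EU" is expected to stay one hyphenated token under default settings
toks <- tokens("UK-EU negotiations")

# Default remove_separator = TRUE: the hyphen is dropped from the output
split_drop <- tokens_split(toks, separator = "-")

# remove_separator = FALSE: the hyphen is retained in the output
split_keep <- tokens_split(toks, separator = "-", remove_separator = FALSE)

print(as.list(split_drop))
print(as.list(split_keep))
```

This applies the splitting after tokenization, so other hyphenated tokens could be left intact simply by not passing them through this call.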

Examples
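A sketch of selective splitting with <code>apply_if</code>, assuming an installed quanteda version whose <code>tokens_split()</code> accepts that argument as documented above. The two documents and their names (<code>d1</code>, <code>d2</code>) are hypothetical, built directly with <code>as.tokens()</code> so the example does not depend on tokenizer behaviour for underscores.

```r
library(quanteda)

# Two hypothetical documents that both contain compounded tokens
toks <- as.tokens(list(
  d1 = c("new_york", "is", "big"),
  d2 = c("san_francisco", "is", "far")
))

# Split only the first document; apply_if has length ndoc(toks)
toks_sel <- tokens_split(toks, separator = "_", apply_if = c(TRUE, FALSE))

print(as.list(toks_sel))
```

In practice the logical vector would often come from a document-level condition, for example a comparison on a docvar, rather than being written out by hand.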