tokens_chunk

Segment tokens into new documents of equal token length, with optional
overlap between consecutive chunks.
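A minimal sketch of the basic call may help make the description concrete. The example texts and the names `txt`, `toks`, and `chunks` are illustrative assumptions, not taken from the package documentation.

```r
library(quanteda)

# Hypothetical example texts (not from the package documentation)
txt <- c(d1 = "a b c d e f g", d2 = "h i j")
toks <- tokens(txt)

# Split each document into chunks of at most 3 tokens, with no overlap
chunks <- tokens_chunk(toks, size = 3)

# Inspect the result: each chunk becomes its own document
ntoken(chunks)    # number of tokens in each chunk
docnames(chunks)  # chunk names are derived from the original document names
```

Because `d1` has seven tokens, its final chunk is shorter than `size`; without overlap, the chunks together contain exactly the tokens of the original documents.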

tokens

A fast, flexible, and comprehensive framework for quantitative text analysis
in R. Provides functionality for corpus management, creating and manipulating
tokens and n-grams, exploring keywords in context, forming and manipulating
sparse matrices of documents by features and feature co-occurrences, analyzing
keywords, computing feature similarities and distances, applying content
dictionaries, applying supervised and unsupervised machine learning, visually
representing text and text analyses, and more.

quanteda

Quantitative Analysis of Textual Data

Kenneth Benoit

Kohei Watanabe

Haiyan Wang

Paul Nulty

Adam Obeng

Stefan Müller

Akitaka Matsuo

William Lowe

Christian Müller

Olivier Delmarcelle

European Research Council

tokens_chunk function


Segment tokens object by chunks of a given size — tokens_chunk

Arguments

<dl>

<dt>x</dt>
<dd>tokens object whose token elements will be segmented into
chunks</dd>


<dt>size</dt>
<dd>integer; the token length of the chunks</dd>


<dt>overlap</dt>
<dd>integer; the number of tokens at the start of each chunk that are
repeated from the end of the preceding chunk</dd>


<dt>use_docvars</dt>
<dd>if <code>TRUE</code>, repeat the docvar values for each chunk;
if <code>FALSE</code>, drop the docvars in the chunked tokens</dd>


<dt>verbose</dt>
<dd>if <code>TRUE</code> print the number of tokens and documents before and
after the function is applied. The number of tokens does not include paddings.</dd>

</dl>
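The effect of `overlap` and `use_docvars` can be sketched as follows. The corpus, the `year` document variable, and the object names are hypothetical and chosen only for illustration; the arguments are set explicitly rather than relying on defaults.

```r
library(quanteda)

# Hypothetical corpus with one document variable ("year")
corp <- corpus(c(d1 = "a b c d e f g h", d2 = "i j k l"),
               docvars = data.frame(year = c(2001, 2002)))
toks <- tokens(corp)

# Chunks of 4 tokens; consecutive chunks from the same document share 2 tokens
overlapped <- tokens_chunk(toks, size = 4, overlap = 2)
as.list(overlapped)

# use_docvars = TRUE repeats each document's docvars on all of its chunks
with_dv <- tokens_chunk(toks, size = 4, use_docvars = TRUE)
docvars(with_dv)

# use_docvars = FALSE drops the docvars from the chunked tokens
without_dv <- tokens_chunk(toks, size = 4, use_docvars = FALSE)
docvars(without_dv)
```

Repeating docvars is convenient when the chunks are later grouped or filtered by document-level metadata; dropping them keeps the chunked object smaller when that metadata is not needed.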

