quanteda (version 4.2.0)

tokens_chunk: Segment tokens object by chunks of a given size

Description

Segment tokens into new documents of equal token length, optionally with overlap between consecutive chunks.

Usage

tokens_chunk(
  x,
  size,
  overlap = 0,
  use_docvars = TRUE,
  verbose = quanteda_options("verbose")
)

Value

A tokens object whose documents have been split into chunks of length size.

Arguments

x

tokens object whose token elements will be segmented into chunks

size

integer; the token length of the chunks

overlap

integer; the number of tokens at the start of each chunk that are repeated from the end of the preceding chunk

use_docvars

if TRUE, repeat the docvar values for each chunk; if FALSE, drop the docvars in the chunked tokens

verbose

if TRUE, print the number of tokens and documents before and after the function is applied. Padding tokens are not included in the token count.
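
The interaction between size and overlap can be illustrated with a minimal base R sketch. The helper chunk_vector() below is hypothetical and is not part of quanteda; it assumes that each chunk starts size - overlap positions after the previous one and that a trailing chunk may be shorter than size. The way quanteda handles the trailing tokens may differ in detail.

```r
# Hypothetical helper (not a quanteda function): split one character vector
# into windows of `size` tokens, advancing by `size - overlap` each time.
chunk_vector <- function(tokens, size, overlap = 0) {
  stopifnot(size >= 1, overlap >= 0, overlap < size)
  if (length(tokens) == 0) return(list())
  step <- size - overlap                      # distance between chunk starts
  starts <- seq(1, length(tokens), by = step)
  lapply(starts, function(i) {
    tokens[i:min(i + size - 1, length(tokens))]  # clip the last window
  })
}

words <- c("a", "b", "c", "d", "e", "f", "g")

# No overlap: windows start at positions 1, 4 and 7
chunk_vector(words, size = 3)

# overlap = 1: windows start at positions 1, 3, 5 and 7, so each chunk
# begins with the last token of the chunk before it
chunk_vector(words, size = 3, overlap = 1)
```

In this sketch a larger overlap means a smaller step and therefore more chunks for the same input.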

See Also

tokens_segment()

Examples

txts <- c(doc1 = "Fellow citizens, I am again called upon by the voice of
                  my country to execute the functions of its Chief Magistrate.",
          doc2 = "When the occasion proper for it shall arrive, I shall
                  endeavor to express the high sense I entertain of this
                  distinguished honor.")
toks <- tokens(txts)
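# non-overlapping chunks of five tokens
tokens_chunk(toks, size = 5)
# chunks of five tokens, each sharing four tokens with the preceding chunk
tokens_chunk(toks, size = 5, overlap = 4)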
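
The effect of use_docvars can be checked by inspecting the result with ntoken() and docvars(). The following is a sketch that assumes the quanteda package is installed; the docvar name "speaker" and the sample text are made up for illustration.

```r
library(quanteda)

txt <- c(d1 = "one two three four five six seven",
         d2 = "eight nine ten eleven twelve")
toks_dv <- tokens(txt)

# Attach an illustrative document variable, one value per document
docvars(toks_dv, "speaker") <- rep("Washington", ndoc(toks_dv))

# Chunk without overlap, repeating the docvars for every chunk
chunks <- tokens_chunk(toks_dv, size = 5, use_docvars = TRUE)
ntoken(chunks)               # token count of each chunk
docvars(chunks, "speaker")   # the docvar value carried by each chunk

# Chunk again, this time dropping the docvars
chunks_plain <- tokens_chunk(toks_dv, size = 5, use_docvars = FALSE)
names(docvars(chunks_plain))
```

Without overlap, the chunks partition the original tokens, so the total token count is unchanged.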