splitter

A utility function for use with n-gram modeling. It splits a string
at every character, at spaces, or at punctuation, depending on the
options given.

preprocessing

An n-gram is a sequence of n "words" taken, in order, from a
body of text.  This is a collection of utilities for creating,
displaying, summarizing, and "babbling" n-grams.  The
'tokenization' and "babbling" are handled by very efficient C
code, which can even be built as its own standalone library.
The babbler is a simple Markov chain.  The package also offers
a vignette with complete example 'workflows' and information about
the utilities offered in the package.
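
To make the definition of an n-gram concrete, here is a minimal base-R sketch that slides a window of n words over a text. It is only an illustration of the idea; the helper name `word_ngrams` is hypothetical and is not part of the package, whose tokenization is done in C.

```r
# Illustrative base-R sketch (not the package's C implementation):
# build word n-grams by sliding a window of size n over the tokens.
word_ngrams <- function(text, n) {
  # Split on runs of whitespace and drop any empty tokens.
  words <- strsplit(text, "[[:space:]]+")[[1]]
  words <- words[nzchar(words)]

  # Fewer than n words means there are no n-grams to return.
  if (length(words) < n) return(character(0))

  # One n-gram starts at each position that leaves room for n words.
  starts <- seq_len(length(words) - n + 1L)
  vapply(
    starts,
    function(i) paste(words[i:(i + n - 1L)], collapse = " "),
    character(1)
  )
}

# The 2-grams of a seven-word text: six overlapping word pairs.
bigrams <- word_ngrams("a b a c a b b", 2)
print(bigrams)
```

A text of k words yields k - n + 1 n-grams, so the seven-word input above produces six 2-grams.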

Drew Schmidt

ngram

Fast n-Gram 'Tokenization'

Christian Heckendorf

splitter function


Arguments

Character Splitter — splitter

<dl>

<dt>string</dt>
<dd>An input string.</dd>


<dt>split.char</dt>
<dd>Logical; should a split occur after every character?</dd>


<dt>split.space</dt>
<dd>Logical; determines if spaces should be preserved as characters in
the n-gram tokenization. The character(s) used for spaces are
determined by the <code>spacesep</code> argument.</dd>


<dt>spacesep</dt>
<dd>The character(s) used to represent a space when
<code>split.space=TRUE</code>. It should not consist solely of spaces.</dd>


<dt>split.punct</dt>
<dd>Logical; determines if splits should occur at punctuation.</dd>

</dl>
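
The arguments above can be exercised with a short sketch like the following. It assumes the ngram package is installed and that arguments not passed explicitly have defaults; the sample string is made up for illustration, and no output is shown because it depends on the installed version.

```r
# Sketch only: assumes the ngram package is installed.
# The calls are skipped when it is not available.
x <- "watch  out! a snake!  "

if (requireNamespace("ngram", quietly = TRUE)) {
  # Split after every character.
  print(ngram::splitter(x, split.char = TRUE))

  # Preserve spaces as characters, representing each one with "_".
  print(ngram::splitter(x, split.space = TRUE, spacesep = "_"))

  # Split at punctuation.
  print(ngram::splitter(x, split.punct = TRUE))
}
```

Choosing a visible `spacesep` such as `"_"` keeps preserved spaces distinguishable from the spaces that separate the resulting pieces.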
