ngram_tokenize: N-gram tokenizer


Description

A tokenizer for use with a document-term matrix from the tm package. It supports
both character and word n-grams and includes its own wrapper to handle non-Latin
encodings.
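To illustrate the difference between word and character n-grams, the base R sketch below enumerates every n-gram of order ngmin through ngmax for a single string. It is not the SentimentAnalysis implementation. The helper name ngram_sketch is hypothetical, words are assumed to be separated by whitespace, character n-grams treat spaces as ordinary characters, and none of the package's encoding handling is reproduced.

```r
# Hypothetical helper for illustration only; NOT the SentimentAnalysis implementation.
# Returns every n-gram of order ngmin..ngmax from a single string, shortest order first.
ngram_sketch <- function(x, char = FALSE, ngmin = 1, ngmax = 3) {
  stopifnot(is.character(x), length(x) == 1, ngmin >= 1, ngmin <= ngmax)
  # Assumption: character n-grams split the raw string into single characters
  # (spaces included); word n-grams split on runs of whitespace.
  units <- if (char) strsplit(x, "")[[1]] else strsplit(trimws(x), "\\s+")[[1]]
  sep <- if (char) "" else " "
  result <- character(0)
  for (n in ngmin:ngmax) {
    if (n > length(units)) break  # longer orders cannot occur in this input
    starts <- seq_len(length(units) - n + 1)
    grams <- vapply(starts, function(i) paste(units[i:(i + n - 1)], collapse = sep),
                    character(1))
    result <- c(result, grams)
  }
  result
}

ngram_sketch("Romeo loves Juliet", ngmin = 1, ngmax = 2)
# returns "Romeo" "loves" "Juliet" "Romeo loves" "loves Juliet"
ngram_sketch("abc", char = TRUE, ngmin = 1, ngmax = 2)
# returns "a" "b" "c" "ab" "bc"
```

Orders beyond the number of available words or characters are skipped, so a one-word input with the default ngmax = 3 yields only the word itself.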

Keywords: preprocessing

SentimentAnalysis: Dictionary-Based Sentiment Analysis

Performs a sentiment analysis of textual contents in R. This implementation
utilizes various existing dictionaries, such as Harvard IV or finance-specific
dictionaries. It can also create customized dictionaries, using LASSO
regularization as a statistical approach to select relevant terms based on an
exogenous response variable.

Authors: Nicolas Proellochs, Stefan Feuerriegel


Arguments


<dl>

<dt>x</dt>
<dd>input string</dd>


<dt>char</dt>
<dd>logical value specifying whether to use character n-grams (char = TRUE)
or word n-grams (char = FALSE, the default)</dd>


<dt>ngmin</dt>
<dd>integer giving the minimum n-gram order (default: 1)</dd>


<dt>ngmax</dt>
<dd>integer giving the maximum n-gram order (default: 3)</dd>

</dl>


Examples
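A minimal sketch of passing ngram_tokenize to tm through the tokenize control option, assuming both tm and SentimentAnalysis are installed. The two-document corpus is made up for illustration. Setting wordLengths = c(1, Inf) overrides tm's default minimum term length of three characters, which would otherwise drop short unigrams such as "a".

```r
library(tm)
library(SentimentAnalysis)

# Toy corpus, made up for illustration
docs <- c("Romeo loves Juliet", "Romeo loves a girl")
corpus <- VCorpus(VectorSource(docs))

# Word unigrams and bigrams; wordLengths = c(1, Inf) keeps one-letter terms
dtm <- DocumentTermMatrix(corpus, control = list(
  wordLengths = c(1, Inf),
  tokenize = function(x) ngram_tokenize(x, char = FALSE, ngmin = 1, ngmax = 2)
))
inspect(dtm)

# Character bigrams and trigrams over the same corpus
dtm_char <- DocumentTermMatrix(corpus, control = list(
  wordLengths = c(1, Inf),
  tokenize = function(x) ngram_tokenize(x, char = TRUE, ngmin = 2, ngmax = 3)
))
inspect(dtm_char)
```

Because the tokenizer is supplied as a closure, the n-gram settings are fixed per matrix; building one matrix per setting keeps word and character features separate.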