getvocab

The corpus of documents (a character vector).

corpus

Minimum number of occurrences for a word to be considered frequent.

mincount

Minimum number of occurrences for a collocation of words (a phrase) to be considered frequent.

minphrasecount

Maximum size of n-grams.

ngram

The language of the documents (NULL if no stemming).

lang

Stop words, or the language of the documents (NULL if stop words should not be removed).

stopwords

Extract words and phrases from a corpus of documents.

Contains functions that simplify the use of data mining methods (classification, regression, clustering, etc.) for students and beginners in R programming. Wrappers around the main functions of various R packages standardize the input and output of these methods, trading some flexibility for simplicity. The package name comes from the French "Fouille de Données en Master 2 Informatique Décisionnelle".

Alexandre Blansché

fdm2id

Data Mining and R Programming for Beginners

Alexandre Blansché

getvocab function

<dl><dt>corpus</dt>
<dd>The corpus of documents (a character vector).</dd>
<dt>mincount</dt>
<dd>Minimum number of occurrences for a word to be considered frequent.</dd>
<dt>minphrasecount</dt>
<dd>Minimum number of occurrences for a collocation of words (a phrase) to be considered frequent.</dd>
<dt>ngram</dt>
<dd>Maximum size of n-grams.</dd>
<dt>lang</dt>
<dd>The language of the documents (NULL if no stemming).</dd>
<dt>stopwords</dt>
<dd>Stop words, or the language of the documents (NULL if stop words should not be removed).</dd>
<dt>...</dt>
<dd>Other parameters.</dd></dl>
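
To make the roles of a minimum count and a maximum n-gram size more concrete, here is a minimal base R sketch. It is not the fdm2id implementation: `count_ngrams` and its defaults are hypothetical names chosen for illustration, the tokenization is deliberately naive, and stemming, stop word removal and collocation detection are left out. It also assumes that "frequent" means a count greater than or equal to the threshold.

```r
# Hypothetical helper (illustration only, not part of fdm2id):
# count every n-gram up to size `ngram` and keep those that occur
# at least `mincount` times across the whole corpus.
count_ngrams <- function(corpus, mincount = 2, ngram = 2) {
  # Lowercase each document and split it on runs of non-letter characters
  tokens <- strsplit(tolower(corpus), "[^a-z]+")
  grams <- unlist(lapply(tokens, function(tok) {
    tok <- tok[nzchar(tok)]  # drop empty tokens left by the split
    unlist(lapply(seq_len(ngram), function(n) {
      if (length(tok) < n) return(character(0))
      starts <- seq_len(length(tok) - n + 1)
      # Join n consecutive tokens with "_" to form one n-gram
      vapply(starts,
             function(i) paste(tok[i:(i + n - 1)], collapse = "_"),
             character(1))
    }))
  }))
  counts <- table(grams)
  # Assumption: a term is "frequent" when its count reaches the threshold
  counts[counts >= mincount]
}

corpus <- c("data mining is fun", "data mining with R", "R is fun")
vocab <- count_ngrams(corpus, mincount = 2, ngram = 2)
print(vocab)
```

In this toy corpus, "data mining" appears in two documents, so the bigram survives a threshold of 2, while "with" appears only once and is dropped. Raising `ngram` lets longer phrases enter the candidate set; raising `mincount` prunes rare terms.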

Arguments

Extract words and phrases from a corpus — getvocab

