getvocab

Extract words and phrases from a corpus of documents.

Contains functions that simplify the use of data mining methods (classification, regression, clustering, etc.) for students and beginners in R programming. Various R packages are used, and wrappers are built around their main functions to standardize the input and output of data mining methods. This costs some flexibility but gains simplicity. The package name comes from the French "Fouille de Données en Master 2 Informatique Décisionnelle".

Alexandre Blansché

fdm2id

Data Mining and R Programming for Beginners


getvocab function


Arguments


<dl>

<dt>corpus</dt>
<dd>The corpus of documents (a character vector).</dd>

<dt>mincount</dt>
<dd>Minimum count for a word to be considered frequent.</dd>

<dt>minphrasecount</dt>
<dd>Minimum count for a collocation of words (a phrase) to be considered frequent.</dd>

<dt>ngram</dt>
<dd>Maximum size of n-grams.</dd>

<dt>lang</dt>
<dd>The language of the documents, used for stemming (NULL for no stemming).</dd>

<dt>stopwords</dt>
<dd>The stop words to remove, or the language of the documents. NULL if stop words should not be removed.</dd>

<dt>...</dt>
<dd>Other parameters.</dd>

</dl>
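To make the roles of mincount, ngram and stopwords concrete, here is a minimal base-R sketch of the general idea: count the n-grams in a corpus and keep the frequent ones. It is an illustration only and is not how getvocab is implemented. The helper name sketch_vocab, the "_" separator between the words of an n-gram, and the letters-only tokenizer are all assumptions made for this example, and no stemming is performed (so there is no counterpart to lang).

```r
# Hypothetical helper, NOT part of fdm2id: a base-R illustration of
# frequency-based vocabulary extraction.
sketch_vocab <- function(corpus, mincount = 1, ngram = 1, stopwords = NULL) {
  # Lower-case each document and split it on runs of non-letters.
  tokens <- strsplit(tolower(corpus), "[^a-z]+")
  terms <- unlist(lapply(tokens, function(words) {
    # Drop empty strings and, if requested, stop words.
    words <- words[nzchar(words) & !(words %in% stopwords)]
    # Build every n-gram of size 1 to ngram within this document.
    unlist(lapply(seq_len(ngram), function(n) {
      if (length(words) < n) return(character(0))
      starts <- seq_len(length(words) - n + 1)
      vapply(starts,
             function(i) paste(words[i:(i + n - 1)], collapse = "_"),
             character(1))
    }))
  }))
  # Count each term across the whole corpus.
  counts <- table(terms)
  # Keep only the terms that reach the minimum count.
  counts <- counts[counts >= mincount]
  sort(counts, decreasing = TRUE)
}

docs <- c("the cat sat on the mat", "the cat ate the fish")
vocab <- sketch_vocab(docs, mincount = 2, ngram = 2, stopwords = c("on"))
print(vocab)
```

With these two documents, three terms reach the threshold of 2: "the" (4 occurrences), "cat" (2) and the bigram "the_cat" (2). Raising mincount shrinks the vocabulary, while raising ngram adds longer phrases as candidates.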
