docTermMatrix

docTermMatrix,data.frame-method

docTermMatrix,-methods

docTermMatrix,kRp.text-method


Returns a sparse document-term matrix calculated from a given TIF[1] compliant token data frame
or an object of class <code>kRp.text</code>. Optionally, the matrix can hold the
term frequency-inverse document frequency (tf-idf) value of each term instead of its absolute frequency.
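
To make the idea concrete, the following is a minimal base R sketch of what a document-term matrix and its tf-idf weighting look like. It does not call koRpus and builds a dense matrix rather than a sparse one. The toy data and the particular tf-idf variant (relative term frequency multiplied by the natural log of the inverse document frequency) are assumptions for illustration; the formula used by <code>docTermMatrix</code> may differ.

```r
# A tiny TIF-style token data frame: one row per token, with the document ID
# in `doc_id` and the token text in `token` (toy data, invented for illustration).
tokens <- data.frame(
  doc_id = c("d1", "d1", "d1", "d2", "d2", "d3"),
  token  = c("cat", "sat", "cat", "cat", "dog", "dog"),
  stringsAsFactors = FALSE
)

# Absolute frequencies: rows are documents, columns are terms.
dtm <- unclass(table(tokens$doc_id, tokens$token))

# One common tf-idf variant (an assumption, not necessarily the koRpus formula):
#   tf  = term count / number of tokens in the document
#   idf = log(number of documents / number of documents containing the term)
tf    <- dtm / rowSums(dtm)
idf   <- log(nrow(dtm) / colSums(dtm > 0))
tfidf <- sweep(tf, 2, idf, `*`)

print(dtm)
print(round(tfidf, 3))
```

In this sketch a term such as "sat", which occurs in only one document, receives a higher weight than "cat", which occurs in two of the three documents.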

A set of tools to analyze texts. Includes, amongst others, functions for
automatic language detection, hyphenation, several indices of lexical diversity
(e.g., type token ratio, HD-D/vocd-D, MTLD) and readability (e.g., Flesch,
SMOG, LIX, Dale-Chall). Basic import functions for language corpora are also
provided, to enable frequency analyses (supports Celex and Leipzig Corpora
Collection file formats) and measures like tf-idf. Note: For full functionality
a local installation of TreeTagger is recommended. It is also recommended to
not load this package directly, but by loading one of the available language
support packages from the 'l10n' repository
<https://undocumeantit.github.io/repos/l10n/>. 'koRpus' also includes a plugin
for the R GUI and IDE RKWard, providing graphical dialogs for its basic
features. The respective R package 'rkward' cannot be installed directly from a
repository, as it is a part of RKWard. To make full use of this feature, please
install RKWard from <https://rkward.kde.org> (plugins are detected
automatically). Due to some restrictions on CRAN, the full package sources are
only available from the project homepage. To ask for help, report bugs, request
features, or discuss the development of the package, please subscribe to the
koRpus-dev mailing list (<https://korpusml.reaktanz.de>).

Meik Michalke

koRpus

Text Analysis with Emphasis on POS Tagging, Readability, and
Lexical Diversity

Earl Brown

Alberto Mirisola

Alexandre Brulet

Laura Hauser

docTermMatrix function

<dl><dt>obj</dt>
<dd>Either an object of class <code>kRp.text</code>,
 or a TIF[1] compliant token data frame.</dd>
<dt>terms</dt>
<dd>A character string naming the column of the <code>tokens</code> data to be used for calculating the matrix.</dd>
<dt>case.sens</dt>
<dd>Logical, whether terms should be counted case-sensitively.</dd>
<dt>tfidf</dt>
<dd>Logical,
 if <code>TRUE</code>, calculates term frequency-inverse document frequency (tf-idf)
values instead of absolute frequencies.</dd>
<dt>...</dt>
<dd>Additional arguments depending on the particular method.</dd></dl>
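
The effect of the <code>terms</code> and <code>case.sens</code> arguments can be illustrated with a small base R sketch. The helper <code>count_terms</code> below is hypothetical and not part of koRpus; it only mimics the argument names, and both its default values and the use of <code>tolower()</code> for case folding are assumptions.

```r
# Toy tokens that differ only in case (invented for illustration).
tokens <- data.frame(
  doc_id = c("d1", "d1", "d2"),
  token  = c("The", "the", "THE"),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of koRpus: count the values of the column
# named by `terms` per document, optionally folding case first.
count_terms <- function(df, terms = "token", case.sens = FALSE) {
  values <- df[[terms]]
  if (!case.sens) {
    values <- tolower(values)  # assumption: case folding via tolower()
  }
  unclass(table(df$doc_id, values))
}

sensitive   <- count_terms(tokens, case.sens = TRUE)
insensitive <- count_terms(tokens, case.sens = FALSE)

print(sensitive)    # one column per distinct spelling
print(insensitive)  # a single column once case is ignored
```

Counting case-sensitively keeps every spelling as its own term, whereas ignoring case merges them into one column, which is usually what is wanted for frequency analyses.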

Arguments


docTermMatrix: Generate a document-term matrix
