dtm_chisq

Perform a <code><a href='https://rdrr.io/r/stats/chisq.test.html'>chisq.test</a></code> to check whether specific terms are more prevalent in one group of documents than in the other. 
The function loops over each term in the document term matrix and applies a <code><a href='https://rdrr.io/r/stats/chisq.test.html'>chisq.test</a></code> comparing the frequency 
of occurrence of that term with the frequency of the other terms in each document group.
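
To make the per-term comparison concrete, here is a minimal base R sketch. It is an illustration under assumptions, not the package's implementation: it assumes each term is tested on a 2 x 2 table of (this term, all other terms) by (TRUE group, FALSE group), and the toy counts, term names and the `result` layout are made up for this example.

```r
# Toy document term matrix: 4 documents (rows) x 3 terms (columns).
# All counts are invented for illustration.
dtm <- matrix(c(5, 1, 2,
                4, 0, 3,
                0, 6, 2,
                1, 5, 3),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("doc", 1:4), c("hotel", "metro", "clean")))
groups <- c(TRUE, TRUE, FALSE, FALSE)  # element i belongs to row i of dtm

# Total frequency of every term within each group
freq_true  <- colSums(dtm[groups, , drop = FALSE])
freq_false <- colSums(dtm[!groups, , drop = FALSE])

# One chi-square test per term on an assumed 2 x 2 layout:
# rows = TRUE group / FALSE group, columns = this term / all other terms
result <- lapply(colnames(dtm), function(term) {
  tab <- matrix(c(freq_true[[term]],  sum(freq_true)  - freq_true[[term]],
                  freq_false[[term]], sum(freq_false) - freq_false[[term]]),
                nrow = 2, byrow = TRUE)
  # Small expected counts trigger an approximation warning; silenced here
  test <- suppressWarnings(chisq.test(tab, correct = TRUE))
  data.frame(term = term,
             chisq = unname(test$statistic),
             p.value = test$p.value,
             freq_true = freq_true[[term]],
             freq_false = freq_false[[term]])
})
result <- do.call(rbind, result)

# Terms with the largest statistic differ most between the two groups
print(result[order(result$chisq, decreasing = TRUE), ])
```

A large statistic only says that a term's share differs between the groups; comparing `freq_true` and `freq_false` shows in which direction.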

This natural language processing toolkit provides language-agnostic
'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency
parsing' of raw text. Next to text parsing, the package also allows you to train
annotation models based on data of 'treebanks' in 'CoNLL-U' format as provided
at <https://universaldependencies.org/format.html>. The techniques are explained
in detail in the paper: 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0
with UDPipe', available at <doi:10.18653/v1/K17-3009>.
The toolkit also contains functionality for commonly used data manipulations on texts
which are enriched with the output of the parser, namely functions and algorithms
for collocations, token co-occurrence, document term matrix handling,
term frequency inverse document frequency calculations,
information retrieval metrics (Okapi BM25), handling of multi-word expressions,
keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns),
sentiment scoring and semantic similarity analysis.

Jan Wijffels

udpipe

Tokenization, Parts of Speech Tagging, Lemmatization and
Dependency Parsing with the 'UDPipe' 'NLP' Toolkit

BNOSAC 

Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague, Czech Republic 

Milan Straka 

Jana Straková 

dtm_chisq function

Arguments

<dl><dt>dtm</dt>
<dd>a document term matrix: an object returned by <code>document_term_matrix</code></dd>
<dt>groups</dt>
<dd>a logical vector defining 2 groups (TRUE / FALSE), where the length of the <code>groups</code> vector 
equals the number of rows of <code>dtm</code> and element i corresponds to row i of <code>dtm</code></dd>
<dt>correct</dt>
<dd>passed on to <code><a href='https://rdrr.io/r/stats/chisq.test.html'>chisq.test</a></code></dd>
<dt>...</dt>
<dd>further arguments passed on to <code><a href='https://rdrr.io/r/stats/chisq.test.html'>chisq.test</a></code></dd></dl>
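
As a hedged usage sketch of the arguments `dtm`, `groups` and `correct`, the snippet below builds a tiny document term matrix and compares two groups of documents. The toy data frame `x`, its document ids and the choice of which documents form the TRUE group are hypothetical; it also assumes that `document_term_frequencies` reads the first two columns of `x` as document id and term. The code only runs when the udpipe package is installed.

```r
if (requireNamespace("udpipe", quietly = TRUE)) {
  library(udpipe)

  # Hypothetical toy data: one row per (document, term) occurrence
  x <- data.frame(
    doc_id = c("d1", "d1", "d1", "d2", "d2", "d3", "d3", "d3", "d4", "d4"),
    term   = c("hotel", "hotel", "clean", "hotel", "clean",
               "metro", "metro", "clean", "metro", "hotel"),
    stringsAsFactors = FALSE)

  # Count terms per document, then build the document term matrix
  dtf <- document_term_frequencies(x)
  dtm <- document_term_matrix(dtf)

  # Logical vector with one element per row of dtm:
  # TRUE for documents d1 and d2, FALSE for the rest
  groups <- rownames(dtm) %in% c("d1", "d2")

  # correct is handed on to chisq.test (continuity correction)
  out <- dtm_chisq(dtm, groups = groups, correct = TRUE)
  print(out)
}
```

Deriving `groups` from `rownames(dtm)` rather than from the order of `x` keeps element i aligned with row i of `dtm`, whatever order the matrix rows end up in.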


Compare term usage across 2 document groups using the Chi-square Test for Count Data — dtm_chisq

