dtm_align

This utility function aligns a Document-Term-Matrix with 
information in a data.frame or a vector to predict, such that both the predictive information and the target 
are available in the same order. 
Matching is based on the identifiers in the rownames of <code>x</code> and either the names of the <code>y</code> vector 
or, if <code>y</code> is a data.frame, the first column of <code>y</code>.
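The matching rule described above can be illustrated with a minimal base-R sketch. It is not the package's implementation: a plain matrix stands in for the sparse dgCMatrix, and keeping only identifiers found on both sides, in the order of <code>y</code>, is an assumption made for illustration.

```r
# Toy document-term matrix: rows are documents, columns are terms.
# A plain base-R matrix stands in for the sparse dgCMatrix here.
x <- matrix(c(1, 0, 2,
              0, 3, 0,
              4, 0, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("doc1", "doc2", "doc3"),
                            c("apple", "pear", "plum")))

# Target vector, named by document identifier, in a different order than x.
y <- c(doc3 = "positive", doc1 = "negative", doc2 = "positive")

# Assumption: keep identifiers present on both sides, in the order of y.
ids <- intersect(names(y), rownames(x))

# Reorder the rows of x to follow y, so row i of x belongs to element i of y.
aligned <- list(x = x[ids, , drop = FALSE], y = y[ids])

# Both pieces now share the same identifier order.
stopifnot(identical(rownames(aligned$x), names(aligned$y)))
```

After this step, row i of <code>aligned$x</code> and element i of <code>aligned$y</code> refer to the same document, which is the property a downstream model fit relies on.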

This natural language processing toolkit provides language-agnostic
'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency
parsing' of raw text. Next to text parsing, the package also allows you to train
annotation models based on data of 'treebanks' in 'CoNLL-U' format as provided
at <https://universaldependencies.org/format.html>. The techniques are explained
in detail in the paper: 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0
with UDPipe', available at <doi:10.18653/v1/K17-3009>.
The toolkit also contains functionality for commonly used data manipulations on texts
which are enriched with the output of the parser, namely functions and algorithms
for collocations, token co-occurrence, document term matrix handling,
term frequency inverse document frequency calculations,
information retrieval metrics (Okapi BM25), handling of multi-word expressions,
keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns),
sentiment scoring and semantic similarity analysis.

Jan Wijffels

udpipe

Tokenization, Parts of Speech Tagging, Lemmatization and
Dependency Parsing with the 'UDPipe' 'NLP' Toolkit

BNOSAC 

Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague, Czech Republic 

Milan Straka 

Jana Straková 

dtm_align function

<dl><dt>x</dt>
<dd>a Document-Term-Matrix of class dgCMatrix (which can be an object returned by <code>document_term_matrix</code>)</dd>
<dt>y</dt>
<dd>either a vector or data.frame containing something to align with <code>x</code> (e.g. for predictive purposes).<ul>
<li>In case <code>y</code> is a vector, it should have names which are available in the rownames of <code>x</code>.</li>
<li>In case <code>y</code> is a data.frame, its first column should contain identifiers which are available in the rownames of <code>x</code>.</li>
</ul></dd>
<dt>FUN</dt>
<dd>a function to be applied on <code>x</code> before aligning it to <code>y</code>. See the examples.</dd>
<dt>...</dt>
<dd>further arguments passed on to FUN</dd></dl>
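To make the roles of the arguments concrete, the following base-R sketch mimics how they could fit together when <code>y</code> is a data.frame. The helpers <code>align_sketch</code> and <code>drop_rare_terms</code> are hypothetical names introduced here, not udpipe functions, and silently dropping identifiers that are missing from <code>x</code> is an assumption rather than documented behaviour.

```r
# Hypothetical helper, NOT the udpipe implementation of dtm_align.
align_sketch <- function(x, y, FUN = NULL, ...) {
  # Optionally transform the matrix first; extra arguments are passed to FUN.
  if (!is.null(FUN)) {
    x <- FUN(x, ...)
  }
  # Identifiers: names of a vector, or the first column of a data.frame.
  ids <- if (is.data.frame(y)) as.character(y[[1]]) else names(y)
  # Assumption: identifiers that do not occur in rownames(x) are dropped.
  keep <- ids %in% rownames(x)
  ids <- ids[keep]
  y <- if (is.data.frame(y)) y[keep, , drop = FALSE] else y[keep]
  list(x = x[ids, , drop = FALSE], y = y)
}

# Hypothetical filter: drop terms whose total count is below `minfreq`.
drop_rare_terms <- function(m, minfreq = 1) {
  m[, colSums(m) >= minfreq, drop = FALSE]
}

# A plain matrix stands in for the sparse dgCMatrix.
x <- matrix(c(1, 0, 2,
              0, 3, 0,
              4, 0, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("doc1", "doc2", "doc3"),
                            c("apple", "pear", "plum")))

# Identifiers sit in the first column; "doc9" has no row in x.
y <- data.frame(doc_id = c("doc2", "doc3", "doc9"),
                target = c(1, 0, 1),
                stringsAsFactors = FALSE)

out <- align_sketch(x, y, FUN = drop_rare_terms, minfreq = 4)
```

Here <code>minfreq = 4</code> travels through <code>...</code> to the filter, which runs on <code>x</code> before the rows are matched against the identifiers in the first column of <code>y</code>.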

Arguments


Reorder a Document-Term-Matrix alongside a vector or data.frame — dtm_align

