txt_context

If you have annotated your text using <code>udpipe_annotate</code>,
your text is tokenised into a sequence of words. Given this vector of words in sequence,
getting n-grams comes down to looking at the previous/next word, then the word before/after that, and so forth.
These words can be <code>pasted</code> together to form an n-gram.
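
The shift-and-paste idea described above can be sketched in base R. This is only an illustration of the principle, not the package's implementation; the example sentence and the names <code>prev_word</code>, <code>next_word</code> and <code>trigram</code> are made up for the sketch.

```r
# Illustrative base-R sketch of building trigrams by shifting and pasting.
# The object names below are hypothetical and not part of udpipe.
x <- c("We", "walked", "anxiously", "to", "the", "doctor", "!")

# Shift the sequence by one position; NA marks positions without a neighbour.
prev_word <- c(NA, head(x, -1))
next_word <- c(tail(x, -1), NA)

# Paste the previous word, the word itself and the next word together.
trigram <- paste(prev_word, x, next_word, sep = " ")

# paste() turns NA into the string "NA", so blank out incomplete windows.
trigram[is.na(prev_word) | is.na(next_word)] <- NA

data.frame(x, trigram)
```

The first and last positions have no complete window, so the sketch leaves them as <code>NA</code> instead of emitting a shorter n-gram.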

This natural language processing toolkit provides language-agnostic
'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency
parsing' of raw text. Next to text parsing, the package also allows you to train
annotation models based on data of 'treebanks' in 'CoNLL-U' format as provided
at <https://universaldependencies.org/format.html>. The techniques are explained
in detail in the paper: 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0
with UDPipe', available at <doi:10.18653/v1/K17-3009>.
The toolkit also contains functionality for commonly used data manipulations on texts
which are enriched with the output of the parser, namely functions and algorithms
for collocations, token co-occurrence, document term matrix handling,
term frequency inverse document frequency calculations,
information retrieval metrics (Okapi BM25), handling of multi-word expressions,
keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns),
sentiment scoring and semantic similarity analysis.

Jan Wijffels

udpipe

Tokenization, Parts of Speech Tagging, Lemmatization and
Dependency Parsing with the 'UDPipe' 'NLP' Toolkit

BNOSAC 

Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague, Czech Republic 

Milan Straka 

Jana Straková 

txt_context function

Based on a vector with a word sequence, get n-grams (looking forward + backward)

Arguments

<dl><dt>x</dt>
<dd>a character vector where each element is just 1 term or word</dd>
<dt>n</dt>
<dd>an integer vector indicating how many terms to look back and ahead</dd>
<dt>sep</dt>
<dd>a character element indicating how to <code><a href="/link/paste?package=udpipe&version=0.8.11" data-mini-rdoc="udpipe::paste">paste</a></code> the subsequent words together</dd>
<dt>na.rm</dt>
<dd>logical. If set to <code>TRUE</code>, all available text is kept even where it is not possible to look back/ahead by the amount specified in <code>n</code>.
If set to <code>FALSE</code>, the resulting value is <code>NA</code>
whenever at least one element is <code>NA</code> or it is not possible to look back/ahead by the amount specified in <code>n</code>.</dd></dl>
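
As a hedged usage sketch of the arguments documented above: the snippet assumes the udpipe package is installed and that <code>txt_context</code> accepts <code>x</code>, <code>n</code>, <code>sep</code> and <code>na.rm</code> as listed. The example sentence and the variable names are made up, and the reading of <code>n</code> (negative values look back, 0 is the word itself, positive values look ahead) is an assumption stated in the comments.

```r
# Usage sketch; runs only when the udpipe package is available.
if (requireNamespace("udpipe", quietly = TRUE)) {
  x <- c("We", "walked", "anxiously", "to", "the", "doctor", "!")

  # Assumption: negative n looks back, 0 is the word itself, positive n looks ahead.
  # na.rm = FALSE: positions where the full window is unavailable become NA.
  strict <- udpipe::txt_context(x, n = c(-1, 0), na.rm = FALSE)

  # na.rm = TRUE: keep whatever text is available at the edges of the sequence.
  lenient <- udpipe::txt_context(x, n = c(-1, 0), na.rm = TRUE)

  # sep controls how the words are pasted together.
  joined <- udpipe::txt_context(x, n = c(-1, 0, 1), sep = "_", na.rm = TRUE)

  print(data.frame(x, strict, lenient, joined))
}
```

Comparing the <code>strict</code> and <code>lenient</code> columns side by side is a quick way to see what <code>na.rm</code> changes at the start and end of the word sequence.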

