text_intersect

token_list1: a list in which each sublist is a tokenized text sequence (token_list1 must have the same length as token_list2)

token_list2: a list in which each sublist is a tokenized text sequence (token_list2 must have the same length as token_list1)

distinct: either TRUE or FALSE. If TRUE, the intersection is computed on distinct words (or letters)

letters: either TRUE or FALSE. If TRUE, the intersection of letters in the text sequences is computed

intersection of words or letters in tokenized text

datasets

It offers functions for splitting, parsing, tokenizing and creating a vocabulary for big text data files. Moreover, it includes functions for building a document-term matrix and extracting information from it (term-associations, most frequent terms). It also includes functions for calculating token statistics (collocations, look-up tables, string dissimilarities) and functions to work with sparse matrices. Lastly, it includes functions for Word Vector Representations (i.e. 'GloVe', 'fasttext') and for the calculation of (pairwise) text document dissimilarities. The source code is written in 'C++11' and exported to R through the 'Rcpp', 'RcppArmadillo' and 'BH' packages.

Lampros Mouselimis

textTinyR

Text Processing for Small or Big Data Files

text_intersect function

An object of class <code>R6ClassGenerator</code> of length 24.

Format

<dl class="dl-horizontal">
 <dt><code>text_intersect$new(file_data = NULL)</code></dt><dd></dd>
 <dt><code>--------------</code></dt><dd></dd>
 <dt><code>count_intersect(distinct = FALSE, letters = FALSE)</code></dt><dd></dd>
 <dt><code>--------------</code></dt><dd></dd>
 <dt><code>ratio_intersect(distinct = FALSE, letters = FALSE)</code></dt><dd></dd>
</dl>
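The methods listed above can be exercised with a short R sketch. It assumes that the textTinyR package is installed and that the constructor accepts the two token lists described under the arguments (token_list1, token_list2) positionally. The Format entry shows a different parameter name, so the exact argument names are an assumption here. The data in `tok1` and `tok2` is made up for illustration, and no output is shown because the returned values are not documented in this section.

```r
library(textTinyR)

# Two token lists of equal length: sublist i of tok1 is compared
# with sublist i of tok2. (Illustrative data, not from the package.)
tok1 <- list(c("compare", "this", "text"),
             c("and", "this", "text"))

tok2 <- list(c("with", "another", "set"),
             c("of", "text", "documents"))

# Assumption: the two lists are passed positionally to the constructor.
init <- text_intersect$new(tok1, tok2)

# Intersection count based on distinct words.
counts <- init$count_intersect(distinct = TRUE, letters = FALSE)

# Intersection ratio based on letters instead of words.
ratios <- init$ratio_intersect(distinct = FALSE, letters = TRUE)

print(counts)
print(ratios)
```

Passing the lists positionally avoids relying on the constructor's argument names, which this page states inconsistently.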

text_intersect: intersection of words or letters in tokenized text
