
textTinyR (version 1.1.8)

text_intersect: intersection of words or letters in tokenized text

Description

Computes the intersection of words or letters between two lists of tokenized text.

Usage

# utl <- text_intersect$new(token_list1 = NULL, token_list2 = NULL)

Value

a numeric vector

Methods

text_intersect$new(token_list1 = NULL, token_list2 = NULL)

--------------

count_intersect(distinct = FALSE, letters = FALSE)

--------------

ratio_intersect(distinct = FALSE, letters = FALSE)

Methods


Method new()

Usage

text_intersect$new(token_list1 = NULL, token_list2 = NULL)

Arguments

token_list1

a list, where each sublist is a tokenized text sequence (token_list1 must have the same length as token_list2)

token_list2

a list, where each sublist is a tokenized text sequence (token_list2 must have the same length as token_list1)
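
As a minimal sketch of how the two inputs might be prepared, the base R snippet below splits raw strings on whitespace and checks that both lists have the same length before they are handed to the constructor. The helper tokenize_simple and the variables docs1 and docs2 are illustrative names and are not part of textTinyR; a proper tokenizer is likely preferable for real data.

```r
# Hypothetical helper (not part of textTinyR): lowercase, then split on whitespace.
tokenize_simple <- function(docs) strsplit(tolower(docs), "[[:space:]]+")

docs1 <- c("Compare this text", "and this text")
docs2 <- c("with another set", "of text documents")

token_list1 <- tokenize_simple(docs1)
token_list2 <- tokenize_simple(docs2)

# Both arguments are lists with one tokenized sequence per document,
# and the two lists need the same number of entries.
stopifnot(is.list(token_list1), is.list(token_list2),
          length(token_list1) == length(token_list2))
```

The resulting token_list1 and token_list2 have the shape that text_intersect$new() expects.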


Method count_intersect()

Usage

text_intersect$count_intersect(distinct = FALSE, letters = FALSE)

Arguments

distinct

either TRUE or FALSE. If TRUE, only distinct words (or letters) are taken into account when computing the intersection

letters

either TRUE or FALSE. If TRUE, the intersection is computed on the letters of the text sequences instead of on words
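
To make the two flags more concrete, here is a base R sketch of one plausible reading of them. It is not the package's implementation: sketch_count_intersect and to_letters are hypothetical helpers, and the exact counting rules used by textTinyR may differ (in particular for repeated tokens when distinct = FALSE).

```r
# Hypothetical helper, not the package's code: split tokens into single characters.
to_letters <- function(tokens) unlist(strsplit(tokens, ""), use.names = FALSE)

# Hypothetical sketch of a pairwise intersection count.
sketch_count_intersect <- function(list1, list2, distinct = FALSE, letters = FALSE) {
  stopifnot(length(list1) == length(list2))
  vapply(seq_along(list1), function(i) {
    a <- list1[[i]]
    b <- list2[[i]]
    if (letters) {
      a <- to_letters(a)
      b <- to_letters(b)
    }
    if (distinct) {
      length(intersect(a, b))   # each shared item is counted once
    } else {
      sum(a %in% b)             # ASSUMPTION: count every occurrence in `a` found in `b`
    }
  }, numeric(1))
}

tok1 <- list(c("compare", "this", "text"), c("and", "this", "text"))
tok2 <- list(c("with", "another", "set"), c("of", "text", "documents"))

word_counts <- sketch_count_intersect(tok1, tok2, distinct = TRUE)
letter_counts <- sketch_count_intersect(tok1, tok2, distinct = TRUE, letters = TRUE)
```

Under these assumptions the result has one element per pair of sequences, which matches the "numeric vector" return value described above.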


Method ratio_intersect()

Usage

text_intersect$ratio_intersect(distinct = FALSE, letters = FALSE)

Arguments

distinct

either TRUE or FALSE. If TRUE, only distinct words (or letters) are taken into account when computing the intersection

letters

either TRUE or FALSE. If TRUE, the intersection is computed on the letters of the text sequences instead of on words
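
This page does not state which denominator the ratio uses, so the following base R sketch is only an illustration under a stated assumption: the number of shared words is divided by the number of words in the first sequence. The helper sketch_ratio_words is hypothetical and is not the package's code; the values returned by ratio_intersect() may be defined differently.

```r
# Hypothetical helper, not the package's code.
# ASSUMPTION: ratio = shared words / number of words in the first sequence.
sketch_ratio_words <- function(list1, list2, distinct = FALSE) {
  stopifnot(length(list1) == length(list2))
  mapply(function(a, b) {
    if (distinct) a <- unique(a)            # count each word of `a` only once
    if (length(a) == 0) return(NA_real_)    # avoid dividing by zero for empty input
    sum(a %in% b) / length(a)
  }, list1, list2)
}

tok1 <- list(c("compare", "this", "text"), c("and", "this", "text"))
tok2 <- list(c("with", "another", "set"), c("of", "text", "documents"))

ratios <- sketch_ratio_words(tok1, tok2)
```

With this definition a ratio of 0 means the pair shares no words and a ratio of 1 means every word of the first sequence also occurs in the second.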


Method clone()

The objects of this class are cloneable with this method.

Usage

text_intersect$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Details

This class provides methods for computing the intersection of words or letters between paired text sequences. If both distinct and letters are FALSE, the plain word intersection is computed, returned as a count by count_intersect() or as a ratio by ratio_intersect().

References

https://www.kaggle.com/c/home-depot-product-search-relevance/discussion/20427 by Igor Buinyi

Examples

library(textTinyR)

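# Two lists of tokenized documents; both lists must have the same length
tok1 = list(c('compare', 'this', 'text'),
            c('and', 'this', 'text'))

tok2 = list(c('with', 'another', 'set'),
            c('of', 'text', 'documents'))

init = text_intersect$new(tok1, tok2)

# Intersection count based on distinct words
init$count_intersect(distinct = TRUE, letters = FALSE)

# Intersection ratio based on letters, not restricted to distinct letters
init$ratio_intersect(distinct = FALSE, letters = TRUE)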
