textTinyR (version 1.1.8)

JACCARD_DICE: Jaccard or Dice similarity for text documents

Description

Jaccard or Dice similarity for text documents

Usage

JACCARD_DICE(
  token_list1 = NULL,
  token_list2 = NULL,
  method = "jaccard",
  threads = 1
)

Value

a numeric vector with one value for each pair of documents

Arguments

token_list1

a list of tokenized text documents, each a character vector of tokens (it should have the same length as token_list2)

token_list2

a list of tokenized text documents, each a character vector of tokens (it should have the same length as token_list1)

method

a character string specifying the similarity metric, either 'jaccard' or 'dice'

threads

a numeric value specifying the number of cores (threads) to use for the parallel computation
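The token lists are ordinary R lists of character vectors. A minimal sketch of building them from raw strings in base R follows. The helper tokenize_docs is hypothetical (it is not part of textTinyR); lower-casing the text and splitting on any run of characters other than ASCII letters and digits is an assumption that suits simple English examples only.

```r
# Hypothetical helper (not part of textTinyR): turn a character vector of
# documents into a list of token vectors, one element per document.
# Assumption: tokens are runs of ASCII letters/digits after lower-casing.
tokenize_docs <- function(docs) {
  lapply(tolower(docs), function(d) {
    tokens <- strsplit(d, "[^a-z0-9]+")[[1]]
    tokens[nzchar(tokens)]  # drop empty strings left by leading separators
  })
}

docs1 <- c("Use this function to", "either compute the Jaccard")
docs2 <- c("or the Dice distance", "for two same-sized lists")

token_list1 <- tokenize_docs(docs1)
token_list2 <- tokenize_docs(docs2)

token_list2[[2]]  # "for" "two" "same" "sized" "lists"
```

With these inputs the resulting lists equal lst1 and lst2 from the Examples section, so they could be passed directly as token_list1 and token_list2.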

Details

The function calculates either the Jaccard or the Dice distance between pairs of tokenized documents, comparing the i-th document of token_list1 with the i-th document of token_list2.
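To make the two measures concrete, here is a base R sketch of the standard set-based definitions, Jaccard = |A ∩ B| / |A ∪ B| and Dice = 2|A ∩ B| / (|A| + |B|), applied pair by pair. The helper jaccard_dice_sketch is hypothetical and is not the package's C++ implementation. Treating each document as a set of unique tokens and scoring two empty documents as 0 are assumptions. The sketch returns similarities; the corresponding distance is conventionally 1 minus the similarity.

```r
# Hypothetical helper (not part of textTinyR): set-based Jaccard or Dice
# similarity for each pair of documents at the same list position.
jaccard_dice_sketch <- function(token_list1, token_list2, method = "jaccard") {
  stopifnot(length(token_list1) == length(token_list2))
  method <- match.arg(method, c("jaccard", "dice"))
  vapply(seq_along(token_list1), function(i) {
    a <- unique(token_list1[[i]])  # assumption: documents treated as token sets
    b <- unique(token_list2[[i]])
    n_common <- length(intersect(a, b))
    if (method == "jaccard") {
      n_union <- length(union(a, b))
      if (n_union == 0) return(0)  # assumption: two empty documents score 0
      n_common / n_union
    } else {
      n_total <- length(a) + length(b)
      if (n_total == 0) return(0)
      2 * n_common / n_total
    }
  }, numeric(1))
}

docs_a <- list(c("the", "dice", "distance"), c("a", "b"))
docs_b <- list(c("the", "jaccard", "distance"), c("c", "d"))

jaccard_sim <- jaccard_dice_sketch(docs_a, docs_b, method = "jaccard")  # c(0.5, 0)
dice_sim <- jaccard_dice_sketch(docs_a, docs_b, method = "dice")        # c(2/3, 0)
jaccard_dist <- 1 - jaccard_sim                                         # c(0.5, 1)
```

In the first pair the documents share two of four distinct tokens, so Jaccard gives 2 / 4 = 0.5 while Dice gives 2 * 2 / (3 + 3) = 2/3; Dice weights the overlap more heavily, so it is never smaller than Jaccard for the same pair.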

Examples

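library(textTinyR)

# two lists of tokenized documents of equal length;
# the i-th document of lst1 is compared with the i-th document of lst2
lst1 = list(c('use', 'this', 'function', 'to'), c('either', 'compute', 'the', 'jaccard'))

lst2 = list(c('or', 'the', 'dice', 'distance'), c('for', 'two', 'same', 'sized', 'lists'))

# compute the Jaccard measure for each pair of documents using a single thread
out = JACCARD_DICE(token_list1 = lst1, token_list2 = lst2, method = 'jaccard', threads = 1)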