
SpeedReader (version 0.9.1)

dice_coefficient_line_matching: Lines In Both Documents via Dice Coefficients

Description

Calculates term-wise Dice coefficients for pairs of lines and determines which lines appear in both documents.
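The Dice coefficient between two lines can be illustrated with a small base R sketch. It is not SpeedReader's implementation. It assumes lines are split into lowercase, whitespace-delimited terms and that duplicate terms are ignored (a set-based Dice of 2|A ∩ B| / (|A| + |B|)). The helper names tokenize_line and dice_coefficient are hypothetical.

```r
# Hypothetical helper, not part of SpeedReader: split a line into lowercase,
# whitespace-delimited terms. SpeedReader's own tokenization may differ.
tokenize_line <- function(line) {
  terms <- strsplit(tolower(line), "\\s+")[[1]]
  terms[nzchar(terms)]  # drop empty strings produced by leading whitespace
}

# Hypothetical set-based Dice coefficient between two term vectors:
# 2 * |A intersect B| / (|A| + |B|), with duplicate terms ignored.
dice_coefficient <- function(terms_a, terms_b) {
  a <- unique(terms_a)
  b <- unique(terms_b)
  if (length(a) + length(b) == 0) return(0)
  2 * length(intersect(a, b)) / (length(a) + length(b))
}

dice_coefficient(tokenize_line("the cat sat on the mat"),
                 tokenize_line("the cat sat on a mat"))
```

Here the two lines share 5 of their 5 and 6 distinct terms, giving 2 * 5 / 11, or about 0.909. With the default threshold of 0.8, this pair would count as a match under these assumptions.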

Usage

dice_coefficient_line_matching(document_1, document_2, threshold = 0.8,
  return_dice_matrix = TRUE, compare_consecutive_line_pairs = TRUE,
  whole_document = FALSE)

Arguments

document_1

A vector of strings (one per line or one per sentence), or a list of vectors of tokens (one per line or one per sentence).

document_2

Same format as document_1; its lines are compared against the lines of document_1.

threshold

A value between 0 and 1 that denotes the Dice coefficient threshold for considering a line as being in both documents.

return_dice_matrix

Logical indicating whether the Dice coefficient matrix should be returned. Defaults to TRUE.
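A Dice coefficient matrix of the kind described here can be pictured with the sketch below. It is a hypothetical illustration, not SpeedReader's code. It assumes each document is a list of term vectors (one per line), that entry [i, j] holds the set-based Dice coefficient between line i of the first document and line j of the second, and that the helper names are invented for this example.

```r
# Hypothetical set-based Dice coefficient (duplicate terms ignored).
dice_coefficient <- function(terms_a, terms_b) {
  a <- unique(terms_a)
  b <- unique(terms_b)
  if (length(a) + length(b) == 0) return(0)
  2 * length(intersect(a, b)) / (length(a) + length(b))
}

# Hypothetical sketch: rows index lines of doc_1, columns index lines of doc_2.
dice_matrix <- function(doc_1, doc_2) {
  m <- matrix(0, nrow = length(doc_1), ncol = length(doc_2))
  for (i in seq_along(doc_1)) {
    for (j in seq_along(doc_2)) {
      m[i, j] <- dice_coefficient(doc_1[[i]], doc_2[[j]])
    }
  }
  m
}

doc_1 <- list(c("we", "the", "people"), c("in", "order", "to", "form"))
doc_2 <- list(c("in", "order", "to", "form"), c("we", "the", "citizens"))
m <- dice_matrix(doc_1, doc_2)
round(m, 3)
```

Line 2 of doc_1 is identical to line 1 of doc_2 (Dice of 1), while line 1 of doc_1 shares two of three terms with line 2 of doc_2 (Dice of 2/3). Whatever the real function's internal layout, inspecting such a matrix shows how close near-misses came to the threshold.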

compare_consecutive_line_pairs

Logical indicating whether consecutive pairs of lines should be compared. Defaults to TRUE.

whole_document

Logical, defaults to FALSE. If TRUE, all lines are combined and the full documents are compared with each other.
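A rough picture of the whole-document comparison appears in the sketch below. It is an assumption about the idea, not a description of SpeedReader's behavior: every line's terms are pooled into one vector per document, and a single set-based Dice coefficient is computed. The helper whole_document_dice is hypothetical.

```r
# Hypothetical set-based Dice coefficient (duplicate terms ignored).
dice_coefficient <- function(terms_a, terms_b) {
  a <- unique(terms_a)
  b <- unique(terms_b)
  if (length(a) + length(b) == 0) return(0)
  2 * length(intersect(a, b)) / (length(a) + length(b))
}

# Hypothetical sketch of whole_document = TRUE: pool all lines of each
# document into one term vector, then compare the two pooled vectors.
whole_document_dice <- function(doc_1, doc_2) {
  dice_coefficient(unlist(doc_1), unlist(doc_2))
}

doc_1 <- list(c("we", "the", "people"), c("in", "order", "to", "form"))
doc_2 <- list(c("in", "order", "to", "form"), c("we", "the", "citizens"))
whole_document_dice(doc_1, doc_2)
```

Each pooled document has 7 distinct terms, and 6 are shared, so the score is 12 / 14, or about 0.857. This single score replaces the line-by-line matrix under this sketch's assumptions.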

Value

A list with two vectors, giving the indices of the lines in document 1 and document 2, respectively, that are found in the other document based on the Dice coefficient threshold.
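The sketch below shows how a threshold can turn a Dice matrix into two index vectors. It is hypothetical and not SpeedReader's code. It assumes rows correspond to lines of document 1 and columns to lines of document 2, and that a line counts as "in both documents" when its Dice coefficient with at least one line of the other document is at or above threshold. Whether the real function uses >= or > is an assumption here, as are the helper name and the list element names.

```r
# Hypothetical sketch: a line matches if any entry in its row (document 1)
# or column (document 2) reaches the threshold (inclusive comparison assumed).
matched_line_indices <- function(dice_mat, threshold = 0.8) {
  hits <- dice_mat >= threshold
  list(
    document_1 = which(rowSums(hits) > 0),
    document_2 = which(colSums(hits) > 0)
  )
}

# Toy Dice matrix: 3 lines in document 1 (rows), 2 lines in document 2 (columns).
dice_mat <- matrix(c(0.00, 1.00,
                     0.67, 0.00,
                     0.85, 0.10),
                   nrow = 3, byrow = TRUE)
res <- matched_line_indices(dice_mat, threshold = 0.8)
res
```

Lines 1 and 3 of document 1 reach the threshold against some line of document 2, while line 2 (best score 0.67) does not. Both lines of document 2 are matched.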