Calculate termwise Dice coefficients for pairs of lines and determine which lines are in both documents
dice_coefficient_line_matching(document_1, document_2, threshold = 0.8,
return_dice_matrix = TRUE, compare_consecutive_line_pairs = TRUE,
whole_document = FALSE)
A vector of strings (one per line or one per sentence), or a list of vectors of tokens (one per line or one per sentence).
Same format as document_1; this is the document that document_1 is compared against.
A value between 0 and 1 that denotes the Dice coefficient threshold for considering a line as being in both documents.
Logical indicating whether the Dice coefficient matrix should be returned. Defaults to TRUE.
Logical indicating whether consecutive pairs of lines should be compared. Defaults to TRUE.
Logical, defaults to FALSE. If TRUE, then all lines are combined and full documents are compared.
A list with two vectors, each giving the indices of the lines in document 1/2 that are in the other document based on the Dice coefficient threshold.
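
To make the thresholding concrete, the following is a minimal base R sketch of the idea, not the function's actual implementation. It assumes the Dice coefficient is computed on the sets of unique terms in two lines, 2 * |A and B| / (|A| + |B|), that lines are tokenized by lowercasing and splitting on whitespace, and that a line counts as matched when its best score is at or above the threshold. The helper names dice_sketch and match_lines_sketch, and the names of the returned list elements, are hypothetical. The compare_consecutive_line_pairs and whole_document options are not reproduced here.

```r
# Hypothetical helper: Dice coefficient between two token vectors,
# computed on unique terms (an assumption about "termwise").
dice_sketch <- function(tokens_a, tokens_b) {
  a <- unique(tokens_a)
  b <- unique(tokens_b)
  if (length(a) + length(b) == 0) {
    return(0)  # assumption: two empty lines score 0
  }
  2 * length(intersect(a, b)) / (length(a) + length(b))
}

# Hypothetical helper: score every pair of lines, then keep the lines
# whose score against at least one line of the other document meets
# the threshold.
match_lines_sketch <- function(doc_1, doc_2, threshold = 0.8) {
  # Assumption: lowercase and split on whitespace to obtain terms.
  tokenize <- function(doc) strsplit(tolower(doc), "\\s+")
  toks_1 <- tokenize(doc_1)
  toks_2 <- tokenize(doc_2)

  # Rows index lines of doc_1, columns index lines of doc_2.
  dice <- matrix(0, nrow = length(toks_1), ncol = length(toks_2))
  for (i in seq_along(toks_1)) {
    for (j in seq_along(toks_2)) {
      dice[i, j] <- dice_sketch(toks_1[[i]], toks_2[[j]])
    }
  }

  list(
    lines_in_doc_1 = which(rowSums(dice >= threshold) > 0),
    lines_in_doc_2 = which(colSums(dice >= threshold) > 0),
    dice_matrix = dice
  )
}

doc_1 <- c("the quick brown fox",
           "jumps right over the lazy sleeping dog",
           "an unrelated line")
doc_2 <- c("a different opening line",
           "the quick brown fox",
           "jumps right over a lazy sleeping dog")

res <- match_lines_sketch(doc_1, doc_2, threshold = 0.8)
res$lines_in_doc_1  # lines 1 and 2 of doc_1 have a match in doc_2
res$lines_in_doc_2  # lines 2 and 3 of doc_2 have a match in doc_1
```

In this example the second line of doc_1 and the third line of doc_2 share 6 of their 7 unique terms each, so their score is 12 / 14, roughly 0.86, which clears the default threshold of 0.8 even though the lines are not identical.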