fuzzywuzzyR (version 1.0.5)

SequenceMatcher: Character string sequence matching

Description

Character string sequence matching

Usage

# init <- SequenceMatcher$new(string1 = NULL, string2 = NULL)

Methods

SequenceMatcher$new(string1 = NULL, string2 = NULL)

--------------

ratio()

--------------

quick_ratio()

--------------

real_quick_ratio()

--------------

get_matching_blocks()

--------------

get_opcodes()

Methods

Public methods

Method new()

Usage

SequenceMatcher$new(string1 = NULL, string2 = NULL)

Arguments

string1

a character string.

string2

a character string.

Method ratio()

Usage

SequenceMatcher$ratio()

Method quick_ratio()

Usage

SequenceMatcher$quick_ratio()

Method real_quick_ratio()

Usage

SequenceMatcher$real_quick_ratio()

Method get_matching_blocks()

Usage

SequenceMatcher$get_matching_blocks()

Method get_opcodes()

Usage

SequenceMatcher$get_opcodes()

Method clone()

The objects of this class are cloneable with this method.

Usage

SequenceMatcher$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Details

The ratio method returns a measure of the sequences' similarity as a float in the range [0, 1]. Where T is the total number of elements in both sequences, and M is the number of matches, this is 2.0*M / T. Note that this is 1.0 if the sequences are identical, and 0.0 if they have nothing in common. This is expensive to compute if get_matching_blocks() or get_opcodes() hasn't already been called, in which case you may want to try quick_ratio() or real_quick_ratio() first to get an upper bound.
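The 2.0*M / T formula can be checked by hand in base R. The sketch below is illustrative only: similarity_ratio() is a hypothetical helper, not part of fuzzywuzzyR, and the match size is supplied manually instead of being computed by the matcher. Treating two empty strings as identical is also an assumption.

```r
# Hypothetical helper (not a fuzzywuzzyR function): evaluates 2 * M / T,
# where M is the number of matched characters and T the combined length.
similarity_ratio <- function(a, b, match_sizes) {
  total <- nchar(a) + nchar(b)   # T: elements in both sequences
  if (total == 0) return(1)      # assumption: two empty strings count as identical
  2 * sum(match_sizes) / total   # M is the sum of the matching block sizes
}

# "abcd" and "bcde" share the single block "bcd", so M = 3 and T = 8.
r <- similarity_ratio("abcd", "bcde", match_sizes = 3)
print(r)  # 0.75
```

Identical strings give M = T / 2 and therefore a ratio of 1, while strings with no common characters give M = 0 and a ratio of 0.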

The quick_ratio method returns an upper bound on ratio() relatively quickly.

The real_quick_ratio method returns an upper bound on ratio() very quickly.

The get_matching_blocks method returns a list of triples describing matching subsequences. Each triple is of the form [i, j, n], and means that a[i:i+n] == b[j:j+n], where a and b denote the two strings being compared. The triples are monotonically increasing in i and j. The last triple is a dummy, and has the value [la, lb, 0], where la and lb are the lengths of a and b. It is the only triple with n == 0. If [i, j, n] and [i', j', n'] are adjacent triples in the list, and the second is not the last triple in the list, then i+n != i' or j+n != j'; in other words, adjacent triples always describe non-adjacent equal blocks.
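The slice notation a[i:i+n] reads like Python's zero-based, end-exclusive slicing, whereas substr() in R is one-based and end-inclusive. The base R sketch below assumes that reading and shows how a triple would map onto substr(). block_text() is a hypothetical helper and the triples are written out by hand for the strings "abcd" and "bcde"; they are not output captured from the package.

```r
# Hypothetical helper: text of a block that starts at zero-based offset
# `start` and spans `n` characters (assumes Python-style indices).
block_text <- function(s, start, n) substr(s, start + 1, start + n)

a <- "abcd"
b <- "bcde"

# Hand-written triples [i, j, n]; the last one is the dummy with n == 0.
triples <- list(c(1, 0, 3), c(4, 4, 0))

# Each triple should select the same text from both strings.
blocks_match <- vapply(
  triples,
  function(tr) block_text(a, tr[1], tr[3]) == block_text(b, tr[2], tr[3]),
  logical(1)
)

first_block <- block_text(a, 1, 3)
print(first_block)   # "bcd"
print(blocks_match)  # TRUE TRUE
```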

The get_opcodes method returns a list of 5-tuples describing how to turn a into b. Each tuple is of the form [tag, i1, i2, j1, j2]. The first tuple has i1 == j1 == 0, and remaining tuples have i1 equal to the i2 from the preceding tuple, and, likewise, j1 equal to the previous j2. The tag values are strings, with these meanings:

'replace': a[i1:i2] should be replaced by b[j1:j2].

'delete': a[i1:i2] should be deleted. Note that j1 == j2 in this case.

'insert': b[j1:j2] should be inserted at a[i1:i1]. Note that i1 == i2 in this case.

'equal': a[i1:i2] == b[j1:j2] (the sub-sequences are equal).
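To make the tag meanings concrete, the following base R sketch replays a list of opcodes to rebuild b from a. It rests on several assumptions: apply_opcodes() is a hypothetical helper, the opcodes are hand-written for "abcd" and "bcde" as named lists, and the indices are taken to be zero-based and end-exclusive. The structure actually returned by get_opcodes() may differ and would need converting first.

```r
# Hypothetical helper: rebuilds b by walking the opcodes in order.
apply_opcodes <- function(a, b, opcodes) {
  pieces <- vapply(opcodes, function(op) {
    switch(op$tag,
      equal   = substr(a, op$i1 + 1, op$i2),  # keep a[i1:i2]
      delete  = "",                           # drop a[i1:i2]
      insert  = ,                             # falls through to 'replace'
      replace = substr(b, op$j1 + 1, op$j2),  # take b[j1:j2]
      stop("unknown tag: ", op$tag)
    )
  }, character(1))
  paste(pieces, collapse = "")
}

a <- "abcd"
b <- "bcde"

# Hand-written opcodes: delete "a", keep "bcd", insert "e".
ops <- list(
  list(tag = "delete", i1 = 0, i2 = 1, j1 = 0, j2 = 0),
  list(tag = "equal",  i1 = 1, i2 = 4, j1 = 0, j2 = 3),
  list(tag = "insert", i1 = 4, i2 = 4, j1 = 3, j2 = 4)
)

rebuilt <- apply_opcodes(a, b, ops)
print(rebuilt)  # "bcde"
```

Each tuple starts where the previous one ended (i1 equals the preceding i2, j1 the preceding j2), so concatenating the pieces in order covers both strings without gaps.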

References

https://www.npmjs.com/package/difflib, http://stackoverflow.com/questions/10383044/fuzzy-string-comparison

Examples

# Requires a Python installation reachable through reticulate.
try({
  if (reticulate::py_available(initialize = FALSE)) {

    # Load the package first so that check_availability() can be found.
    library(fuzzywuzzyR)

    if (check_availability()) {

      s1 = ' It was a dark and stormy night. I was all alone sitting on a red chair.'

      s2 = ' It was a murky and stormy night. I was all alone sitting on a crimson chair.'

      init = SequenceMatcher$new(string1 = s1, string2 = s2)

      # similarity measure in [0, 1]
      init$ratio()

      # upper bounds on ratio(), cheaper to compute
      init$quick_ratio()

      init$real_quick_ratio()

      # matching blocks and the edit operations that turn s1 into s2
      init$get_matching_blocks()

      init$get_opcodes()

    }
  }
}, silent=TRUE)