fuzzywuzzyR (version 1.0.5)

FuzzUtils: Utility functions

Description

Utility functions

Usage

# init <- FuzzUtils$new()

Methods

FuzzUtils$new()

--------------

Full_process(string = NULL, force_ascii = TRUE, decoding = NULL)

--------------

INTR(n = 2.0)

--------------

Make_type_consistent(string1 = NULL, string2 = NULL)

--------------

Asciidammit(input = NULL)

--------------

Asciionly(string = NULL)

--------------

Validate_string(string = NULL)

Methods

Public methods

Method new()

Usage

FuzzUtils$new()

Method Full_process()

Usage

FuzzUtils$Full_process(string = NULL, force_ascii = TRUE, decoding = NULL)

Arguments

string

a character string.

force_ascii

if TRUE, allow only ASCII characters (force conversion to ASCII).

decoding

either NULL or a character string. If not NULL, the decoding parameter takes one of the standard Python encodings (such as 'utf-8'). See the Details and References sections for more information. In this class it applies only to the Full_process method.

Method INTR()

Usage

FuzzUtils$INTR(n = 2)

Arguments

n

a numeric (floating-point) value.

Method Make_type_consistent()

Usage

FuzzUtils$Make_type_consistent(string1 = NULL, string2 = NULL)

Arguments

string1

a character string.

string2

a character string.

Method Asciidammit()

Usage

FuzzUtils$Asciidammit(input = NULL)

Arguments

input

an object of any data type (applies to the Asciidammit method).

Method Asciionly()

Usage

FuzzUtils$Asciionly(string = NULL)

Arguments

string

a character string.

Method Validate_string()

Usage

FuzzUtils$Validate_string(string = NULL)

Arguments

string

a character string.

Method clone()

The objects of this class are cloneable with this method.

Usage

FuzzUtils$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Details

The decoding parameter is useful for non-ASCII character strings. If it is not NULL, the force_ascii parameter (where applicable) is internally set to FALSE. Decoding applies only to Python 2 configurations, because in Python 3 character strings are Unicode by default.

The Full_process method processes a string by: 1. converting it to ASCII if force_ascii == TRUE, 2. replacing every character other than letters and numbers with whitespace, 3. converting it to lower case, and 4. trimming leading and trailing whitespace.
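The processing steps can be approximated in base R. This is a minimal sketch, not the package's implementation (FuzzUtils calls the Python fuzzywuzzy code through reticulate). The name full_process_sketch is hypothetical; the sketch assumes a single UTF-8 string, applies the ASCII step first as the upstream utils.py does, and treats "non-ASCII" as code points 128 to 255, following the bad_chars expression described under Asciidammit. The pattern [^[:alnum:]_] is meant to mimic Python's \W, which treats the underscore as a word character.

```r
# Hypothetical base-R approximation of FuzzUtils$Full_process(); not the package code.
full_process_sketch <- function(string, force_ascii = TRUE) {
  stopifnot(is.character(string), length(string) == 1)
  if (force_ascii) {
    # assumption: drop code points 128..255, mirroring the Python 'bad_chars' set
    cp <- utf8ToInt(enc2utf8(string))
    string <- intToUtf8(cp[cp < 128 | cp > 255])
  }
  # replace everything except letters, digits and '_' with a space
  string <- gsub("[^[:alnum:]_]", " ", string)
  # lower-case, then trim leading and trailing whitespace
  trimws(tolower(string))
}

full_process_sketch("Frodo Baggins")       # "frodo baggins"
full_process_sketch("  Hello, World!  ")   # "hello  world" (inner spaces are not collapsed)
```

Note that the sketch replaces unwanted characters with spaces rather than deleting them, so punctuation between words leaves a double space in the result.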

The INTR method returns n correctly rounded to an integer.
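For illustration, a base-R counterpart, assuming INTR follows the upstream Python intr() helper, which returns int(round(n)). R's round() rounds exact halves to the even digit (IEC 60559), as Python 3's round() does; Python 2 rounded halves away from zero, so results for exact .5 values may differ between configurations. The name intr_sketch is hypothetical.

```r
# Hypothetical base-R counterpart of FuzzUtils$INTR(); assumes int(round(n)) semantics.
intr_sketch <- function(n = 2.0) {
  stopifnot(is.numeric(n), length(n) == 1)
  # round() rounds halves to even, then the result is stored as an integer
  as.integer(round(n))
}

intr_sketch(2.0)   # 2
intr_sketch(2.7)   # 3
intr_sketch(2.5)   # 2 (half to even)
intr_sketch(3.5)   # 4
```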

The Make_type_consistent method converts both objects to unicode unless they are already both string instances or both unicode instances.

The Asciidammit method strips the characters contained in bad_chars = str("").join([chr(i) for i in range(128, 256)]), i.e. the characters with code points 128 to 255. It accepts any R data type as input.

The Asciionly method returns the same result as the Asciidammit method, but accepts only character strings; it relies on Python's str.translate() method.
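The character filter behind Asciionly can be sketched in base R. This assumes a single UTF-8 string and, following the bad_chars expression, removes only code points 128 to 255; characters above 255 (for example the euro sign) pass through, which is what a str.translate() table built from bad_chars would do. The name ascii_only_sketch is hypothetical, and this is not the package's code.

```r
# Hypothetical base-R analogue of FuzzUtils$Asciionly(); not the package code.
# Assumption: removes code points 128..255, the range covered by the Python 'bad_chars' set.
ascii_only_sketch <- function(string) {
  stopifnot(is.character(string), length(string) == 1)
  cp <- utf8ToInt(enc2utf8(string))
  intToUtf8(cp[cp < 128 | cp > 255])
}

ascii_only_sketch("Frodo Baggins")                # "Frodo Baggins" (unchanged)
ascii_only_sketch("Fr\u00f6do Bagg\u00efns")      # "Frdo Baggns"
ascii_only_sketch("5 \u20ac")                     # unchanged: the euro sign (U+20AC) is above 255
```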

The Validate_string method checks that the input has a length and that this length is greater than 0.
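A base-R reading of this check, assuming the upstream Python helper's behavior (len(s) > 0, returning False instead of raising an error for objects without a length). The name validate_string_sketch is hypothetical. Because R's length() counts vector elements rather than characters, the sketch restricts itself to a single character string and uses nchar(); this is a simplification, since Python's len() would also accept non-empty lists.

```r
# Hypothetical base-R analogue of FuzzUtils$Validate_string(); not the package code.
validate_string_sketch <- function(string) {
  # only a single, non-missing character string can pass
  if (!is.character(string) || length(string) != 1 || is.na(string)) {
    return(FALSE)
  }
  nchar(string) > 0
}

validate_string_sketch("Bilbo Baggin")  # TRUE
validate_string_sketch("")              # FALSE
validate_string_sketch(NULL)            # FALSE
validate_string_sketch(123)             # FALSE
```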

Some of the utility functions are also used as helper methods in the FuzzExtract class. See the examples of the FuzzExtract class for more details.

References

https://github.com/seatgeek/fuzzywuzzy/blob/master/fuzzywuzzy/utils.py, https://docs.python.org/3/library/codecs.html#standard-encodings

Examples

# load the package first: check_availability() is a fuzzywuzzyR function
library(fuzzywuzzyR)

try({
  if (reticulate::py_available(initialize = FALSE)) {

    # check_availability() returns TRUE if the required Python modules are available
    if (check_availability()) {

      s1 <- 'Frodo Baggins'

      s2 <- 'Bilbo Baggin'

      init <- FuzzUtils$new()

      # results are printed explicitly because they are not auto-printed inside try({ ... })
      print(init$Full_process(string = s1, force_ascii = TRUE))

      print(init$INTR(n = 2.0))

      print(init$Make_type_consistent(string1 = s1, string2 = s2))

      #------------------------------------
      # 'Asciidammit' with character string
      #------------------------------------

      print(init$Asciidammit(input = s1))

      #----------------------------------------------------------------
      # 'Asciidammit' with data.frame(123) [ or any kind of data type ]
      #----------------------------------------------------------------

      print(init$Asciidammit(input = data.frame(123)))

      print(init$Asciionly(string = s1))

      print(init$Validate_string(string = s2))
    }
  }
}, silent = TRUE)