
fuzzywuzzyR

The fuzzywuzzyR package brings the fuzzy string matching of the fuzzywuzzy Python package to R. It uses the Levenshtein distance to calculate the differences between sequences. More details on the functionality of fuzzywuzzyR can be found in the package's blog post and vignette.
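To make the notion of Levenshtein distance concrete, here is a minimal sketch in plain Python. It is an illustration of the metric only, not the code that fuzzywuzzy or fuzzywuzzyR actually runs; the function name `levenshtein` is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # previous[j] is the distance between the already processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

A smaller distance means the two strings are closer; similarity ratios such as those reported by fuzzy matchers are normalised variants of this idea.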

UPDATE 26-07-2018: A Singularity image file is available for anyone who intends to run fuzzywuzzyR on Ubuntu Linux (locally or in a cloud instance) with all package requirements pre-installed. This allows the user to use the fuzzywuzzyR package without spending time on the installation process.

System Requirements

  • Python (>= 2.4)

  • difflib

  • fuzzywuzzy ( >=0.15.0 )

  • python-Levenshtein ( >=0.12.0, optional, provides a 4-10x speedup in string matching, though it may produce different results in certain cases )

Before installing any Python modules, check the Python configuration with:

reticulate::py_config()

All modules should be installed in the default python configuration (the configuration that the R-session displays as default), otherwise errors will occur during package installation.
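One way to confirm that the modules are visible to a given interpreter is to run a short check with that same interpreter (the one reported by reticulate::py_config()). The sketch below is only an assumption about how such a check could look; the helper names `module_available` and `availability` are hypothetical and not part of fuzzywuzzyR.

```python
import importlib.util
import sys

REQUIRED = ["difflib", "fuzzywuzzy"]  # modules listed under System Requirements
OPTIONAL = ["Levenshtein"]            # import name of the python-Levenshtein package


def module_available(name: str) -> bool:
    """Return True if the top-level module can be located by this interpreter."""
    return importlib.util.find_spec(name) is not None


def availability() -> dict:
    """Map every required and optional module name to True or False."""
    return {name: module_available(name) for name in REQUIRED + OPTIONAL}


# Show which interpreter is being checked, then the status of each module.
print(sys.executable, sys.version.split()[0])
for name, found in availability().items():
    kind = "required" if name in REQUIRED else "optional"
    print(f"{name} ({kind}): {'found' if found else 'missing'}")
```

If a required module is reported as missing here but was installed earlier, it was most likely installed for a different Python interpreter.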

Debian/Ubuntu/Fedora

Python 2

sudo apt-get install python-pip
sudo pip install --upgrade pip
pip install fuzzywuzzy
pip install python-Levenshtein

Python 3

sudo apt-get install python3-pip
sudo pip3 install --upgrade pip
pip3 install fuzzywuzzy
pip3 install python-Levenshtein

Macintosh OSX

sudo easy_install pip
sudo pip install fuzzywuzzy
sudo pip install python-Levenshtein

Windows OS

  • Download get-pip.py and run it with python to install pip
  • Update the Environment variables ( Control Panel >> System and Security >> System >> Advanced system settings >> Environment variables >> System variables >> Path >> Edit ) by adding the python directories ( for instance, in the case of python 2.7 ) :
C:\Python27;C:\Python27\Scripts
  • Install the modules from a command prompt :
pip install fuzzywuzzy
pip install python-Levenshtein

Installation of the fuzzywuzzyR package

To install the package from CRAN, use:

install.packages('fuzzywuzzyR')

To download the latest version from GitHub, use the install_github function of the devtools package:

devtools::install_github(repo = 'mlampros/fuzzywuzzyR')

Use the following link to report bugs or issues: https://github.com/mlampros/fuzzywuzzyR/issues

Citation:

If you use the code of this repository in your paper or research, please cite both fuzzywuzzyR and the original software (see https://CRAN.R-project.org/package=fuzzywuzzyR/citation.html):

@Manual{,
  title = {{fuzzywuzzyR}: Fuzzy String Matching in R},
  author = {Lampros Mouselimis},
  year = {2021},
  note = {R package version 1.0.5},
  url = {https://CRAN.R-project.org/package=fuzzywuzzyR},
}

Version: 1.0.5

License: GPL-2

Last Published: September 11th, 2021

Functions in fuzzywuzzyR (1.0.5)

  • FuzzMatcher : Fuzzy character string matching ( ratios )

  • GetCloseMatches : Matches of character strings

  • check_availability : Checks if all relevant python modules are available

  • FuzzExtract : Fuzzy extraction from a sequence

  • check_scorer : Secondary function for the 'FuzzExtract' class

  • SequenceMatcher : Character string sequence matching

  • FuzzUtils : Utility functions

  • is_python2 : Returns TRUE if python2 is installed and used in the OS
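
Sequence matching and close-match lookup of the kind listed above can be tried directly with Python's difflib module, which is among the system requirements. The sketch below uses only difflib calls and is an illustration of the underlying ideas, on the assumption that the R functions of the same names behave similarly; it does not show the fuzzywuzzyR API itself.

```python
import difflib

# Similarity of two strings: 2 * (matching characters) / (total characters).
matcher = difflib.SequenceMatcher(None, "abcd", "bcde")
similarity = matcher.ratio()
print(similarity)  # 0.75  (2 * 3 / 8, the shared block is "bcd")

# fuzzywuzzy reports scores on a 0-100 scale; this scaling is for illustration only.
score = int(round(100 * similarity))
print(score)  # 75

# Best matches for a possibly misspelled word, most similar first.
candidates = ["ape", "apple", "peach", "puppy"]
close = difflib.get_close_matches("appel", candidates, n=3, cutoff=0.6)
print(close)  # ['apple', 'ape']
```

Raising `cutoff` makes the lookup stricter, while `n` caps how many candidates are returned.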