comparator (version 0.1.3)

Lookup: Lookup String Comparator

Description

Compares a pair of strings \(x\) and \(y\) by retrieving their distance/similarity score from a provided lookup table.

Usage

Lookup(
  lookup_table,
  values_colnames,
  score_colname,
  default_match = 0,
  default_nonmatch = NA_real_,
  symmetric = TRUE,
  ignore_case = FALSE
)

Value

A Lookup instance is returned, which is an S4 class inheriting from StringComparator.

Arguments

lookup_table

data frame containing distances/similarities for pairs of values

values_colnames

character vector containing the colnames corresponding to pairs of values (e.g. strings) in lookup_table

score_colname

name of column that contains distances/similarities in lookup_table

default_match

distance/similarity to use if the pair of values match exactly and do not appear in lookup_table. Defaults to 0.0.

default_nonmatch

distance/similarity to use if the pair of values is not an exact match and does not appear in lookup_table. Defaults to NA.

symmetric

a logical indicating whether the underlying distance/similarity scores are symmetric. If TRUE, lookup_table need only contain an entry for one ordering of each pair, i.e. an entry for value pair \((y, x)\) is not required if an entry for \((x, y)\) is already present.

ignore_case

a logical. If TRUE, case is ignored when comparing the strings.

Details

The lookup table should contain three columns: two holding the values \(x\) and \(y\) (named by values_colnames) and one holding the distance/similarity (named by score_colname). If a pair of values \(x\) and \(y\) is not in the lookup table, a default distance/similarity is returned, depending on whether \(x = y\) (default_match) or \(x \neq y\) (default_nonmatch).
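The rules described above can be illustrated with a minimal base R sketch. The helper `lookup_score` is hypothetical and is not part of the comparator package, nor is it the package's implementation; it only mirrors the documented behaviour for a single pair of strings, and its handling of ignore_case (lower-casing both the inputs and the table values) is an assumption.

```r
# Hypothetical helper (not part of the comparator package): score one pair
# of strings against a lookup table, following the rules described above.
lookup_score <- function(x, y, lookup_table, values_colnames, score_colname,
                         default_match = 0, default_nonmatch = NA_real_,
                         symmetric = TRUE, ignore_case = FALSE) {
  a <- as.character(lookup_table[[values_colnames[1]]])
  b <- as.character(lookup_table[[values_colnames[2]]])
  if (ignore_case) {
    # Assumption: case is folded on both the inputs and the table values
    x <- tolower(x); y <- tolower(y)
    a <- tolower(a); b <- tolower(b)
  }
  # Look for the pair in the order (x, y)
  hit <- which(a == x & b == y)
  if (length(hit) == 0 && symmetric) {
    # Symmetric scores: fall back to the reversed pair (y, x)
    hit <- which(a == y & b == x)
  }
  if (length(hit) > 0) {
    return(lookup_table[[score_colname]][hit[1]])
  }
  # Pair not in the table: choose the default by exact equality
  if (x == y) default_match else default_nonmatch
}

cities <- data.frame(x = c("Melbourne", "Melbourne", "Sydney"),
                     y = c("Sydney", "Brisbane", "Brisbane"),
                     dist = c(713.4, 1374.8, 732.5))

d_sym  <- lookup_score("Sydney", "Melbourne", cities, c("x", "y"), "dist")  # 713.4
d_asym <- lookup_score("Sydney", "Melbourne", cities, c("x", "y"), "dist",
                       symmetric = FALSE)                                   # NA
d_same <- lookup_score("Perth", "Perth", cities, c("x", "y"), "dist")       # 0
d_miss <- lookup_score("Melbourne", "Perth", cities, c("x", "y"), "dist")   # NA
```

In this sketch the reversed pair is found only when symmetric = TRUE, and a pair missing from the table falls back to default_match or default_nonmatch depending on whether the two strings are identical.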

Examples

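library(comparator)

## Measure the distance between cities
lookup_table <- data.frame(x = c("Melbourne", "Melbourne", "Sydney"),
                           y = c("Sydney", "Brisbane", "Brisbane"),
                           dist = c(713.4, 1374.8, 732.5))

## Build the comparator; symmetric = TRUE by default
comparator <- Lookup(lookup_table, c("x", "y"), "dist")

## Stored as ("Melbourne", "Sydney"); the reversed pair is covered by symmetry
comparator("Sydney", "Melbourne")

## Not in the table and not an exact match, so default_nonmatch (NA) applies
comparator("Melbourne", "Perth")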