The match_fun argument is called once on a vector with all pairs of unique comparisons: thus, it should be efficient and vectorized.
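To make the vectorization requirement concrete, here is a minimal base R sketch. The values `u_x` and `u_y` and the helper `close_enough` are hypothetical, and `expand.grid` only stands in for the pairing step; the order in which the package builds its pairs is an assumption, not something stated here.

```r
# Hypothetical unique values taken from the join columns of x and y
u_x <- c(1.0, 2.0, 5.0)
u_y <- c(1.2, 4.9)

# Pair every unique value of x with every unique value of y
pairs <- expand.grid(a = u_x, b = u_y)

# A vectorized predicate: one call compares all pairs element by element
close_enough <- function(a, b) abs(a - b) <= 0.5

# A single call returns one TRUE/FALSE per pair
is_match <- close_enough(pairs$a, pairs$b)

# The pairs that would count as matches
matched_pairs <- pairs[is_match, ]
```

A function written around a scalar `if (...)` test would not work here, because it receives whole vectors rather than one pair at a time.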
fuzzy_join(
  x,
  y,
  by = NULL,
  match_fun = NULL,
  multi_by = NULL,
  multi_match_fun = NULL,
  index_match_fun = NULL,
  mode = "inner",
  ...
)

fuzzy_inner_join(x, y, by = NULL, match_fun, ...)
fuzzy_left_join(x, y, by = NULL, match_fun, ...)
fuzzy_right_join(x, y, by = NULL, match_fun, ...)
fuzzy_full_join(x, y, by = NULL, match_fun, ...)
fuzzy_semi_join(x, y, by = NULL, match_fun, ...)
fuzzy_anti_join(x, y, by = NULL, match_fun, ...)
x: A tbl
y: A tbl
by: Columns of each to join
match_fun: Vectorized function given two columns, returning TRUE or FALSE as to whether they are a match. Can be a list of functions, one for each pair of columns specified in by (if a named list, it uses the names in x). If only one function is given, it is used on all column pairs.
multi_by: Columns to join, where all columns will be used to test matches together
multi_match_fun: Function to use for testing matches, performed on all columns in each data frame simultaneously
index_match_fun: Function to use for matching tables. Unlike match_fun and multi_match_fun, this is performed on the original columns and returns pairs of indices.
mode: One of "inner", "left", "right", "full", "semi", or "anti"
...: Extra arguments passed to match_fun
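As an illustration of how by, match_fun and the join mode fit together, the following is a minimal sketch rather than a canonical example. The data frames, their column names, and the helper `close_enough` are hypothetical.

```r
library(fuzzyjoin)

# Hypothetical example data
x <- data.frame(id = 1:3, value = c(1.0, 2.0, 5.0))
y <- data.frame(key = c("a", "b"), target = c(1.2, 4.9))

# Vectorized predicate: TRUE where the two values differ by at most 0.5
close_enough <- function(a, b) abs(a - b) <= 0.5

# Keep only the matching pairs of rows
inner <- fuzzy_inner_join(x, y, by = c("value" = "target"),
                          match_fun = close_enough)

# Keep every row of x, with NA in the y columns where nothing matched
left <- fuzzy_left_join(x, y, by = c("value" = "target"),
                        match_fun = close_enough)

# Keep only the rows of x that have no match in y
anti <- fuzzy_anti_join(x, y, by = c("value" = "target"),
                        match_fun = close_enough)

# The general form, selecting the join type through mode
inner2 <- fuzzy_join(x, y, by = c("value" = "target"),
                     match_fun = close_enough, mode = "inner")
```

The named vector in by pairs the `value` column of x with the `target` column of y, in the same way as dplyr's joins.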
match_fun should return either a logical vector, or a data frame where the first column is logical. If the latter, the additional columns will be appended to the output. For example, these additional columns could contain the distance metrics that one is filtering on.
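A minimal sketch of the data frame form described above, with hypothetical data and a hypothetical helper `match_with_distance`. The column names `is_match` and `distance` are illustrative choices, not names required by the package.

```r
library(fuzzyjoin)

# Hypothetical example data
x <- data.frame(id = 1:3, value = c(1.0, 2.0, 5.0))
y <- data.frame(key = c("a", "b"), target = c(1.2, 4.9))

# Return a data frame: the first column is the logical match indicator,
# the second carries the distance that the match was decided on
match_with_distance <- function(a, b) {
  d <- abs(a - b)
  data.frame(is_match = d <= 0.5, distance = d)
}

res <- fuzzy_inner_join(x, y, by = c("value" = "target"),
                        match_fun = match_with_distance)

# The extra column is carried along with the matched rows
names(res)
```

Returning the distance this way avoids recomputing it after the join when you want to rank or further filter the matches.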
Note that as of now, you cannot give both match_fun and multi_match_fun: you can either compare each column individually or compare all of them together.
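A minimal sketch of comparing all columns together, with hypothetical data and a hypothetical helper `within_radius`. It assumes that multi_match_fun receives the multi_by columns of each table as two matrices with one row per candidate pair; that calling convention is an assumption here and should be checked against the package before relying on it.

```r
library(fuzzyjoin)

# Hypothetical 2-D points in each table
x <- data.frame(x1 = c(0, 5, 10), x2 = c(0, 5, 10))
y <- data.frame(y1 = c(0.5, 9), y2 = c(0.5, 9.5))

# Assumption: a and b are numeric matrices with one row per candidate pair
# and one column per multi_by column. A pair matches when the Euclidean
# distance between the two points is at most 1.5.
within_radius <- function(a, b) {
  sqrt(rowSums((a - b)^2)) <= 1.5
}

res <- fuzzy_join(x, y,
                  multi_by = c("x1" = "y1", "x2" = "y2"),
                  multi_match_fun = within_radius,
                  mode = "inner")
```

With match_fun, each column pair would be tested separately; a joint test such as a Euclidean distance needs all columns at once, which is the case multi_match_fun covers.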
Like in dplyr's join operations, fuzzy_join ignores groups, but preserves the grouping of x in the output.
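The grouping behaviour can be checked with a short sketch. The data, the grouping column `g`, and the helper `close_enough` are hypothetical; `group_by` and `group_vars` come from dplyr.

```r
library(dplyr)
library(fuzzyjoin)

# Hypothetical grouped input
x <- data.frame(g = c("a", "a", "b"), value = c(1.0, 2.0, 5.0))
x <- group_by(x, g)
y <- data.frame(target = c(1.2, 4.9))

# Vectorized predicate: TRUE where the two values differ by at most 0.5
close_enough <- function(a, b) abs(a - b) <= 0.5

res <- fuzzy_inner_join(x, y, by = c("value" = "target"),
                        match_fun = close_enough)

# Matching is done across all rows regardless of group,
# but the result is still grouped the way x was
group_vars(res)
```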