Utility functions for LSH models
ml_approx_nearest_neighbors(
  model,
  dataset,
  key,
  num_nearest_neighbors,
  dist_col = "distCol"
)

ml_approx_similarity_join(
  model,
  dataset_a,
  dataset_b,
  threshold,
  dist_col = "distCol"
)
model: A fitted LSH model, returned by either ft_minhash_lsh() or ft_bucketed_random_projection_lsh().
dataset: The dataset to search for nearest neighbors of the key.
key: Feature vector representing the item to search for.
num_nearest_neighbors: The maximum number of nearest neighbors.
dist_col: Output column for storing the distance between each result row and the key.
dataset_a: One of the datasets to join.
dataset_b: Another dataset to join.
threshold: The threshold for the distance of row pairs.
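
The following is a minimal sketch of how the two functions might be used together, not a definitive recipe. It assumes a local Spark installation reachable through spark_connect(master = "local"). The toy data frame `points`, the column names "features" and "hashes", and the values chosen for bucket_length, num_hash_tables, key, and threshold are all illustrative assumptions. It also assumes that the model is obtained by creating the estimator on the connection and fitting it with ml_fit(), and that key can be passed as a plain numeric vector.

```r
library(sparklyr)
library(dplyr)

# Assumption: a local Spark installation is available.
sc <- spark_connect(master = "local")

# Hypothetical toy data: four points in the plane.
points <- data.frame(
  id = 1:4,
  x = c(1, 1, -1, -1),
  y = c(1, -1, -1, 1)
)

# Copy the data to Spark and assemble x and y into one vector column.
points_tbl <- copy_to(sc, points, overwrite = TRUE) %>%
  ft_vector_assembler(input_cols = c("x", "y"), output_col = "features")

# Create an LSH estimator on the connection, then fit it to get a model.
# bucket_length and num_hash_tables are arbitrary illustrative values.
lsh <- ft_bucketed_random_projection_lsh(
  sc,
  input_col = "features",
  output_col = "hashes",
  bucket_length = 2,
  num_hash_tables = 3
)
model <- ml_fit(lsh, points_tbl)

# Approximate search for up to 2 rows closest to the key (1, 0).
neighbors <- ml_approx_nearest_neighbors(
  model,
  dataset = points_tbl,
  key = c(1, 0),
  num_nearest_neighbors = 2
)
print(neighbors)

# Approximate self-join, using 1.5 as the distance threshold for row pairs.
pairs <- ml_approx_similarity_join(
  model,
  dataset_a = points_tbl,
  dataset_b = points_tbl,
  threshold = 1.5
)
print(pairs)

spark_disconnect(sc)
```

Both results carry the distance in the column named by dist_col ("distCol" by default). Because the search is approximate, the rows returned can vary with the hashing parameters, so no expected output is shown here.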