ft_lsh_utils

ml_approx_nearest_neighbors

ml_approx_similarity_join

R interface to Apache Spark, a fast and general
engine for big data processing, see <https://spark.apache.org/>. This
package supports connecting to local and remote Apache Spark clusters,
provides a 'dplyr' compatible back-end, and provides an interface to
Spark's built-in machine learning algorithms.

Edgar Ruiz

sparklyr

R Interface to Apache Spark

Javier Luraschi

Kevin Kuo

Kevin Ushey

JJ Allaire

Samuel Macedo

Hossein Falaki

Lu Wang

Andy Zhang

Yitao Li

Jozef Hajnala

Maciej Szymkiewicz

Wil Davis

 RStudio

 The Apache Software Foundation

ft_lsh_utils function

<dl><dt>model</dt>
<dd>A fitted LSH model, returned by either <code>ft_minhash_lsh()</code>
or <code>ft_bucketed_random_projection_lsh()</code>.</dd>
<dt>dataset</dt>
<dd>The dataset to search for nearest neighbors of the key.</dd>
<dt>key</dt>
<dd>Feature vector representing the item to search for.</dd>
<dt>num_nearest_neighbors</dt>
<dd>The maximum number of nearest neighbors to return.</dd>
<dt>dist_col</dt>
<dd>Output column for storing the distance between each result row and the key.</dd>
<dt>dataset_a</dt>
<dd>One of the datasets to join.</dd>
<dt>dataset_b</dt>
<dd>Another dataset to join.</dd>
<dt>threshold</dt>
<dd>The distance threshold: only row pairs whose distance is below this value are returned.</dd></dl>
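The arguments above map onto the two utility functions, `ml_approx_nearest_neighbors()` and `ml_approx_similarity_join()`. The following is a minimal sketch rather than an official example. It assumes a local Spark installation reachable with `master = "local"`, uses the built-in `iris` data purely for illustration, and relies on `copy_to()` replacing the dots in column names with underscores. The `bucket_length`, `num_hash_tables`, `threshold`, and `"dist"` values are arbitrary choices, not recommendations.

```r
library(sparklyr)
library(dplyr)

# Assumption: a local Spark installation is available.
sc <- spark_connect(master = "local")

# Illustrative data only; column names become Sepal_Length, Sepal_Width, ...
iris_tbl <- copy_to(sc, iris, overwrite = TRUE)

# LSH models operate on a single vector column, so assemble one first.
features_tbl <- iris_tbl %>%
  ft_vector_assembler(
    input_cols = c("Sepal_Length", "Sepal_Width", "Petal_Length", "Petal_Width"),
    output_col = "features"
  )

# Passing the connection returns an unfitted estimator; ml_fit() produces
# the fitted LSH model expected by the `model` argument.
lsh <- ft_bucketed_random_projection_lsh(
  sc,
  input_col = "features",
  output_col = "hashes",
  bucket_length = 2,      # arbitrary value for this sketch
  num_hash_tables = 3,    # arbitrary value for this sketch
  seed = 42
)
model <- ml_fit(lsh, features_tbl)

# `key` is a feature vector with the same length as the "features" column.
key <- c(5.1, 3.5, 1.4, 0.2)

# Up to 5 rows of `dataset` that are approximately closest to `key`;
# the distance to the key is stored in the column named by `dist_col`.
neighbors <- ml_approx_nearest_neighbors(
  model,
  dataset = features_tbl,
  key = key,
  num_nearest_neighbors = 5,
  dist_col = "dist"
)

# Approximate self-join: row pairs whose distance is below `threshold`.
pairs <- ml_approx_similarity_join(
  model,
  dataset_a = features_tbl,
  dataset_b = features_tbl,
  threshold = 0.5,
  dist_col = "dist"
)

spark_disconnect(sc)
```

Both calls are approximate: candidates are restricted to rows that share a hash bucket with the key or with each other, so close rows can be missed. Increasing `num_hash_tables` generally trades extra computation for better recall.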


Utility functions for LSH models — ft_lsh_utils
