sdf_expand_grid

Given one or more R vectors/factors or single-column Spark dataframes,
perform an `expand.grid` operation on all of them and store the result in
a Spark dataframe.
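For readers who have not used it, the base R `expand.grid` function that this description refers to can be tried locally without Spark. The snippet below is only an illustration of those base R semantics (every combination of the inputs, with unnamed inputs named `Var1`, `Var2`, ...); the variable names are made up for the example.

```r
# Base R only: no Spark connection is involved here.
# Unnamed inputs receive the default column names Var1, Var2, ...
unnamed_grid <- expand.grid(1:3, c("a", "b"))

# Named inputs keep their names as column names.
named_grid <- expand.grid(x = 1:3, y = c("a", "b"))

# Each grid has one row per combination: 3 * 2 = 6 rows.
print(names(unnamed_grid))
print(nrow(named_grid))
```

According to the description above, `sdf_expand_grid` performs the same kind of operation but stores the result in a Spark dataframe rather than a local data frame.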

R interface to Apache Spark, a fast and general
engine for big data processing, see <https://spark.apache.org/>. This
package supports connecting to local and remote Apache Spark clusters,
provides a 'dplyr' compatible back-end, and provides an interface to
Spark's built-in machine learning algorithms.

Edgar Ruiz

sparklyr

R Interface to Apache Spark

Javier Luraschi

Kevin Kuo

Kevin Ushey

JJ Allaire

Samuel Macedo

Hossein Falaki

Lu Wang

Andy Zhang

Yitao Li

Jozef Hajnala

Maciej Szymkiewicz

Wil Davis

 RStudio

 The Apache Software Foundation

sdf_expand_grid function


Arguments

Create a Spark dataframe containing all combinations of inputs — sdf_expand_grid

<dl>

<dt>sc</dt>
<dd>The associated Spark connection.</dd>

<dt>...</dt>
<dd>Each input variable can be either an R vector/factor or a Spark
dataframe. Unnamed inputs are given the default names 'Var1', 'Var2',
etc. in the result, as `expand.grid` does for unnamed inputs.</dd>

<dt>broadcast_vars</dt>
<dd>Indicates which input(s) should be broadcast to all
nodes of the Spark cluster during the join process (default: none).</dd>

<dt>memory</dt>
<dd>Boolean; whether the resulting Spark dataframe should be
cached in memory (default: TRUE).</dd>

<dt>repartition</dt>
<dd>Number of partitions the resulting Spark dataframe should
have.</dd>

<dt>partition_by</dt>
<dd>Vector of column names used for partitioning the
resulting Spark dataframe; only supported for Spark 2.0+.</dd>

</dl>
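A minimal usage sketch of the arguments above, assuming the sparklyr package is installed. The helper name `size_color_grid` and the input vectors are hypothetical and exist only for this illustration; the helper passes `sc` and two named inputs, and sets the documented `memory` argument to `FALSE` so the result is not cached.

```r
# Hypothetical helper for illustration: builds a grid of all size/color
# combinations on an existing Spark connection `sc`.
size_color_grid <- function(sc) {
  sparklyr::sdf_expand_grid(
    sc,
    size = c("S", "M", "L"),   # named input, becomes column `size`
    color = c("red", "blue"),  # named input, becomes column `color`
    memory = FALSE             # do not cache the result in memory
  )
}

# Usage (requires a working Spark installation, so it is left commented out):
# sc <- sparklyr::spark_connect(master = "local")
# grid_sdf <- size_color_grid(sc)
# sparklyr::sdf_nrow(grid_sdf)   # 3 sizes x 2 colors, so 6 combinations
# sparklyr::spark_disconnect(sc)
```

Because both inputs are named, the resulting columns should carry those names rather than the `Var1`, `Var2` defaults.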

