spark_pipeline_stage: Create a Pipeline Stage Object

Helper function to create pipeline stage objects with common parameter setters.

Keywords: internal

Package: sparklyr (R Interface to Apache Spark)

R interface to Apache Spark, a fast and general engine for big data processing, see <https://spark.apache.org/>. This package supports connecting to local and remote Apache Spark clusters, provides a 'dplyr' compatible back-end, and provides an interface to Spark's built-in machine learning algorithms.

Authors: Edgar Ruiz, Javier Luraschi, Kevin Kuo, Kevin Ushey, JJ Allaire, Samuel Macedo, Hossein Falaki, Lu Wang, Andy Zhang, Yitao Li, Jozef Hajnala, Maciej Szymkiewicz, Wil Davis, RStudio, The Apache Software Foundation

Arguments

<dl><dt>sc</dt>
<dd>A `spark_connection` object.</dd>
<dt>class</dt>
<dd>Class name for the pipeline stage.</dd>
<dt>uid</dt>
<dd>A character string used to uniquely identify the ML estimator.</dd>
<dt>features_col</dt>
<dd>Features column name, as a length-one character vector. The column should be a single vector column of numeric values. Usually this column is output by <code>ft_r_formula</code>.</dd>
<dt>label_col</dt>
<dd>Label column name. The column should be a numeric column. Usually this column is output by <code>ft_r_formula</code>.</dd>
<dt>prediction_col</dt>
<dd>Prediction column name.</dd>
<dt>probability_col</dt>
<dd>Column name for predicted class-conditional probabilities.</dd>
<dt>raw_prediction_col</dt>
<dd>Raw prediction (a.k.a. confidence) column name.</dd>
<dt>k</dt>
<dd>The number of clusters to create.</dd>
<dt>max_iter</dt>
<dd>The maximum number of iterations to use.</dd>
<dt>seed</dt>
<dd>A random seed. Set this value if you need your results to be
reproducible across repeated calls.</dd>
<dt>input_col</dt>
<dd>The name of the input column.</dd>
<dt>input_cols</dt>
<dd>Names of input columns.</dd>
<dt>output_col</dt>
<dd>The name of the output column.</dd></dl>
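The description mentions "common parameter setters". The following is a minimal sketch in base R, assuming that the helper applies a setter only for arguments that are not `NULL`, so that unsupplied parameters keep the stage's defaults. It runs without Spark. `set_param_if_supplied()`, `make_stage_sketch()`, and the list-based stage are hypothetical illustrations, not sparklyr functions. The real helper presumably instantiates `class` as a JVM object through `sc` rather than building an R list.

```r
# Hypothetical sketch: a pipeline stage modelled as a plain R list.
# This does NOT call Spark; it only illustrates the "set if supplied" pattern.

# Hypothetical helper: record a parameter only when the caller supplied it,
# so parameters left as NULL keep the stage's default value.
set_param_if_supplied <- function(stage, param, value) {
  if (is.null(value)) {
    return(stage)
  }
  stage$params[[param]] <- value
  stage
}

# Hypothetical stand-in for spark_pipeline_stage(). It uses a subset of the
# documented argument names and maps each one to a Spark ML parameter name.
make_stage_sketch <- function(class, uid,
                              features_col = NULL, label_col = NULL,
                              prediction_col = NULL, k = NULL,
                              max_iter = NULL, seed = NULL) {
  stage <- list(class = class, uid = as.character(uid), params = list())
  stage <- set_param_if_supplied(stage, "featuresCol", features_col)
  stage <- set_param_if_supplied(stage, "labelCol", label_col)
  stage <- set_param_if_supplied(stage, "predictionCol", prediction_col)
  stage <- set_param_if_supplied(stage, "k", k)
  stage <- set_param_if_supplied(stage, "maxIter", max_iter)
  stage <- set_param_if_supplied(stage, "seed", seed)
  stage
}

kmeans_stage <- make_stage_sketch(
  class = "org.apache.spark.ml.clustering.KMeans",
  uid = "kmeans_example",
  features_col = "features",
  k = 3L,
  max_iter = 20L,
  seed = 42L
)

# Only the supplied parameters are recorded; labelCol and predictionCol are absent.
str(kmeans_stage$params)
```

This design lets one helper serve many stage classes: each class receives only the setters that are relevant to the arguments its caller actually passes.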

