sdf_project

Project features onto principal components

R interface to Apache Spark, a fast and general
engine for big data processing, see <https://spark.apache.org/>. This
package supports connecting to local and remote Apache Spark clusters,
provides a 'dplyr' compatible back-end, and provides an interface to
Spark's built-in machine learning algorithms.

Edgar Ruiz

sparklyr

R Interface to Apache Spark

Javier Luraschi

Kevin Kuo

Kevin Ushey

JJ Allaire

Samuel Macedo

Hossein Falaki

Lu Wang

Andy Zhang

Yitao Li

Jozef Hajnala

Maciej Szymkiewicz

Wil Davis

 RStudio

 The Apache Software Foundation

Arguments

<dl><dt>object</dt>
<dd>A Spark PCA model object.</dd>
<dt>newdata</dt>
<dd>An object coercible to a Spark DataFrame.</dd>
<dt>features</dt>
<dd>A character vector of the names of the columns to be projected.</dd>
<dt>feature_prefix</dt>
<dd>The prefix used in naming the output features.</dd>
<dt>...</dt>
<dd>Optional arguments; currently unused.</dd></dl>
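To show how these arguments fit together, here is a minimal sketch (not taken from the package's own examples). It fits a PCA model with <code>ml_pca()</code> and projects the same rows onto its components. The sketch assumes a local Spark installation (for example via <code>spark_install()</code>), the built-in <code>iris</code> data, and that <code>sdf_copy_to()</code> sanitizes column names such as <code>Sepal.Length</code> to <code>Sepal_Length</code>. The values <code>k = 2</code> and <code>feature_prefix = "PC"</code> are arbitrary illustrative choices.

```r
library(sparklyr)
library(dplyr)

# Assumes Spark is installed locally, e.g. via spark_install()
sc <- spark_connect(master = "local")

# Copy iris to Spark; by default '.' in column names becomes '_'
iris_tbl <- sdf_copy_to(sc, iris, name = "iris_tbl", overwrite = TRUE)

# Columns to project (illustrative choice: the four numeric measurements,
# in the same order used to fit the model below)
features <- c("Sepal_Length", "Sepal_Width", "Petal_Length", "Petal_Width")

# Fit a PCA model on the numeric columns, keeping 2 principal components
pca_model <- iris_tbl %>%
  select(-Species) %>%
  ml_pca(k = 2)

# Project the rows of iris_tbl onto those components; the output
# feature columns are named using feature_prefix
projected <- sdf_project(
  pca_model,
  newdata = iris_tbl,
  features = features,
  feature_prefix = "PC"
)

# Bring the (small) result back into R
projected_df <- collect(projected)
head(projected_df)

spark_disconnect(sc)
```

Passing <code>features</code> explicitly keeps the column order aligned with the order used when fitting the model. This does not rely on whatever default the installed sparklyr version uses.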

Transforming Spark DataFrames

The family of functions prefixed with <code>sdf_</code> generally accesses the Scala
Spark DataFrame API directly, as opposed to the <code>dplyr</code> interface, which
uses Spark SQL. These functions 'force' any pending SQL in a
<code>dplyr</code> pipeline, so that the returned <code>tbl_spark</code> object
no longer has the attached 'lazy' SQL operations. Note that
the underlying Spark DataFrame still executes its operations lazily: even
though the pending set of operations is not (currently) exposed at
the R level, these operations are executed only when you explicitly
<code>collect()</code> the table.
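The following sketch illustrates the 'forcing' behavior described above. It assumes a local Spark installation and the built-in <code>mtcars</code> data. It builds a lazy <code>dplyr</code> pipeline and inspects its pending SQL with <code>show_query()</code>. It then passes the pipeline to <code>sdf_sample()</code>, chosen here only as a convenient <code>sdf_</code> function. The exact SQL printed depends on the installed sparklyr and dbplyr versions, so no output is shown.

```r
library(sparklyr)
library(dplyr)

# Assumes Spark is installed locally, e.g. via spark_install()
sc <- spark_connect(master = "local")
mtcars_tbl <- sdf_copy_to(sc, mtcars, name = "mtcars_tbl", overwrite = TRUE)

# dplyr verbs only accumulate lazy SQL; nothing is computed yet
pipeline <- mtcars_tbl %>%
  filter(cyl == 4) %>%
  mutate(kpl = mpg * 0.425)  # approximate km per litre

# Prints the pending SQL (the WHERE clause and the computed column)
show_query(pipeline)

# An sdf_ function forces the pending SQL into a Spark DataFrame;
# the returned tbl_spark no longer carries the dplyr query
sampled <- sdf_sample(pipeline, fraction = 0.5, replacement = FALSE, seed = 42)
show_query(sampled)

# Spark itself is still lazy: the rows are computed on collect()
local_df <- collect(sampled)

spark_disconnect(sc)
```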
