Implements the transformations defined by a SQL statement. Currently, only SQL syntax of the form 'SELECT ... FROM __THIS__ ...' is supported, where '__THIS__' represents the underlying table of the input dataset. The select clause specifies the fields, constants, and expressions to display in the output; it can be any select clause that Spark SQL supports. Users can also apply Spark SQL built-in functions and UDFs to the selected columns.
ft_sql_transformer(
  x,
  statement = NULL,
  uid = random_string("sql_transformer_"),
  ...
)

ft_dplyr_transformer(x, tbl, uid = random_string("dplyr_transformer_"), ...)
The object returned depends on the class of x.

spark_connection: When x is a spark_connection, the function returns a ml_transformer, a ml_estimator, or one of their subclasses. The object contains a pointer to a Spark Transformer or Estimator object and can be used to compose Pipeline objects.

ml_pipeline: When x is a ml_pipeline, the function returns a ml_pipeline with the transformer or estimator appended to the pipeline.

tbl_spark: When x is a tbl_spark, a transformer is constructed then immediately applied to the input tbl_spark, returning a tbl_spark.
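
The three cases can be sketched as follows. This is a minimal illustration, not an official example: it assumes a local Spark installation is available, and the table name iris_tbl, the statement, and the column alias petal_area are chosen purely for demonstration.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally (for example via spark_install()).
sc <- spark_connect(master = "local")

# sparklyr replaces "." in column names with "_" when copying to Spark.
iris_tbl <- copy_to(sc, iris, name = "iris_tbl", overwrite = TRUE)

# Illustrative statement; __THIS__ stands for the input table.
stmt <- "SELECT Species, Petal_Length * Petal_Width AS petal_area FROM __THIS__"

# 1. spark_connection: returns a transformer object; no data is touched.
transformer <- ft_sql_transformer(sc, statement = stmt)
class(transformer)

# 2. ml_pipeline: returns the pipeline with the transformer appended.
pipeline <- ml_pipeline(sc) %>%
  ft_sql_transformer(statement = stmt)

# 3. tbl_spark: constructs the transformer and applies it immediately.
result <- iris_tbl %>%
  ft_sql_transformer(statement = stmt)
colnames(result)

# The standalone transformer from case 1 can be applied later.
same_result <- ml_transform(transformer, iris_tbl)
```

Call spark_disconnect(sc) once the connection is no longer needed.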
x: A spark_connection, ml_pipeline, or a tbl_spark.

statement: A SQL statement.

uid: A character string used to uniquely identify the feature transformer.

...: Optional arguments; currently unused.

tbl: A tbl_spark generated using dplyr transformations.
ft_dplyr_transformer() is mostly a wrapper around ft_sql_transformer() that takes a tbl_spark instead of a SQL statement. Internally, ft_dplyr_transformer() extracts the dplyr transformations used to generate tbl as a SQL statement or a sampling operation. Note that only single-table dplyr verbs are supported and that the sdf_ family of functions is not.
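
As a hedged sketch of that workflow, the example below builds a lazy tbl_spark with single-table dplyr verbs and turns it into a transformer. It assumes a local Spark installation; the names iris_tbl, transformed_tbl, dplyr_stage, and petal_area are illustrative only.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally (for example via spark_install()).
sc <- spark_connect(master = "local")
iris_tbl <- copy_to(sc, iris, name = "iris_tbl", overwrite = TRUE)

# Lazy, single-table dplyr transformations; nothing is computed yet.
transformed_tbl <- iris_tbl %>%
  filter(Petal_Length > 1.5) %>%
  mutate(petal_area = Petal_Length * Petal_Width)

# Capture those transformations as a reusable transformer.
dplyr_stage <- ft_dplyr_transformer(sc, tbl = transformed_tbl)

# Inspect the SQL statement extracted from the dplyr verbs.
ml_param(dplyr_stage, "statement")

# Apply the captured transformations to a tbl_spark.
output <- ml_transform(dplyr_stage, iris_tbl)
colnames(output)
```

Because the transformations are stored as a SQL statement over __THIS__, the same stage can be added to a pipeline and applied to any table with matching columns.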
See https://spark.apache.org/docs/latest/ml-features.html for more information on the set of transformations available for DataFrame columns in Spark.
Other feature transformers: ft_binarizer(), ft_bucketizer(), ft_chisq_selector(), ft_count_vectorizer(), ft_dct(), ft_elementwise_product(), ft_feature_hasher(), ft_hashing_tf(), ft_idf(), ft_imputer(), ft_index_to_string(), ft_interaction(), ft_lsh, ft_max_abs_scaler(), ft_min_max_scaler(), ft_ngram(), ft_normalizer(), ft_one_hot_encoder_estimator(), ft_one_hot_encoder(), ft_pca(), ft_polynomial_expansion(), ft_quantile_discretizer(), ft_r_formula(), ft_regex_tokenizer(), ft_robust_scaler(), ft_standard_scaler(), ft_stop_words_remover(), ft_string_indexer(), ft_tokenizer(), ft_vector_assembler(), ft_vector_indexer(), ft_vector_slicer(), ft_word2vec()