Implements transformations defined by a SQL statement. Currently, only SQL syntax of the form 'SELECT ... FROM __THIS__ ...' is supported, where '__THIS__' represents the underlying table of the input dataset. The select clause specifies the fields, constants, and expressions to display in the output; it can be any select clause that Spark SQL supports. Users can also apply Spark SQL built-in functions and UDFs to the selected columns.
ft_sql_transformer(
  x,
  statement = NULL,
  uid = random_string("sql_transformer_"),
  ...
)

ft_dplyr_transformer(x, tbl, uid = random_string("dplyr_transformer_"), ...)
The object returned depends on the class of x.
spark_connection: When x is a spark_connection, the function returns a ml_transformer,
a ml_estimator, or one of their subclasses. The object contains a pointer to
a Spark Transformer or Estimator object and can be used to compose
Pipeline objects.
ml_pipeline: When x is a ml_pipeline, the function returns a ml_pipeline with
the transformer or estimator appended to the pipeline.
tbl_spark: When x is a tbl_spark, a transformer is constructed and then
immediately applied to the input tbl_spark, returning a tbl_spark.
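The three return behaviours described above can be illustrated with a minimal sketch. It assumes a local Spark installation reachable through spark_connect(master = "local"); the table name "mtcars_tbl", the SQL statement, and the derived column hp_per_wt are illustrative choices, not part of the documented API.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally (e.g. via spark_install()).
sc <- spark_connect(master = "local")
mtcars_tbl <- copy_to(sc, mtcars, "mtcars_tbl", overwrite = TRUE)

# __THIS__ stands for whatever table the transformer is applied to.
stmt <- "SELECT mpg, hp, hp / wt AS hp_per_wt FROM __THIS__ WHERE cyl = 4"

# x is a spark_connection: returns a transformer object.
transformer <- ft_sql_transformer(sc, statement = stmt)

# x is a ml_pipeline: returns the pipeline with the transformer appended.
pipeline <- ml_pipeline(sc) %>%
  ft_sql_transformer(statement = stmt)

# x is a tbl_spark: the transformer is applied immediately.
result <- ft_sql_transformer(mtcars_tbl, statement = stmt)

# The standalone transformer can be applied later with ml_transform().
result_later <- ml_transform(transformer, mtcars_tbl)

# Call spark_disconnect(sc) when finished.
```

Passing the connection is useful when the same statement will be reused across datasets or inside a pipeline, while passing the table directly is the shortest route for one-off transformations.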
x: A spark_connection, ml_pipeline, or a tbl_spark.
statement: A SQL statement.
uid: A character string used to uniquely identify the feature transformer.
...: Optional arguments; currently unused.
tbl: A tbl_spark generated using dplyr transformations.
ft_dplyr_transformer() is mostly a wrapper around ft_sql_transformer() that
takes a tbl_spark instead of a SQL statement. Internally, ft_dplyr_transformer()
extracts the dplyr transformations used to generate tbl as a SQL statement or a
sampling operation. Note that only single-table dplyr verbs are supported; the
sdf_ family of functions is not supported.
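As a hedged sketch of the wrapper behaviour described above, the following builds a tbl_spark with single-table dplyr verbs and passes it to ft_dplyr_transformer(). It assumes a local Spark installation; the table name and the derived column hp_per_wt are illustrative. ml_param() is used here to inspect the SQL statement that was extracted from the dplyr operations.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally (e.g. via spark_install()).
sc <- spark_connect(master = "local")
mtcars_tbl <- copy_to(sc, mtcars, "mtcars_tbl", overwrite = TRUE)

# Describe the transformation with single-table dplyr verbs only.
transformed_tbl <- mtcars_tbl %>%
  filter(cyl == 4) %>%
  mutate(hp_per_wt = hp / wt) %>%
  select(mpg, hp, hp_per_wt)

# The dplyr operations behind transformed_tbl become the transformer.
dplyr_tf <- ft_dplyr_transformer(sc, tbl = transformed_tbl)

# Inspect the SQL statement derived from the dplyr code.
extracted_sql <- ml_param(dplyr_tf, "statement")
print(extracted_sql)

# Apply the transformer to a table with the same columns.
result <- ml_transform(dplyr_tf, mtcars_tbl)

# Call spark_disconnect(sc) when finished.
```

Writing the transformation in dplyr keeps the feature logic in R while still producing a pipeline stage that Spark can execute without R.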
See https://spark.apache.org/docs/latest/ml-features.html for more information on the set of transformations available for DataFrame columns in Spark.
Other feature transformers:
ft_binarizer(),
ft_bucketizer(),
ft_chisq_selector(),
ft_count_vectorizer(),
ft_dct(),
ft_elementwise_product(),
ft_feature_hasher(),
ft_hashing_tf(),
ft_idf(),
ft_imputer(),
ft_index_to_string(),
ft_interaction(),
ft_lsh,
ft_max_abs_scaler(),
ft_min_max_scaler(),
ft_ngram(),
ft_normalizer(),
ft_one_hot_encoder_estimator(),
ft_one_hot_encoder(),
ft_pca(),
ft_polynomial_expansion(),
ft_quantile_discretizer(),
ft_r_formula(),
ft_regex_tokenizer(),
ft_robust_scaler(),
ft_standard_scaler(),
ft_stop_words_remover(),
ft_string_indexer(),
ft_tokenizer(),
ft_vector_assembler(),
ft_vector_indexer(),
ft_vector_slicer(),
ft_word2vec()