Implements transformations defined by a SQL statement. Currently, only SQL syntax of the form 'SELECT ... FROM __THIS__ ...' is supported, where '__THIS__' represents the underlying table of the input dataset. The select clause specifies the fields, constants, and expressions to display in the output; it can be any select clause that Spark SQL supports. Users can also apply Spark SQL built-in functions and UDFs to the selected columns.
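To illustrate the '__THIS__' placeholder described above, here is a minimal sketch that applies a statement to a small Spark table. It assumes a local Spark installation reachable through spark_connect(master = "local"). The helper name run_sql_transformer_example, the table name mtcars_tbl, and the derived column mpg_doubled are made up for illustration and are not part of sparklyr.

```r
# Statement using the __THIS__ placeholder; column names come from mtcars.
statement <- "SELECT mpg, cyl, mpg * 2 AS mpg_doubled FROM __THIS__ WHERE cyl > 4"

# Hypothetical helper (not part of sparklyr): copies mtcars to Spark and
# applies the statement. Assumes a working local Spark installation.
run_sql_transformer_example <- function() {
  sc <- sparklyr::spark_connect(master = "local")
  on.exit(sparklyr::spark_disconnect(sc), add = TRUE)

  mtcars_tbl <- sparklyr::sdf_copy_to(sc, mtcars, name = "mtcars_tbl", overwrite = TRUE)

  # x is a tbl_spark here, so the result is a tbl_spark with the
  # statement applied; __THIS__ stands in for mtcars_tbl.
  result <- sparklyr::ft_sql_transformer(mtcars_tbl, statement = statement)

  # Bring the transformed rows back into R as a local data frame.
  dplyr::collect(result)
}
```

Calling run_sql_transformer_example() should return only the rows with cyl greater than 4, with the extra mpg_doubled column alongside mpg and cyl.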
ft_sql_transformer(
x,
statement = NULL,
uid = random_string("sql_transformer_"),
...
)

ft_dplyr_transformer(x, tbl, uid = random_string("dplyr_transformer_"), ...)
The object returned depends on the class of x. If it is a spark_connection, the function returns a ml_transformer or a ml_estimator object. If it is a ml_pipeline, it will return a pipeline with the transformer or estimator appended to it. If a tbl_spark, it will return a tbl_spark with the transformation applied to it.
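The three return cases can be sketched as follows. This is an illustration only: the helper name describe_sql_transformer_returns, the table name, and the filter condition are hypothetical, and the function expects an already open spark_connection to be passed in as sc.

```r
# Hypothetical helper (not part of sparklyr); `sc` must be an existing
# spark_connection created with sparklyr::spark_connect().
describe_sql_transformer_returns <- function(sc) {
  statement <- "SELECT * FROM __THIS__ WHERE mpg > 20"

  # spark_connection in: a standalone transformer object comes back.
  transformer <- sparklyr::ft_sql_transformer(sc, statement = statement)

  # ml_pipeline in: the pipeline comes back with the new stage appended.
  pipeline <- sparklyr::ft_sql_transformer(
    sparklyr::ml_pipeline(sc),
    statement = statement
  )

  # tbl_spark in: the table comes back with the statement applied.
  mtcars_tbl <- sparklyr::sdf_copy_to(sc, mtcars, name = "mtcars_tbl", overwrite = TRUE)
  transformed <- sparklyr::ft_sql_transformer(mtcars_tbl, statement = statement)

  # Report the class of each result so the three cases can be compared.
  list(
    transformer = class(transformer),
    pipeline = class(pipeline),
    transformed = class(transformed)
  )
}
```

Returning the class vectors rather than the objects keeps the comparison easy to read at the console.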
x: A spark_connection, ml_pipeline, or a tbl_spark.
statement: A SQL statement.
uid: A character string used to uniquely identify the feature transformer.
...: Optional arguments; currently unused.
tbl: A tbl_spark generated using dplyr transformations.
ft_dplyr_transformer() is mostly a wrapper around ft_sql_transformer() that takes a tbl_spark instead of a SQL statement. Internally, ft_dplyr_transformer() extracts the dplyr transformations used to generate tbl as a SQL statement or a sampling operation. Note that only single-table dplyr verbs are supported and that the sdf_ family of functions is not.
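As a sketch of the wrapper behaviour described above, the following builds a transformer from a tbl_spark produced with single-table dplyr verbs and then reads back the SQL statement that was extracted. The helper name build_dplyr_transformer_example, the table name, and the chosen verbs are hypothetical; sc is assumed to be an existing spark_connection.

```r
# Hypothetical helper (not part of sparklyr); `sc` must be an existing
# spark_connection created with sparklyr::spark_connect().
build_dplyr_transformer_example <- function(sc) {
  mtcars_tbl <- sparklyr::sdf_copy_to(sc, mtcars, name = "mtcars_tbl", overwrite = TRUE)

  # Only single-table dplyr verbs are used here (filter, then mutate).
  filtered_tbl <- dplyr::filter(mtcars_tbl, cyl > 4)
  prepared_tbl <- dplyr::mutate(filtered_tbl, mpg_doubled = mpg * 2)

  # Pass the tbl_spark instead of a SQL statement.
  transformer <- sparklyr::ft_dplyr_transformer(sc, tbl = prepared_tbl)

  # Read back the SQL statement derived from the dplyr operations.
  sparklyr::ml_param(transformer, "statement")
}
```

Inspecting the returned statement is a convenient way to check how the dplyr operations were translated before the transformer is added to a pipeline.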
Other feature transformers: ft_binarizer(), ft_bucketizer(), ft_chisq_selector(), ft_count_vectorizer(), ft_dct(), ft_elementwise_product(), ft_feature_hasher(), ft_hashing_tf(), ft_idf(), ft_imputer(), ft_index_to_string(), ft_interaction(), ft_lsh, ft_max_abs_scaler(), ft_min_max_scaler(), ft_ngram(), ft_normalizer(), ft_one_hot_encoder_estimator(), ft_one_hot_encoder(), ft_pca(), ft_polynomial_expansion(), ft_quantile_discretizer(), ft_r_formula(), ft_regex_tokenizer(), ft_robust_scaler(), ft_standard_scaler(), ft_stop_words_remover(), ft_string_indexer(), ft_tokenizer(), ft_vector_assembler(), ft_vector_indexer(), ft_vector_slicer(), ft_word2vec()