- x
A `spark_connection`, `ml_pipeline`, or `tbl_spark`.
- formula
Used when `x` is a `tbl_spark`. An R formula, given as a character string or a formula object. It is used to transform the input DataFrame before fitting; see `ft_r_formula` for details.
- k
The number of clusters to create.
- max_iter
The maximum number of iterations to use.
- tol
The convergence tolerance for iterative algorithms.
- seed
A random seed. Set this value if you need your results to be reproducible across repeated calls.
- features_col
Features column name, as a length-one character vector. The column should be a single vector column of numeric values. Usually this column is output by `ft_r_formula`.
- prediction_col
Prediction column name.
- probability_col
Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities.
- uid
A character string used to uniquely identify the ML estimator.
- ...
Optional arguments, see Details.
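To make `k`, `max_iter`, `tol`, `seed`, `prediction_col` and `probability_col` more concrete, here is a minimal base-R sketch. It assumes these arguments belong to sparklyr's Gaussian mixture estimator (`ml_gaussian_mixture()`) and fits a one-dimensional mixture locally with expectation-maximisation (EM). It is not Spark's implementation: the helper `fit_gmm_1d()` is hypothetical, and its stopping rule (stop once the log-likelihood changes by less than `tol`, or after `max_iter` iterations) is the usual EM convention, which may differ in detail from Spark's.

```r
# Hypothetical helper: a minimal 1-D Gaussian mixture fitted by EM.
# NOT Spark's implementation; it only mirrors the meaning of the
# k, max_iter, tol and seed arguments described above.
fit_gmm_1d <- function(x, k = 2, max_iter = 100, tol = 0.01, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)   # same seed -> same initialisation
  n <- length(x)
  mu <- sample(x, k)                   # random initial means (needs n >= k)
  sigma <- rep(sd(x), k)               # common initial standard deviation
  w <- rep(1 / k, k)                   # equal initial mixture weights
  loglik_prev <- -Inf
  iter <- 0
  repeat {
    iter <- iter + 1
    # E-step: weighted component densities, one column per component
    dens <- sapply(seq_len(k), function(j) w[j] * dnorm(x, mu[j], sigma[j]))
    total <- rowSums(dens)
    resp <- dens / total               # posterior membership probabilities
    loglik <- sum(log(total))
    # M-step: re-estimate weights, means and standard deviations
    nk <- colSums(resp)
    w <- nk / n
    mu <- colSums(resp * x) / nk
    sigma <- sqrt(colSums(resp * outer(x, mu, "-")^2) / nk)
    # Stop when the log-likelihood gain drops below tol or max_iter is reached
    if (abs(loglik - loglik_prev) < tol || iter >= max_iter) break
    loglik_prev <- loglik
  }
  list(
    weights = w, means = mu, sds = sigma,
    iterations = iter, loglik = loglik,
    probability = resp,                              # analogous to probability_col
    prediction = max.col(resp, ties.method = "first") # analogous to prediction_col
  )
}

# Usage: two well-separated simulated groups
set.seed(42)
x <- c(rnorm(200, mean = 0), rnorm(200, mean = 6))
fit <- fit_gmm_1d(x, k = 2, seed = 1)
fit$means
head(fit$probability)
```

Calling the helper twice with the same `seed` yields identical estimates, and each row of `fit$probability` sums to one. Those rows are posterior probabilities under the fitted model, so, as noted for `probability_col`, they are best read as confidences rather than calibrated probabilities.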
Value: The object returned depends on the class of `x`. If it is a `spark_connection`, the function returns an `ml_estimator` object. If it is an `ml_pipeline`, it returns a pipeline with the predictor appended to it. If it is a `tbl_spark`, it returns a `tbl_spark` with the predictions added to it.