Perform k-means clustering on a Spark DataFrame.
Usage:

ml_kmeans(x, centers, iter.max = 100, features = dplyr::tbl_vars(x),
  compute.cost = TRUE, tolerance = 1e-04, ml.options = ml_options(), ...)
Arguments:

x: An object coercible to a Spark DataFrame (typically, a tbl_spark).

centers: The number of cluster centers to compute.

iter.max: The maximum number of iterations to use.

features: The names of the features (terms) to use for the model fit.

compute.cost: Whether to compute the cost for the k-means model using Spark's computeCost.

tolerance: The convergence tolerance for iterative algorithms.

ml.options: Optional arguments used to affect the model generated. See ml_options for more details.

...: Optional arguments; currently unused.
Value:

An ml_model object of class kmeans, with overloaded print, fitted, and predict functions.
References:

Bahmani et al., Scalable K-Means++, VLDB 2012.

For information on how Spark k-means clustering is implemented, please see http://spark.apache.org/docs/latest/mllib-clustering.html#k-means.
See also:

Other Spark ML routines: ml_als_factorization, ml_decision_tree, ml_generalized_linear_regression, ml_gradient_boosted_trees, ml_lda, ml_linear_regression, ml_logistic_regression, ml_multilayer_perceptron, ml_naive_bayes, ml_one_vs_rest, ml_pca, ml_random_forest, ml_survival_regression.
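Examples:

A minimal usage sketch, assuming sparklyr and a local Spark installation are available (the connection master, the table name "iris", and the choice of three centers are illustrative, not prescribed by this help page):

```r
library(sparklyr)
library(dplyr)

# Connect to a local Spark instance (assumes Spark is installed locally)
sc <- spark_connect(master = "local")

# Copy the iris data set into Spark; sparklyr replaces dots in column
# names with underscores (Petal.Width becomes Petal_Width)
iris_tbl <- copy_to(sc, iris, "iris", overwrite = TRUE)

# Cluster on two numeric features with k = 3
model <- iris_tbl %>%
  select(Petal_Width, Petal_Length) %>%
  ml_kmeans(centers = 3)

# Overloaded generics returned by the ml_model object
print(model)               # cluster centers and model summary
predicted <- predict(model) # cluster assignment for each row

spark_disconnect(sc)
```

Since iris has three species, inspecting the cluster assignments against the Species column is a common way to sanity-check the fit.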