Usage
h2o.kmeans(training_frame, x, k, model_id, ignore_const_cols = TRUE,
max_iterations = 1000, standardize = TRUE, init = c("Furthest",
"Random", "PlusPlus"), seed, nfolds = 0, fold_column = NULL,
fold_assignment = c("AUTO", "Random", "Modulo"),
keep_cross_validation_predictions = FALSE, max_runtime_secs = 0)
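The signature above can be exercised with a call like the one below. This is a minimal sketch, not an official example: it assumes the h2o package is installed, that h2o.init() can start or reach a local H2O cluster, and that R's built-in iris data is an acceptable stand-in for real data. The names iris_hf and model are illustrative only.

```r
library(h2o)

# Start (or connect to) a local H2O cluster; assumes Java and h2o are available.
h2o.init()

# Upload the built-in iris data as an H2OFrame (illustrative data choice).
iris_hf <- as.h2o(iris)

# Cluster on the four numeric columns only, with k-means++ initialization.
model <- h2o.kmeans(training_frame = iris_hf,
                    x = c("Sepal.Length", "Sepal.Width",
                          "Petal.Length", "Petal.Width"),
                    k = 3,
                    standardize = TRUE,
                    init = "PlusPlus",
                    seed = 1234,
                    max_iterations = 100)

# Inspect the fitted cluster centers and assign each row to a cluster.
centers <- h2o.centers(model)
print(centers)
clusters <- h2o.predict(model, iris_hf)
print(head(clusters))
```

Fixing seed makes the initialization repeatable across runs on the same cluster, which helps when comparing different values of k or init.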
Arguments
training_frame
An H2OFrame object containing the
variables in the model.
x
(Optional) A vector containing the data columns on
which k-means operates.
k
The number of clusters. Must be between 1 and
1e7 inclusive. k may be omitted if the user specifies the
initial centers in the init parameter. If k is given in
that case, it must equal the number of
user-specified centers.
model_id
(Optional) The unique id assigned to the resulting model. If
none is given, an id will automatically be generated.
ignore_const_cols
A logical value indicating whether or not to ignore all the constant columns in the training frame.
max_iterations
The maximum number of iterations allowed. Must be at least 0; the default is 1000.
standardize
Logical, indicates whether the data should be
standardized before running k-means.
init
A character string that selects how the initial set of k cluster
centers is chosen. Possible values are "Random" for random initialization,
"PlusPlus" for k-means++ initialization, or "Furthest" for
initialization at the furthest point from each successive center.
seed
(Optional) Random seed used to initialize the cluster centroids.
nfolds
(Optional) Number of folds for cross-validation. If nfolds >= 2,
then validation must remain empty.
fold_column
(Optional) Column with the cross-validation fold index assignment for each observation.
fold_assignment
Cross-validation fold assignment scheme, used if fold_column is not specified.
Must be "AUTO", "Random" or "Modulo".
keep_cross_validation_predictions
Logical, indicates whether to keep the predictions of the cross-validation models.
max_runtime_secs
Maximum allowed runtime in seconds for model training. Use 0 to disable.
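The standardize and fold_assignment arguments are described only briefly above, so the base R sketch below illustrates one plausible reading of each. It rests on two assumptions that this page does not confirm: that standardizing means rescaling each column to zero mean and unit standard deviation, and that "Modulo" assigns each row to the fold given by its row index modulo nfolds. The helper modulo_folds is hypothetical and not part of h2o.

```r
# Assumed meaning of standardize = TRUE: center and scale each column.
x_std <- scale(as.matrix(iris[, 1:4]))
col_means <- colMeans(x_std)       # each is numerically close to 0
col_sds <- apply(x_std, 2, sd)     # each is numerically close to 1

# Hypothetical helper (not an h2o function): assumed "Modulo" scheme,
# where row i (counting from 0) goes to fold i %% nfolds.
modulo_folds <- function(n_rows, nfolds) {
  (seq_len(n_rows) - 1L) %% nfolds
}

folds <- modulo_folds(10, 3)
print(folds)
print(table(folds))
```

Under this reading, "Modulo" produces folds whose sizes differ by at most one row and is deterministic, whereas "Random" depends on the seed.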