Usage
h2o.kmeans(training_frame, x, k, model_id, ignore_const_cols = TRUE,
  max_iterations = 1000, standardize = TRUE,
  init = c("Furthest", "Random", "PlusPlus"), seed, nfolds = 0,
  fold_column = NULL,
  fold_assignment = c("AUTO", "Random", "Modulo", "Stratified"),
  keep_cross_validation_predictions = FALSE,
  keep_cross_validation_fold_assignment = FALSE, max_runtime_secs = 0)
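A minimal sketch of a typical call with the signature above. It assumes a local H2O cluster can be started with h2o.init() and uses R's built-in iris data purely for illustration; neither the dataset nor the chosen values of k and seed come from this page.

```r
library(h2o)
h2o.init()  # assumption: Java and a local H2O instance are available

# Convert the built-in iris data to an H2OFrame (illustrative dataset only)
iris_hf <- as.h2o(iris)

# Cluster on the first four (numeric) columns; the species column is left out
model <- h2o.kmeans(
  training_frame = iris_hf,
  x = names(iris_hf)[1:4],
  k = 3,                # illustrative choice
  standardize = TRUE,
  init = "PlusPlus",
  seed = 1234           # fixed seed so repeated runs start from the same centers
)

# Inspect the fitted cluster centers
centers <- h2o.centers(model)
print(centers)
```

Passing x explicitly keeps the categorical species column out of the clustering; if x is omitted, the columns used are chosen by the function.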
Arguments
training_frame
An H2OFrame object containing the
variables in the model.
x
(Optional) A vector containing the data columns on
which k-means operates.
k
The number of clusters. Must be between 1 and
1e7 inclusive. k may be omitted if the user specifies the
initial centers in the init parameter. If k is supplied in that
case, it must equal the number of user-specified centers.
model_id
(Optional) The unique id assigned to the resulting model. If
none is given, an id will automatically be generated.
ignore_const_cols
A logical value indicating whether or not to ignore all the constant columns in the training frame.
max_iterations
The maximum number of iterations allowed. Must be between 0
standardize
A logical value indicating whether the data should be
standardized before running k-means.
init
A character string that selects the initial set of k cluster
centers. Possible values are "Random" for random initialization,
"PlusPlus" for k-means++ initialization, or "Furthest" for
initialization at the point furthest from each successive center.
Alternatively, the user may specify the initial centers as a matrix,
data.frame, H2OFrame, or list of vectors. For matrices,
data.frames, and H2OFrames, each row of the respective structure
is an initial center. For lists of vectors, each vector is an
initial center.
seed
(Optional) Random seed used to initialize the cluster centroids.
nfolds
(Optional) Number of folds for cross-validation.
fold_column
(Optional) Column with the cross-validation fold index assignment for each observation.
fold_assignment
Cross-validation fold assignment scheme, used only if fold_column is not
specified. Must be "AUTO", "Random", "Modulo", or "Stratified". The Stratified option
stratifies the folds based on the response variable, for classification problems.
keep_cross_validation_predictions
Whether to keep the predictions of the cross-validation models.
keep_cross_validation_fold_assignment
Whether to keep the cross-validation fold assignment.
max_runtime_secs
Maximum allowed runtime in seconds for model training. Use 0 to disable.
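The init argument described above also accepts explicit starting centers. The following is a sketch under stated assumptions: it relies on the installed h2o release matching the signature on this page (where init may be a matrix), it assumes a local H2O cluster can be started, and the dataset and the numeric values in start_centers are hypothetical and chosen only for illustration.

```r
library(h2o)
h2o.init()  # assumption: Java and a local H2O instance are available

# Use only the four numeric iris columns (illustrative dataset)
iris_hf <- as.h2o(iris[, 1:4])

# Three hypothetical starting centers, one per row,
# in the same column order as the training frame
start_centers <- matrix(
  c(5.0, 3.4, 1.5, 0.2,
    5.9, 2.8, 4.3, 1.3,
    6.6, 3.0, 5.6, 2.0),
  nrow = 3, byrow = TRUE
)

model <- h2o.kmeans(
  training_frame = iris_hf,
  init = start_centers,   # k is omitted; it is implied by the three rows
  standardize = FALSE,    # keep the data in the same units as the centers
  max_iterations = 100,
  max_runtime_secs = 0    # 0 disables the runtime limit
)
```

Standardization is turned off here so that the supplied centers and the data are on the same scale; whether user-specified centers are rescaled when standardize = TRUE is not stated on this page.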