
h2o (version 3.8.1.3)

h2o.kmeans: KMeans Model in H2O

Description

Performs k-means clustering on an H2O dataset.
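For readers unfamiliar with the procedure, here is a minimal base-R sketch of the textbook Lloyd iteration for k-means: assign each row to its nearest center, then move each center to the mean of its rows, and stop when no assignment changes or an iteration cap is hit. It is not H2O's distributed implementation. The function name lloyd_kmeans and its `centers` argument are hypothetical. Its `standardize` and `max_iterations` arguments only loosely mirror the same-named h2o.kmeans arguments, and z-score standardization is assumed.

```r
# Minimal sketch of Lloyd's k-means iteration in base R.
# Illustrative only: not H2O's implementation; lloyd_kmeans is a hypothetical name.
lloyd_kmeans <- function(data, k, max_iterations = 1000, standardize = TRUE,
                         centers = NULL, seed = 1) {
  X <- as.matrix(data)
  if (standardize) X <- scale(X)  # assumed z-scores: subtract column mean, divide by sd
  if (is.null(centers)) {
    set.seed(seed)
    centers <- X[sample(nrow(X), k), , drop = FALSE]  # k distinct random rows as seeds
  }
  # Supplied centers must already be on the same scale as X.
  centers <- as.matrix(centers)
  assignment <- integer(nrow(X))
  iter <- 0L
  for (iter in seq_len(max_iterations)) {
    # n x k matrix of squared Euclidean distances from each row to each center
    d <- vapply(seq_len(k), function(j) rowSums(sweep(X, 2, centers[j, ])^2),
                numeric(nrow(X)))
    new_assignment <- max.col(-d, ties.method = "first")  # index of nearest center
    if (all(new_assignment == assignment)) break  # no row changed cluster: converged
    assignment <- new_assignment
    for (j in seq_len(k)) {  # move each center to the mean of its member rows
      members <- X[assignment == j, , drop = FALSE]
      if (nrow(members) > 0) centers[j, ] <- colMeans(members)
    }
  }
  list(cluster = assignment, centers = centers, iterations = iter)
}
```

Empty clusters simply keep their previous center in this sketch. Production implementations handle that case in various ways.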

Usage

h2o.kmeans(training_frame, x, k, model_id, ignore_const_cols = TRUE,
  max_iterations = 1000, standardize = TRUE, init = c("Furthest",
  "Random", "PlusPlus"), seed, nfolds = 0, fold_column = NULL,
  fold_assignment = c("AUTO", "Random", "Modulo"),
  keep_cross_validation_predictions = FALSE, max_runtime_secs = 0)

Arguments

training_frame
An H2OFrame object containing the variables in the model.
x
(Optional) A vector containing the data columns on which k-means operates.
k
The number of clusters. Must be between 1 and 1e7 inclusive. k may be omitted if the user specifies the initial centers in the init parameter. If both k and initial centers are given, k must equal the number of user-specified centers.
model_id
(Optional) The unique id assigned to the resulting model. If none is given, an id will automatically be generated.
ignore_const_cols
A logical value indicating whether or not to ignore all the constant columns in the training frame.
max_iterations
The maximum number of iterations allowed. Must be between 0
standardize
Logical, indicates whether the data should be standardized before running k-means.
init
A character string that selects the initial set of k cluster centers. Possible values are "Random" for random initialization, "PlusPlus" for k-means++ initialization, or "Furthest" for initialization at the point furthest from each successive center.
seed
(Optional) Random seed used to initialize the cluster centroids.
nfolds
(Optional) Number of folds for cross-validation. If nfolds >= 2, no validation frame may be supplied.
fold_column
(Optional) Column with the cross-validation fold index assigned to each observation.
fold_assignment
Cross-validation fold assignment scheme, used if fold_column is not specified. Must be "AUTO", "Random" or "Modulo".
keep_cross_validation_predictions
Whether to keep the predictions of the cross-validation models.
max_runtime_secs
Maximum allowed runtime in seconds for model training. Use 0 to disable.
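To make the "PlusPlus" and "Furthest" options of init more concrete, the sketch below implements the two seeding ideas in base R. In k-means++, the first center is a uniformly random row, and each later center is drawn with probability proportional to its squared distance to the nearest center already chosen. The "Furthest" branch is one reading of the description above: it deterministically takes the row furthest from the current centers. Both are assumptions about the general technique, not H2O's code, and seed_centers and nearest_sq_dist are hypothetical names.

```r
# Hypothetical seeding sketches for k-means (not H2O's implementation).
# Squared distance from each row of X to its nearest row in `centers`.
nearest_sq_dist <- function(X, centers) {
  d <- matrix(0, nrow(X), nrow(centers))
  for (j in seq_len(nrow(centers))) d[, j] <- rowSums(sweep(X, 2, centers[j, ])^2)
  apply(d, 1, min)
}

seed_centers <- function(X, k, method = c("PlusPlus", "Furthest"), seed = 1) {
  method <- match.arg(method)
  X <- as.matrix(X)
  set.seed(seed)
  chosen <- sample(nrow(X), 1)  # first center: a uniformly random row
  while (length(chosen) < k) {
    d2 <- nearest_sq_dist(X, X[chosen, , drop = FALSE])
    nxt <- if (method == "PlusPlus") {
      sample(nrow(X), 1, prob = d2)  # k-means++: P(row) proportional to D(x)^2
    } else {
      which.max(d2)                  # assumed "Furthest": row furthest from current centers
    }
    chosen <- c(chosen, nxt)
  }
  X[chosen, , drop = FALSE]
}
```

Rows that are already centers have distance 0, so the k-means++ branch never picks the same row twice as long as the data contain at least k distinct points.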
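The two explicit fold_assignment schemes differ in whether fold membership is deterministic. The sketch below assumes that "Modulo" places the i-th row (counting from 0) in fold i mod nfolds, and that "Random" draws a fold uniformly at random for each row. Both behaviours are assumptions for illustration, and assign_folds is a hypothetical helper, not part of the h2o package.

```r
# Hypothetical illustration of two fold-assignment schemes (assumed behaviour, not h2o code).
assign_folds <- function(n_rows, nfolds, scheme = c("Modulo", "Random"), seed = 1) {
  scheme <- match.arg(scheme)
  if (scheme == "Modulo") {
    (seq_len(n_rows) - 1) %% nfolds  # row i (0-based) goes to fold i mod nfolds
  } else {
    set.seed(seed)
    sample(0:(nfolds - 1), n_rows, replace = TRUE)  # independent uniform fold per row
  }
}

assign_folds(7, 3)  # deterministic: folds cycle 0, 1, 2, 0, ...
```

A modulo scheme is reproducible without a seed, but it can correlate folds with any ordering present in the rows. Random assignment avoids that at the cost of needing a seed for reproducibility.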

Value

  • Returns an object of class H2OClusteringModel.

See Also

h2o.cluster_sizes, h2o.totss, h2o.num_iterations, h2o.betweenss, h2o.tot_withinss, h2o.withinss, h2o.centersSTD, h2o.centers
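Several of these accessors report sums of squares that are linked by the standard decomposition totss = tot_withinss + betweenss. The base-R sketch below uses stats::kmeans (not h2o) on the built-in iris data to recompute each quantity by hand. It assumes that h2o.totss, h2o.withinss, h2o.tot_withinss, h2o.betweenss and h2o.cluster_sizes follow these textbook definitions, which this page does not state explicitly.

```r
# Base-R analogue using stats::kmeans (not h2o) to show how the reported quantities relate.
X <- scale(as.matrix(iris[, 1:4]))  # standardized numeric columns
set.seed(1)
km <- kmeans(X, centers = 3, nstart = 5)
grand_mean <- colMeans(X)

# Total sum of squares: squared distances of all rows to the grand mean
totss_manual <- sum(sweep(X, 2, grand_mean)^2)
# Within-cluster sum of squares: squared distances of each cluster's rows to its mean
withinss_manual <- vapply(seq_len(3), function(j) {
  Xj <- X[km$cluster == j, , drop = FALSE]
  sum(sweep(Xj, 2, colMeans(Xj))^2)
}, numeric(1))
# Between-cluster sum of squares: cluster size times squared center-to-grand-mean distance
betweenss_manual <- sum(km$size * rowSums(sweep(km$centers, 2, grand_mean)^2))

c(totss = km$totss, tot_withinss = km$tot.withinss, betweenss = km$betweenss)
```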

Examples

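library(h2o)
h2o.init()  # start or connect to a local H2O cluster
# Upload the prostate data set that ships with the h2o package
prosPath <- system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
# Fit 10 clusters using four of the columns
h2o.kmeans(training_frame = prostate.hex, k = 10, x = c("AGE", "RACE", "VOL", "GLEASON"))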
