h2o.kmeans: KMeans Model in H2O

Description

Performs k-means clustering on an H2O dataset.

Usage

h2o.kmeans(training_frame, x, k, model_id, ignore_const_cols = TRUE,
  max_iterations = 1000, standardize = TRUE, init = c("Furthest",
  "Random", "PlusPlus"), seed, nfolds = 0, fold_column = NULL,
  fold_assignment = c("AUTO", "Random", "Modulo"),
  keep_cross_validation_predictions = FALSE, max_runtime_secs = 0)

Arguments

training_frame

An H2OFrame object containing the variables in the model.

x

(Optional) A vector containing the data columns on which k-means operates.

k

The number of clusters. Must be between 1 and 1e7 inclusive. k may be omitted if the user specifies the initial centers in the init parameter. If k is given in that case, it should equal the number of user-specified centers.

model_id

(Optional) The unique id assigned to the resulting model. If none is given, an id will automatically be generated.

ignore_const_cols

A logical value indicating whether to ignore constant columns in the training frame.

max_iterations

The maximum number of iterations allowed. Must be between 0

standardize

A logical value indicating whether the data should be standardized before running k-means.
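As a rough illustration of what standardization means here, the base R sketch below rescales each column to mean 0 and standard deviation 1. It assumes z-score scaling and uses a made-up data frame; it is not H2O's internal code, and H2O's handling of edge cases such as categorical columns is not shown.

```r
# Illustrative only: z-score standardization in base R (assumed behavior,
# not taken from the H2O source). The data frame below is made up.
df <- data.frame(AGE = c(50, 60, 70), VOL = c(0, 10, 20))

# scale() subtracts each column's mean and divides by its standard deviation.
standardized <- as.data.frame(scale(df))

print(colMeans(standardized))    # column means after scaling
print(sapply(standardized, sd))  # column standard deviations after scaling
```

Without this step, a column with a large numeric range (for example VOL) would dominate the Euclidean distances that k-means relies on.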

init

A character string that selects the initial set of k cluster centers. Possible values are "Random" for random initialization, "PlusPlus" for k-means++ initialization, or "Furthest" for initialization at the point furthest from each successive center.
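The idea behind furthest-point initialization can be sketched in base R as follows. The helper furthest_init is hypothetical and not part of the h2o package; it assumes a numeric matrix, a random first center, and squared Euclidean distance, and it may differ from what H2O does internally.

```r
# Hypothetical helper (not an h2o function): choose k initial centers by
# repeatedly taking the point furthest from the centers chosen so far.
furthest_init <- function(points, k, seed = 1) {
  set.seed(seed)
  chosen <- sample.int(nrow(points), 1)  # first center: a random row
  # Squared distance from every point to its nearest chosen center.
  min_d2 <- rowSums(sweep(points, 2, points[chosen, ])^2)
  while (length(chosen) < k) {
    nxt <- which.max(min_d2)             # furthest remaining point
    chosen <- c(chosen, nxt)
    d2 <- rowSums(sweep(points, 2, points[nxt, ])^2)
    min_d2 <- pmin(min_d2, d2)           # update nearest-center distances
  }
  points[chosen, , drop = FALSE]
}

# Made-up data: two tight pairs and one isolated point.
pts <- rbind(c(0, 0), c(0.1, 0), c(10, 0), c(10.1, 0), c(5, 8))
centers <- furthest_init(pts, k = 3)
print(centers)
```

On this toy data the three chosen centers land in three different groups regardless of which row is drawn first, which is the spreading effect this style of initialization aims for.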

seed

(Optional) Random seed used to initialize the cluster centroids.

nfolds

(Optional) Number of folds for cross-validation. If nfolds >= 2, then validation must remain empty.

fold_column

(Optional) Column containing the cross-validation fold index assigned to each observation.

fold_assignment

Cross-validation fold assignment scheme, used if fold_column is not specified. Must be "AUTO", "Random" or "Modulo".
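A minimal sketch of a modulo-style assignment is shown below. It assumes that "Modulo" means taking the row index modulo nfolds with zero-based indexing; the variable names are illustrative and the exact indexing H2O uses is an assumption.

```r
# Illustrative only: deterministic modulo fold assignment in base R.
nfolds <- 3
n_rows <- 7

# Zero-based row index modulo the number of folds (indexing is an assumption).
fold_id <- (seq_len(n_rows) - 1) %% nfolds
print(fold_id)  # 0 1 2 0 1 2 0
```

A scheme like this spreads rows evenly across folds and gives the same split on every run, whereas a random scheme depends on the seed.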

keep_cross_validation_predictions

A logical value indicating whether to keep the predictions of the cross-validation models.

max_runtime_secs

Maximum allowed runtime in seconds for model training. Use 0 to disable.

Value

Returns an object of class H2OClusteringModel.

Examples

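library(h2o)
h2o.init()  # start or connect to a local H2O cluster
# Load the prostate dataset bundled with the h2o package
prosPath <- system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
# Cluster the rows into k = 10 groups using four columns
h2o.kmeans(training_frame = prostate.hex, k = 10, x = c("AGE", "RACE", "VOL", "GLEASON"))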
