h2o.kmeans: KMeans Model in H2O

Description

Performs k-means clustering on an H2O dataset.

Usage

h2o.kmeans(training_frame, x, k, model_id, ignore_const_cols = TRUE, max_iterations = 1000, standardize = TRUE, init = c("Furthest", "Random", "PlusPlus"), seed, nfolds = 0, fold_column = NULL, fold_assignment = c("AUTO", "Random", "Modulo", "Stratified"), keep_cross_validation_predictions = FALSE, keep_cross_validation_fold_assignment = FALSE, max_runtime_secs = 0)

Arguments

training_frame

An H2OFrame object containing the variables in the model.

x

(Optional) A vector containing the data columns on which k-means operates.

k

The number of clusters. Must be between 1 and 1e7 inclusive. k may be omitted if the user specifies the initial centers in the init parameter. If k is supplied in that case, it must equal the number of user-specified centers.

model_id

(Optional) The unique id assigned to the resulting model. If none is given, an id will automatically be generated.

ignore_const_cols

A logical value indicating whether or not to ignore all the constant columns in the training frame.

max_iterations

The maximum number of iterations allowed. Must be between 0

standardize

A logical value indicating whether the data should be standardized before running k-means.

init

A character string that selects the initial set of k cluster centers. Possible values are "Random" for random initialization, "PlusPlus" for k-means++ initialization, or "Furthest" for initialization at the furthest point from each successive center. Alternatively, the user may specify the initial centers as a matrix, data.frame, H2OFrame, or list of vectors. For matrices, data.frames, and H2OFrames, each row is an initial center. For lists of vectors, each vector is an initial center.

seed

(Optional) Random seed used to initialize the cluster centroids.

nfolds

(Optional) Number of folds for cross-validation.

fold_column

(Optional) Column with the cross-validation fold index assignment per observation.

fold_assignment

Cross-validation fold assignment scheme, used only if fold_column is not specified. Must be "AUTO", "Random", "Modulo", or "Stratified". The "Stratified" option stratifies the folds based on the response variable, for classification problems.

keep_cross_validation_predictions

Whether to keep the predictions of the cross-validation models.

keep_cross_validation_fold_assignment

Whether to keep the cross-validation fold assignment.

max_runtime_secs

Maximum allowed runtime in seconds for model training. Use 0 to disable.
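The effect of the standardize argument can be pictured with a small base R sketch. It assumes that standardizing means centering each column to mean zero and scaling it to unit standard deviation; the toy data frame and the use of scale() are illustrative only and are not H2O's internal code.

```r
# Toy data: AGE and VOL are on very different scales (hypothetical values).
toy <- data.frame(AGE = c(50, 60, 70, 80),
                  VOL = c(0.5, 10.0, 25.0, 40.0))

# Assumption: standardization centers each column and divides by its
# standard deviation, which is what base R's scale() does by default.
scaled <- scale(toy)

# After scaling, every column has mean 0 and standard deviation 1,
# so no single column dominates the Euclidean distances used by k-means.
print(colMeans(scaled))
print(apply(scaled, 2, sd))
```

Without this step, a column with a large numeric range would contribute most of the distance between two rows and would largely determine the clusters.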

Value

Returns an object of class H2OClusteringModel.
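As a minimal sketch of how the returned model might be inspected, the code below fits a small model and reads back its centers, its total within-cluster sum of squares, and the per-row cluster assignments. It assumes the accessor functions h2o.centers, h2o.tot_withinss, and h2o.predict from the h2o package and a running local H2O instance; k = 3 and seed = 1234 are arbitrary choices for illustration.

```r
library(h2o)
h2o.init()

# Load the prostate data set that ships with the h2o package.
prosPath <- system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)

# Fit a small model; k and seed are arbitrary illustrative values.
model <- h2o.kmeans(training_frame = prostate.hex, k = 3,
                    x = c("AGE", "RACE", "VOL", "GLEASON"), seed = 1234)

# Cluster centers, one row per cluster.
centers <- h2o.centers(model)
print(centers)

# Total within-cluster sum of squares on the training data.
print(h2o.tot_withinss(model))

# Cluster assignment for each row of the training frame.
assignments <- h2o.predict(model, prostate.hex)
print(head(assignments))
```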

Examples


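library(h2o)
# Start (or connect to) a local H2O instance.
h2o.init()
# Upload the prostate data set that ships with the h2o package.
prosPath <- system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
# Cluster on four columns into k = 10 clusters.
h2o.kmeans(training_frame = prostate.hex, k = 10, x = c("AGE", "RACE", "VOL", "GLEASON"))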
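The cross-validation arguments can be exercised in the same way. The following is a sketch only, assuming a running local H2O instance; the values nfolds = 3, fold_assignment = "Modulo", k = 3, and seed = 1234 are arbitrary, and the xval argument of h2o.tot_withinss is assumed to return the cross-validated metric.

```r
library(h2o)
h2o.init()

# Load the prostate data set that ships with the h2o package.
prosPath <- system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)

# 3-fold cross-validation with rows assigned to folds by row index modulo 3.
model_cv <- h2o.kmeans(training_frame = prostate.hex, k = 3,
                       x = c("AGE", "RACE", "VOL", "GLEASON"),
                       nfolds = 3, fold_assignment = "Modulo", seed = 1234)

# Printing the model shows training and cross-validation metrics.
print(model_cv)

# Cross-validated total within-cluster sum of squares (assumed accessor usage).
print(h2o.tot_withinss(model_cv, xval = TRUE))
```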
