h2o (version 3.8.3.3)

h2o.kmeans: KMeans Model in H2O

Description

Performs k-means clustering on an H2O dataset.

Usage

h2o.kmeans(training_frame, x, k, model_id, ignore_const_cols = TRUE,
  max_iterations = 1000, standardize = TRUE,
  init = c("Furthest", "Random", "PlusPlus"), seed, nfolds = 0,
  fold_column = NULL,
  fold_assignment = c("AUTO", "Random", "Modulo", "Stratified"),
  keep_cross_validation_predictions = FALSE,
  keep_cross_validation_fold_assignment = FALSE, max_runtime_secs = 0)

Arguments

training_frame
An H2OFrame object containing the variables in the model.
x
(Optional) A vector containing the data columns on which k-means operates.
k
The number of clusters. Must be between 1 and 1e7 inclusive. k may be omitted if the user specifies the initial centers in the init parameter. If k is supplied together with user-specified centers, it should equal the number of those centers.
model_id
(Optional) The unique id assigned to the resulting model. If none is given, an id will automatically be generated.
ignore_const_cols
A logical value indicating whether or not to ignore all the constant columns in the training frame.
max_iterations
The maximum number of iterations allowed. Must be between 0
standardize
Logical, indicates whether the data should be standardized before running k-means.
init
A character string that selects the initial set of k cluster centers. Possible values are "Random" for random initialization, "PlusPlus" for k-means++ initialization, or "Furthest" for initialization at the furthest point from each successive center. Alternatively, the user may specify the initial centers as a matrix, data.frame, H2OFrame, or list of vectors. For matrices, data.frames, and H2OFrames, each row of the structure is an initial center. For lists of vectors, each vector is an initial center.
seed
(Optional) Random seed used to initialize the cluster centroids.
nfolds
(Optional) Number of folds for cross-validation.
fold_column
(Optional) Column with the cross-validation fold index assignment per observation.
fold_assignment
Cross-validation fold assignment scheme, used only if fold_column is not specified. Must be "AUTO", "Random", "Modulo", or "Stratified". The Stratified option will stratify the folds based on the response variable, for classification problems.
keep_cross_validation_predictions
Whether to keep the predictions of the cross-validation models.
keep_cross_validation_fold_assignment
Whether to keep the cross-validation fold assignment.
max_runtime_secs
Maximum allowed runtime in seconds for model training. Use 0 to disable.
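
The init argument described above accepts user-specified starting centers as well as the named strategies. The following is a minimal sketch of that usage, assuming the 3.8.3.3 signature shown under Usage and the prostate data bundled with the package. The center values and the names cols, start_centers, and fit are illustrative assumptions, not recommended settings.

```r
library(h2o)
h2o.init()

# Bundled example data, as used in the Examples section.
prosPath <- system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)

# Columns to cluster on (illustrative choice).
cols <- c("AGE", "VOL", "GLEASON")

# Hypothetical starting centers: one row per center, one column per
# clustering variable. The numbers are made up for illustration.
start_centers <- data.frame(AGE     = c(55, 65, 75),
                            VOL     = c(5, 15, 30),
                            GLEASON = c(5, 6, 8))

# k is omitted because the number of centers is implied by the rows of init.
fit <- h2o.kmeans(training_frame = prostate.hex, x = cols,
                  init = start_centers)
fit
```

Passing a data.frame keeps the column names next to the values, which makes it easier to check that each center lines up with the columns given in x.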

Value

Returns an object of class H2OClusteringModel.

See Also

h2o.cluster_sizes, h2o.totss, h2o.num_iterations, h2o.betweenss, h2o.tot_withinss, h2o.withinss, h2o.centersSTD, h2o.centers
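
The accessors listed above all take a fitted clustering model. As a rough sketch, assuming a model fitted on the bundled prostate data with an arbitrary k = 3 and an arbitrary seed, they can be called as follows. The name fit and the chosen columns are illustrative.

```r
library(h2o)
h2o.init()

prosPath <- system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)

# Fit a small model; k and seed are arbitrary values for illustration.
fit <- h2o.kmeans(training_frame = prostate.hex, k = 3,
                  x = c("AGE", "VOL", "GLEASON"), seed = 1234)

h2o.centers(fit)         # cluster centers
h2o.centersSTD(fit)      # cluster centers on the standardized scale
h2o.cluster_sizes(fit)   # number of rows assigned to each cluster
h2o.withinss(fit)        # within-cluster sum of squares, per cluster
h2o.tot_withinss(fit)    # total within-cluster sum of squares
h2o.betweenss(fit)       # between-cluster sum of squares
h2o.totss(fit)           # total sum of squares
h2o.num_iterations(fit)  # iterations run before stopping
```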

Examples

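library(h2o)
h2o.init()  # start or connect to a local H2O instance
# Locate the prostate dataset bundled with the h2o package and upload it
prosPath <- system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
# Cluster the rows into 10 groups using four of the columns
h2o.kmeans(training_frame = prostate.hex, k = 10, x = c("AGE", "RACE", "VOL", "GLEASON"))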
