helda (version 1.1.5)

kmeans_procedure: K-means procedure

Description

This function performs k-means clustering with constraints on the size of the clusters.

Usage

kmeans_procedure(
  data,
  columns,
  threshold_min,
  threshold_max,
  verbose = FALSE,
  seed = 42
)

Arguments

data

an R data frame.

columns

a vector of column names of the data frame on which the k-means algorithm is run. These columns must be numeric.

threshold_min

an integer. The minimum size of a cluster.

threshold_max

an integer. The maximum size of a cluster.

verbose

a boolean. If TRUE, the current state of the procedure is printed (default: FALSE).

seed

an integer. The seed for the random number generator, set so that the output is reproducible (default: 42).
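
To make the role of threshold_min and threshold_max concrete, the following is a minimal base-R sketch of what a size constraint means. It does not reproduce the algorithm used by kmeans_procedure; it only runs a plain stats::kmeans fit and checks whether the resulting cluster sizes fall within the bounds. The helper sizes_within_bounds is hypothetical and not part of helda.

```r
# Hypothetical helper (not part of helda): TRUE when every cluster size
# lies within [threshold_min, threshold_max].
sizes_within_bounds <- function(sizes, threshold_min, threshold_max) {
  all(sizes >= threshold_min & sizes <= threshold_max)
}

set.seed(42)  # fix the random initialisation so the run is reproducible
features <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# Plain, unconstrained k-means from base R on the four numeric iris columns.
fit <- stats::kmeans(iris[, features], centers = 3)

# Number of observations assigned to each cluster.
sizes <- as.integer(table(fit$cluster))
sizes

# 150 observations in 3 clusters cannot all fit under a maximum of 10,
# so an unconstrained fit violates these bounds.
sizes_within_bounds(sizes, threshold_min = 2, threshold_max = 10)
```

With only three clusters for 150 observations, the check necessarily fails for threshold_max = 10, which is the kind of situation a size-constrained procedure is meant to handle.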

Value

an R data frame. It contains the id of each observation in the original data frame and a column `cluster` giving the cluster to which the observation belongs.
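
A data frame of this shape can be joined back to the original data for inspection. The sketch below is an illustration under assumptions: it builds a stand-in for the returned value from a plain stats::kmeans fit rather than calling kmeans_procedure, and the identifier column name `id` is assumed, since the actual name is not stated here.

```r
# Stand-in for the returned data frame (an id per observation plus `cluster`).
# The column name `id` is an assumption made for this illustration.
set.seed(42)
features <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
result <- data.frame(
  id = seq_len(nrow(iris)),
  cluster = stats::kmeans(iris[, features], centers = 3)$cluster
)

# Give the original data the same row id, then join on it.
original <- cbind(id = seq_len(nrow(iris)), iris[, features])
labelled <- merge(original, result, by = "id")

# Cluster sizes, to compare against threshold_min and threshold_max.
table(labelled$cluster)

# Per-cluster feature means (the id column is dropped before aggregating).
aggregate(. ~ cluster, data = labelled[, -1], FUN = mean)
```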

References

Author's GitHub repository for the package: https://github.com/Redcart/helda

Examples

# NOT RUN {
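library(dplyr)
library(helda)
# Keep the four numeric measurement columns of iris
data <- iris %>% select(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)
features <- colnames(data)
# Cluster with sizes constrained to between 2 and 10 observations
result <- kmeans_procedure(data = data, columns = features, threshold_min = 2, threshold_max = 10,
                           verbose = FALSE, seed = 10)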
# }