Kmeans: Perform k-means clustering on a data matrix.

Description

K-means partitions a dataset into k disjoint sets using a fast, parallel, NUMA-optimized version of Lloyd's algorithm. Details are given in the paper at https://arxiv.org/pdf/1606.08905.pdf.
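For orientation, the following is a minimal serial sketch of Lloyd's algorithm in base R. It is not the parallel, NUMA-optimized implementation described above. The function name lloyd_sketch is hypothetical, the initialization is a simple Forgy-style row sample, and treating tolerance as the largest change in any center coordinate is an assumption made for illustration only.

```r
# Illustrative only: a plain, single-threaded Lloyd's iteration.
# lloyd_sketch is a hypothetical helper, not part of any package.
lloyd_sketch <- function(x, k, iter.max = 100, tolerance = 1e-6) {
  x <- as.matrix(x)
  n <- nrow(x)
  # Forgy-style initialization: use k randomly chosen rows as starting centers.
  centers <- x[sample(n, k), , drop = FALSE]
  for (iter in seq_len(iter.max)) {
    # Squared Euclidean distance from every point to every center (n x k).
    d <- vapply(seq_len(k),
                function(j) rowSums(sweep(x, 2, centers[j, ])^2),
                numeric(n))
    # Assignment step: each point joins its nearest center.
    cluster <- max.col(-d, ties.method = "first")
    # Update step: each center moves to the mean of its members.
    new.centers <- centers
    for (j in seq_len(k)) {
      members <- cluster == j
      if (any(members)) {
        new.centers[j, ] <- colMeans(x[members, , drop = FALSE])
      }
    }
    # Assumed stopping rule: stop once no center coordinate moves by
    # more than `tolerance`.
    shift <- max(abs(new.centers - centers))
    centers <- new.centers
    if (shift < tolerance) break
  }
  list(cluster = cluster, centers = centers,
       size = tabulate(cluster, nbins = k), iter = iter)
}

set.seed(1)
res <- lloyd_sketch(iris[, 1:4], k = 3)
```

The returned list mirrors the field names documented under Value, so the sketch can be compared side by side with the output of Kmeans on small inputs.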

Usage

Kmeans(
  data,
  centers,
  nrow = -1,
  ncol = -1,
  iter.max = .Machine$integer.max,
  nthread = -1,
  init = c("kmeanspp", "random", "forgy", "none"),
  tolerance = 1e-06,
  dist.type = c("eucl", "cos"),
  omp = FALSE
)

Arguments

data

The name of a data file on disk, or an in-memory data matrix

centers

Either (i) the number of centers (i.e., k), (ii) an in-memory matrix of centers, or (iii) a 2-element list whose first element is the filename of precomputed centers and whose second element is the number of centroids.
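The first two forms of centers might be used as sketched below. This assumes Kmeans is provided by the knor package and that a centers matrix holds one row per centroid with the same columns as data. The rows picked as starting centers are arbitrary. The list form (iii) is not shown because the on-disk file format is not described here.

```r
# Hedged sketch: runs only if the (assumed) knor package is installed.
if (requireNamespace("knor", quietly = TRUE)) {
  iris.mat <- as.matrix(iris[, 1:4])

  # (i) centers given as a count: the function chooses the initial centers.
  kms.k <- knor::Kmeans(iris.mat, centers = 3)

  # (ii) centers given as an in-memory matrix.
  # Assumption: one row per centroid; rows 1, 51 and 101 are an arbitrary pick.
  init.centers <- iris.mat[c(1, 51, 101), ]
  kms.m <- knor::Kmeans(iris.mat, centers = init.centers)
}
```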

nrow

The number of samples in the dataset

ncol

The number of features in the dataset

iter.max

The maximum number of k-means iterations to perform

nthread

The number of parallel threads to run

init

The type of initialization to use: one of "kmeanspp", "random", "forgy", or "none"

tolerance

The convergence tolerance

dist.type

The dissimilarity metric to use: "eucl" (Euclidean) or "cos" (cosine)

omp

Use (slower) OpenMP threads rather than pthreads

Value

A list containing the attributes of the output of kmeans:

cluster

A vector of integers (from 1:k) indicating the cluster to which each point is allocated.

centers

A matrix of cluster centres.

size

The number of points in each cluster.

iter

The number of (outer) iterations.
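A short sketch of reading these fields from the returned list follows. It assumes Kmeans is provided by the knor package and uses only the field names listed above.

```r
# Hedged sketch: runs only if the (assumed) knor package is installed.
if (requireNamespace("knor", quietly = TRUE)) {
  iris.mat <- as.matrix(iris[, 1:4])
  kms <- knor::Kmeans(iris.mat, centers = 3)

  head(kms$cluster)  # cluster label of the first few points
  kms$centers        # matrix of cluster centres
  kms$size           # number of points in each cluster
  kms$iter           # number of (outer) iterations performed

  # Cross-tabulate the assignments against the known species labels.
  table(kms$cluster, iris$Species)
}
```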

Examples

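library(knor) # Assumption: Kmeans is provided by the knor package

# Use the four numeric measurement columns of iris as the data matrix
iris.mat <- as.matrix(iris[, 1:4])
k <- length(unique(iris[, dim(iris)[2]])) # Number of unique classes
kms <- Kmeans(iris.mat, k)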
