Kmeans: Perform k-means clustering on a data matrix.
Description
K-means partitions a dataset into k disjoint sets using a fast, parallel,
NUMA-optimized version of Lloyd's algorithm. Details are given in the
paper at https://arxiv.org/pdf/1606.08905.pdf.
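For orientation, the following is a minimal serial sketch of Lloyd's algorithm in base R. It is not the package's parallel, NUMA-aware implementation. The name lloyd_sketch is hypothetical, the initial centers are simply k randomly chosen samples, and the tolerance is treated as the fraction of samples that changed cluster in an iteration, which is an assumption rather than something this page states.

```r
# Serial sketch of Lloyd's algorithm; `lloyd_sketch` is a hypothetical name.
lloyd_sketch <- function(x, k, iter.max = 100, tolerance = 1e-6) {
  n <- nrow(x)
  # Start from k distinct samples used as the initial centers.
  centers <- x[sample.int(n, k), , drop = FALSE]
  cluster <- integer(n)
  iter <- 0L
  repeat {
    iter <- iter + 1L
    # Assignment step: squared Euclidean distance from each sample to each center.
    d <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
    new_cluster <- max.col(-d, ties.method = "first")
    # Update step: each non-empty cluster's center becomes the mean of its members.
    for (j in seq_len(k)) {
      members <- new_cluster == j
      if (any(members)) centers[j, ] <- colMeans(x[members, , drop = FALSE])
    }
    # Assumed stopping rule: fraction of samples that switched cluster.
    changed <- mean(new_cluster != cluster)
    cluster <- new_cluster
    if (changed <= tolerance || iter >= iter.max) break
  }
  list(cluster = cluster, centers = centers,
       size = tabulate(cluster, nbins = k), iter = iter)
}

# Two well-separated synthetic groups of 50 samples with 2 features each.
set.seed(42)
x <- rbind(matrix(rnorm(100), ncol = 2),
           matrix(rnorm(100, mean = 6), ncol = 2))
fit <- lloyd_sketch(x, k = 2)
```

The returned list mirrors the component names described under Value (cluster, centers, size, iter), so the sketch can be used to get a feel for the output shape.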
Arguments
Either (i) the number of centers (i.e., k), (ii) an in-memory data
matrix, or (iii) a 2-element list whose first element is the filename of
precomputed centers and whose second element is the number of
centroids.
nrow
The number of samples in the dataset
ncol
The number of features in the dataset
iter.max
The maximum number of k-means iterations to perform
nthread
The number of parallel threads to run
init
The type of initialization to use, one of c("kmeanspp", "random",
"forgy", "none")
tolerance
The convergence tolerance
dist.type
The dissimilarity metric to use
omp
Use (slower) OpenMP threads rather than pthreads
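The init options can be illustrated with a small base R sketch of three common seeding schemes. The helper names below are hypothetical, and the mapping to the option names follows the usual textbook definitions ("forgy" as k randomly chosen samples, "random" as a random partition, "kmeanspp" as k-means++ seeding). The package's exact behavior may differ. "none" presumably means no seeding is performed and the supplied centers are used as given.

```r
# Hypothetical helpers sketching common seeding schemes (textbook definitions).

# Forgy: use k distinct samples as the initial centers.
init_forgy <- function(x, k) x[sample.int(nrow(x), k), , drop = FALSE]

# Random partition: give every sample a random label, then average per label.
init_random <- function(x, k) {
  labels <- sample(rep_len(seq_len(k), nrow(x)))  # every label used when nrow(x) >= k
  do.call(rbind, lapply(seq_len(k), function(j) {
    colMeans(x[labels == j, , drop = FALSE])
  }))
}

# k-means++: pick each new center with probability proportional to the
# squared distance from the nearest center chosen so far.
init_kmeanspp <- function(x, k) {
  n <- nrow(x)
  idx <- integer(k)
  idx[1] <- sample.int(n, 1)
  d2 <- rowSums(sweep(x, 2, x[idx[1], ])^2)
  for (j in seq_len(k)[-1]) {
    idx[j] <- sample.int(n, 1, prob = d2)
    d2 <- pmin(d2, rowSums(sweep(x, 2, x[idx[j], ])^2))
  }
  x[idx, , drop = FALSE]
}

set.seed(3)
x <- matrix(rnorm(60), ncol = 2)   # 30 samples, 2 features
c_forgy <- init_forgy(x, 3)
c_random <- init_random(x, 3)
c_pp <- init_kmeanspp(x, 3)
```

Each helper returns a k-by-ncol matrix of starting centers, which is the shape an in-memory matrix of centers would need to have.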
Value
A list with the following components:
cluster: A vector of integers (from 1:k) indicating the cluster to
which each point is allocated.
centers: A matrix of cluster centers.
size: The number of points in each cluster.
iter: The number of (outer) iterations.
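A hedged usage sketch follows. It assumes the function is exported by an installed package named knor and that the data matrix is passed as the first positional argument followed by the number of centers. Neither assumption is stated on this page, so the call is guarded and skipped when the package is unavailable. The argument values are illustrative only.

```r
# Synthetic data: 200 samples, 2 features, two well-separated groups.
set.seed(7)
x <- rbind(matrix(rnorm(200), ncol = 2),
           matrix(rnorm(200, mean = 8), ncol = 2))
k <- 2

# Assumption: package name "knor" and positional order (data, centers).
if (requireNamespace("knor", quietly = TRUE)) {
  fit <- knor::Kmeans(x, k, iter.max = 50, nthread = 2,
                      init = "kmeanspp", tolerance = 1e-6)
  print(table(fit$cluster))  # cluster labels, expected in 1:k
  print(fit$centers)         # matrix of cluster centers
  print(fit$size)            # number of points in each cluster
  print(fit$iter)            # number of (outer) iterations
} else {
  message("knor is not installed; skipping the call.")
}
```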