K-means variant that uses a class-wise Mahalanobis metric. The implementation loosely follows Lloyd's algorithm, with a class-wise covariance update step performed after the centre update step.
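The assignment step of such a Lloyd-style iteration can be sketched as follows. This is a minimal illustration in plain Python, not the package's R code. It assumes that each cluster j carries a mean and an invertible covariance matrix, and that a point is assigned to the cluster with the smallest squared Mahalanobis distance (x - mean_j)^T cov_j^{-1} (x - mean_j). The helper names `solve`, `mahalanobis_sq` and `assign` are hypothetical, and labels are 0-based here rather than 1..k.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    Assumes a is invertible (e.g. a regularized covariance matrix).
    """
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance (x - mean)^T cov^{-1} (x - mean)."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    return sum(di * yi for di, yi in zip(diff, solve(cov, diff)))


def assign(data, means, covs):
    """Label each point with the 0-based index of its closest cluster."""
    return [min(range(len(means)), key=lambda j: mahalanobis_sq(x, means[j], covs[j]))
            for x in data]


identity = [[1.0, 0.0], [0.0, 1.0]]
stretched = [[1.6, 0.0], [0.0, 0.4]]  # also has trace d = 2
means = [[0.0, 0.0], [5.0, 0.0]]
print(assign([[0.5, 0.5], [2.4, 0.0]], means, [identity, stretched]))  # [0, 1]
```

The point (2.4, 0) is closer to centre 0 in Euclidean distance, yet it goes to cluster 1 because that cluster's metric is stretched along the first axis.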
Matrix with n rows and d columns, holding the n d-dimensional data points to cluster.
k
Number of clusters in the output.
maxiter
Maximum number of iterations.
seeds
Optional indices, in the input data, of the initial centres. If NULL, the initial centres are sampled uniformly from the data.
prior
Prior population size used for regularizing components.
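Initialization from seeds can be illustrated with a short Python sketch. It assumes that seeds holds 1-based row indices, as in R, and that "uniform sampling" means drawing k distinct rows without replacement; the function `initial_centres` and its `rng` parameter are hypothetical names used only for illustration.

```python
import random


def initial_centres(data, k, seeds=None, rng=random):
    """Return k initial centres copied from rows of data.

    seeds: optional 1-based row indices (R convention). If None, k distinct
    rows are drawn uniformly at random (assumed: without replacement).
    """
    if seeds is None:
        idx = rng.sample(range(len(data)), k)  # 0-based, without replacement
    else:
        idx = [s - 1 for s in seeds]  # convert 1-based indices to 0-based
    return [list(data[i]) for i in idx]


data = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
print(initial_centres(data, 2, seeds=[3, 1]))  # [[2.0, 2.0], [0.0, 0.0]]
print(initial_centres(data, 2, rng=random.Random(0)))
```

Passing a seeded `random.Random` instance makes the random initialization reproducible.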
Value
labels
Cluster labels taking values in 1..k
w
Numeric vector of cluster weights
mean
List of mean vectors
cov
List of covariance matrices
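For illustration, the four returned fields can be pictured as a single record. The Python sketch below is hypothetical: it mirrors the field names labels, w, mean and cov and keeps labels in 1..k. It also assumes that w holds the fraction of points assigned to each cluster, since the documentation only says "cluster weights".

```python
import math


def to_gmm(data, labels, k, covs):
    """Bundle clustering output into a GMM-like dict (hypothetical layout).

    labels take values in 1..k; covs is a list of k covariance matrices.
    """
    n, d = len(data), len(data[0])
    counts = [0] * k
    sums = [[0.0] * d for _ in range(k)]
    for x, lab in zip(data, labels):
        counts[lab - 1] += 1
        for i in range(d):
            sums[lab - 1][i] += x[i]
    # Empty clusters have no defined mean; NaN marks them here.
    mean = [[s / c if c else math.nan for s in row] for row, c in zip(sums, counts)]
    w = [c / n for c in counts]  # assumed: weights are cluster proportions
    return {"labels": list(labels), "w": w, "mean": mean, "cov": list(covs)}


identity = [[1.0, 0.0], [0.0, 1.0]]
gmm = to_gmm([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]], [1, 1, 2], 2, [identity, identity])
print(gmm["w"], gmm["mean"])
```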
Details
K-means is characterized by its use of the identity matrix as the metric. To stay close to this in spirit, each class-wise covariance matrix is normalized after computation so that its trace equals d. This avoids excessively unbalanced classes and simplifies the case where a cluster has fewer than 2 members: the covariance cannot be computed then, so it defaults to the identity, which already has trace d. To prevent degeneracies when a cluster has at least 2 but no more than d members (its sample covariance is then singular), a regularization term proportional to sample data features is added to the covariance diagonal. The returned value follows the GMM data structure (as returned by, e.g., varbayes() and newGmm()).
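The covariance step described above (identity fallback, diagonal regularization, trace normalization) could look roughly like the following Python sketch. The exact regularizer is not specified here, so the sketch assumes a hypothetical shrinkage (n_j S_j + prior diag(v)) / (n_j + prior). In this formula, S_j is the unbiased sample covariance of cluster j, n_j is its size, and v is the per-feature variance of the whole data set. The order (regularize first, then normalize) and the function name `class_covariance` are also assumptions.

```python
def class_covariance(points, data_var, prior):
    """Hypothetical covariance update for one cluster.

    points: members of the cluster (lists of length d).
    data_var: per-feature variance of the whole data set (length d).
    prior: weight of the assumed diagonal regularizer.
    """
    d, n = len(data_var), len(points)
    identity = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    if n < 2:
        # Fewer than 2 points: covariance cannot be estimated, use identity.
        return identity
    mean = [sum(p[i] for p in points) / n for i in range(d)]
    # Unbiased sample covariance S of the cluster.
    s = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / (n - 1)
          for j in range(d)] for i in range(d)]
    # Assumed regularizer: shrink S towards diag(data_var) with weight prior.
    c = [[(n * s[i][j] + (prior * data_var[i] if i == j else 0.0)) / (n + prior)
          for j in range(d)] for i in range(d)]
    trace = sum(c[i][i] for i in range(d))
    if trace == 0.0:
        # Assumed fallback: identical points and no regularization.
        return identity
    # Rescale so that trace(c) == d, as with the identity metric of k-means.
    return [[v * d / trace for v in row] for row in c]


pts = [[0.0, 0.0], [2.0, 0.0]]  # 2 points in d = 2: sample covariance is singular
print(class_covariance(pts, [1.0, 1.0], prior=0.0))  # [[2.0, 0.0], [0.0, 0.0]]
print(class_covariance(pts, [1.0, 1.0], prior=2.0))  # [[1.5, 0.0], [0.0, 0.5]]
```

Without regularization the two-point cluster yields a singular matrix; with prior = 2 the diagonal term makes it invertible while the trace stays equal to d.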