WH_kmeans: K-means of a dataset of histogram-valued data

Description

The function implements the k-means algorithm for a set of histogram-valued data.

Usage

WH_kmeans(
  x,
  k,
  rep = 5,
  simplify = FALSE,
  qua = 10,
  standardize = FALSE,
  verbose = FALSE
)

Arguments

x

A MatH object (a matrix of distributionH objects).

k

An integer, the number of groups.

rep

An integer, maximum number of repetitions of the algorithm (default rep=5).

simplify

A logical value (default is FALSE). If TRUE, histograms are recomputed in order to speed up the algorithm.

qua

An integer (default qua=10). If simplify=TRUE, it is the number of quantiles used to recode the histograms.
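To give an idea of what recoding a histogram on a fixed number of quantiles means, here is a minimal base-R sketch. It is not the package's internal code: the sample `values`, the data frame `recoded`, and the reading of `qua` as the number of equal-probability intervals are all assumptions made for illustration.

```r
# Hypothetical illustration in base R; this is NOT the package's internal code.
set.seed(42)
values <- rnorm(500, mean = 10, sd = 2)   # raw data behind one histogram

qua <- 10                                 # assumed: number of equal-probability intervals
probs <- seq(0, 1, length.out = qua + 1)  # equally spaced probability levels
breaks <- quantile(values, probs = probs, names = FALSE)

# Recoded histogram: qua bins, each carrying probability 1/qua
recoded <- data.frame(
  lower = breaks[-length(breaks)],
  upper = breaks[-1],
  prob  = rep(1 / qua, qua)
)
print(recoded)
```

With every histogram described on the same probability levels, distances between quantile functions reduce to comparisons of aligned vectors, which is presumably where the speed-up comes from.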

standardize

A logical value (default is FALSE). If TRUE, histogram-valued data are standardized, variable by variable, using the Wasserstein-based standard deviation. Use this option to give every variable a standard deviation equal to one.

verbose

A logical value (default is FALSE). If TRUE, details on computations are shown.

Value

A list with the results of the k-means clustering of the set of histogram-valued data x into k clusters.

Slots

solution: A list. The best solution among the repetitions, i.e. the one with the minimum value of the sum-of-squares criterion.

solution$IDX

A vector. The cluster to which each object is assigned.

solution$cardinality

A vector. The cardinality of each final cluster.

solution$centers

A MatH object with the description of centers.

solution$Crit

A number. The value of the criterion (sum of squared deviations from the centers) at the end of the run.

quality

A number. The percentage of the sum of squared deviations explained by the model (the higher, the better).
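As a hedged reading of the two quantities above, the criterion and the quality index can be written with the squared L2 Wasserstein distance used in the cited reference. The notation is introduced here for illustration only and has not been checked against the package source: $Q_{ij}$ is the quantile function of object $i$ on variable $j$, $g_{cj}$ is the center of cluster $C_c$ on variable $j$, and $\bar{y}_j$ is the overall mean histogram of variable $j$.

```latex
% Squared L2 Wasserstein distance between two histograms,
% expressed through their quantile functions
d_W^2(y_{ij}, g_{cj}) = \int_0^1 \bigl( Q_{ij}(t) - Q_{g_{cj}}(t) \bigr)^2 \, dt

% Within-cluster sum of squares (assumed form of solution$Crit)
\mathrm{Crit} = \sum_{c=1}^{k} \sum_{i \in C_c} \sum_{j=1}^{p} d_W^2(y_{ij}, g_{cj})

% Share of the total sum of squares explained by the partition
% (assumed form of quality)
\mathrm{quality} = 1 - \frac{\mathrm{Crit}}{\mathrm{TSS}},
\qquad
\mathrm{TSS} = \sum_{i=1}^{n} \sum_{j=1}^{p} d_W^2(y_{ij}, \bar{y}_j)
```

Under this reading, a quality value close to one means that the cluster centers account for most of the variability among the histograms.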

References

Irpino A., Verde R., Lechevallier Y. (2006). Dynamic clustering of histograms using Wasserstein metric. In: Rizzi A., Vichi M. (eds.), COMPSTAT 2006 - Advances in computational statistics, pp. 869-876. Heidelberg: Physica-Verlag.

Examples


# NOT RUN {
results <- WH_kmeans(x = BLOOD, k = 2, rep = 10, simplify = TRUE,
                     qua = 10, standardize = TRUE, verbose = TRUE)
# }
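Once the call returns, the components described under Value can be inspected directly. The sketch below assumes that the HistDAWass package, which provides WH_kmeans and the BLOOD dataset, is installed; the guard with `requireNamespace` and the variable `idx` are additions for illustration, and no particular output is implied.

```r
# Assumes the HistDAWass package (providing WH_kmeans and BLOOD) is installed;
# the block is skipped otherwise.
results <- NULL
if (requireNamespace("HistDAWass", quietly = TRUE)) {
  library(HistDAWass)
  results <- WH_kmeans(x = BLOOD, k = 2, rep = 10, simplify = TRUE,
                       qua = 10, standardize = TRUE, verbose = FALSE)

  idx <- results$solution$IDX          # cluster assigned to each object
  print(table(idx))                    # cluster sizes recomputed from the labels
  print(results$solution$cardinality)  # cluster sizes as reported by the function
  print(results$solution$Crit)         # final value of the criterion
  print(results$quality)               # share of variability explained
}
```

Comparing `table(idx)` with `solution$cardinality` is a quick sanity check that the labels and the reported cluster sizes agree.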
