WH_adaptive.kmeans: K-means of a dataset of histogram-valued data using adaptive Wasserstein distances

Description

The function implements k-means clustering of a set of histogram-valued data using adaptive squared Wasserstein distances.
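The k-means relies on the squared L2 Wasserstein distance between histograms, which compares their quantile functions. The base R sketch below illustrates that distance under stated assumptions: each histogram is a hypothetical list with bin edges `breaks` and bin probabilities `probs`, values are spread uniformly within each bin, and the integral over (0, 1) is approximated by the midpoint rule. It is not the package's internal implementation and does not use the package's `distributionH` class.

```r
# Sketch only (assumption): a histogram is a list with bin edges `breaks`
# and bin probabilities `probs` (all > 0), with values uniform inside bins.

# Quantile function of such a histogram: the inverse of its piecewise-linear
# CDF, evaluated at probabilities p.
hist_quantile <- function(breaks, probs, p) {
  stopifnot(length(breaks) == length(probs) + 1, all(probs > 0))
  cdf <- c(0, cumsum(probs) / sum(probs))  # CDF value at each bin edge
  approx(x = cdf, y = breaks, xout = p)$y   # linear interpolation inverts the CDF
}

# Squared L2 Wasserstein distance: integral over (0, 1) of the squared
# difference between quantile functions, approximated by the midpoint rule.
wasserstein2 <- function(h1, h2, n = 10000) {
  p <- (seq_len(n) - 0.5) / n
  q1 <- hist_quantile(h1$breaks, h1$probs, p)
  q2 <- hist_quantile(h2$breaks, h2$probs, p)
  mean((q1 - q2)^2)
}

u01 <- list(breaks = c(0, 1), probs = 1)  # uniform on [0, 1]
u12 <- list(breaks = c(1, 2), probs = 1)  # uniform on [1, 2]
u02 <- list(breaks = c(0, 2), probs = 1)  # uniform on [0, 2]
wasserstein2(u01, u12)  # a pure shift by one unit: approximately 1
wasserstein2(u01, u02)  # approximately 1/3
```

Because quantile functions are compared point by point, shifting a whole distribution by one unit contributes exactly the squared shift to the distance.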

Usage

WH_adaptive.kmeans(
  x,
  k,
  schema = 1,
  init,
  rep,
  simplify = FALSE,
  qua = 10,
  standardize = FALSE,
  weight.sys = "PROD",
  theta = 2,
  init.weights = "EQUAL",
  verbose = FALSE
)

Arguments

x

A MatH object (a matrix of distributionH).

k

An integer, the number of groups.

schema

A number from 1 to 4:
1 = a weight for each variable (default);
2 = a weight for the average and the dispersion component of each variable;
3 = same as 1, but with a different set of weights for each cluster;
4 = same as 2, but with a different set of weights for each cluster.
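Schemas 2 and 4 weight an average and a dispersion component separately. A minimal sketch, assuming the usual decomposition of the squared L2 Wasserstein distance into the squared difference of the means plus the squared distance between the centred quantile functions; the helper `wasserstein_components` is hypothetical, and the quantile functions are passed as vectors evaluated on a common grid.

```r
# Sketch (assumption): split the squared L2 Wasserstein distance between two
# distributions into an average and a dispersion component. q1 and q2 are
# their quantile functions evaluated on a common midpoint grid of (0, 1).
wasserstein_components <- function(q1, q2) {
  m1 <- mean(q1)  # mean of distribution 1 (integral of its quantile function)
  m2 <- mean(q2)  # mean of distribution 2
  avg <- (m1 - m2)^2                       # average (location) component
  disp <- mean(((q1 - m1) - (q2 - m2))^2)  # dispersion: centred quantile functions
  c(average = avg, dispersion = disp, total = avg + disp)
}

p <- (seq_len(1000) - 0.5) / 1000
q1 <- qnorm(p, mean = 0, sd = 1)  # N(0, 1)
q2 <- qnorm(p, mean = 3, sd = 2)  # N(3, 4)
comp <- wasserstein_components(q1, q2)
comp
all.equal(comp[["total"]], mean((q1 - q2)^2))  # TRUE: the components add up
```

The cross term vanishes because the centred quantile functions both integrate to zero, so the two components add up to the full distance.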

init

(optional, do not use) Initialization strategy for partitioning the data. The default is 'RPART'; other strategies have yet to be implemented.

rep

An integer, the maximum number of repetitions of the algorithm (default rep = 5).

simplify

A logical value (default is FALSE). If TRUE, histograms are recomputed in order to speed up the algorithm.

qua

An integer. If simplify = TRUE, the number of quantiles used to recode the histograms.
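One way to read `simplify` and `qua` is to replace each histogram by `qua` equiprobable bins whose edges are quantiles of the original histogram, so that all histograms share the same small number of bins. The sketch below assumes exactly that recoding and a uniform density within each original bin; the function `recode_histogram` is hypothetical, and the package's own simplification may differ in detail.

```r
# Sketch (assumption): recode a histogram (bin edges `breaks`, bin
# probabilities `probs`, all > 0) into `qua` equiprobable bins whose edges
# are its own quantiles at 0, 1/qua, ..., 1 (uniform density within bins).
recode_histogram <- function(breaks, probs, qua = 10) {
  stopifnot(length(breaks) == length(probs) + 1, all(probs > 0))
  cdf <- c(0, cumsum(probs) / sum(probs))   # CDF value at each original edge
  p <- seq(0, 1, length.out = qua + 1)      # target cumulative probabilities
  list(breaks = approx(x = cdf, y = breaks, xout = p)$y,  # new bin edges
       probs = rep(1 / qua, qua))           # every new bin has mass 1 / qua
}

h <- recode_histogram(breaks = c(0, 2, 10), probs = c(0.5, 0.5), qua = 4)
h$breaks  # 0 1 2 6 10
h$probs   # 0.25 0.25 0.25 0.25
```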

standardize

A logical value (default is FALSE). If TRUE, histogram-valued data are standardized, variable by variable, using the Wasserstein-based standard deviation. Use it when each variable should have a standard deviation equal to one.
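The Wasserstein-based standard deviation of a histogram variable can be sketched as the square root of the average squared Wasserstein distance between each observation and the barycenter, i.e. the pointwise mean of the quantile functions. This definition is an assumption about what the package uses, and the helper `wasserstein_sd` is hypothetical. Dividing every quantile function by that value gives unit standard deviation; whether the package's transformation also centres the data is not stated here.

```r
# Sketch (assumption): Wasserstein-based standard deviation of one
# histogram-valued variable. Row i of Q holds the quantile function of
# observation i on a common midpoint grid of (0, 1).
wasserstein_sd <- function(Q) {
  bary <- colMeans(Q)                       # barycenter: pointwise mean quantile function
  sq_dist <- rowMeans(sweep(Q, 2, bary)^2)  # squared distance of each row to the barycenter
  sqrt(mean(sq_dist))                       # root of the mean squared distance
}

p <- (seq_len(500) - 0.5) / 500
Q <- rbind(qunif(p, 0, 1), qunif(p, 2, 5), qnorm(p, 1, 2))
s <- wasserstein_sd(Q)
Q_std <- Q / s         # rescale every quantile function by 1 / s
wasserstein_sd(Q_std)  # 1, up to floating-point rounding
```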

weight.sys

A string. The weights either sum to one ('SUM') or have a product equal to one ('PROD', default).

theta

A number, the parameter used when weight.sys = 'SUM' (default is 2).
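Given the within-cluster dispersion `D` of each variable (or component), the adaptive weights have closed-form solutions under the two constraints. The sketch assumes the classical formulas from the adaptive-distance literature, with `theta` acting as the exponent on the weights when `weight.sys = 'SUM'`; that role of `theta`, and the function `adaptive_weights`, are assumptions rather than the package's documented internals.

```r
# Sketch (assumption): adaptive weights from D, the vector of within-cluster
# sums of squared Wasserstein distances computed per variable (or component).
# Uses the classical closed forms; theta is ASSUMED to be the weight exponent.
adaptive_weights <- function(D, weight.sys = c("PROD", "SUM"), theta = 2) {
  weight.sys <- match.arg(weight.sys)
  stopifnot(all(D > 0))
  if (weight.sys == "PROD") {
    # minimise sum(w * D) subject to prod(w) == 1:
    # w_j = (geometric mean of D) / D_j
    exp(mean(log(D))) / D
  } else {
    # minimise sum(w^theta * D) subject to sum(w) == 1, with theta > 1:
    # w_j = 1 / sum_h (D_j / D_h)^(1 / (theta - 1))
    stopifnot(theta > 1)
    sapply(D, function(d) 1 / sum((d / D)^(1 / (theta - 1))))
  }
}

D <- c(4, 1, 2)
w_prod <- adaptive_weights(D, "PROD")
w_sum <- adaptive_weights(D, "SUM", theta = 2)
w_prod  # 0.5 2 1
w_sum   # 1/7 4/7 2/7
```

In both systems a variable with a smaller within-cluster dispersion receives a larger weight, so the most homogeneous variables drive the partition.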

init.weights

A string specifying how the weights are initialized: 'EQUAL' (default), all weights are equal; 'RANDOM', weights are initialized at random.

verbose

A logical value (default is FALSE). If TRUE, details on computations are shown.

Value

A list with the results of the k-means clustering of the set of histogram-valued data x into k clusters.

Slots

solution: A list. The best solution among the repetitions, i.e. the one with the minimum sum-of-squares criterion.

solution$IDX

A vector. The cluster to which each object is assigned.

solution$cardinality

A vector. The cardinality of each final cluster.

solution$centers

A MatH object describing the cluster centers.

solution$Crit

A number. The value of the criterion (sum of squared deviations from the centers) at the end of the run.

quality

A number. The percentage of the total sum of squared deviations explained by the model (the higher, the better).
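The quality index compares the within-cluster sum of squared deviations (WSS) with the total one (TSS). A minimal sketch for a single variable with unweighted squared Wasserstein distances, returning 1 - WSS/TSS as a proportion (multiply by 100 for a percentage); the function `partition_quality` is hypothetical, and the package may compute the index with the adaptive weights included.

```r
# Sketch (assumption): quality of a partition of one histogram-valued variable
# as 1 - WSS / TSS, using unweighted squared Wasserstein distances.
# Q: n x m matrix, row i = quantile function of object i on a midpoint grid.
# idx: cluster label of each object.
partition_quality <- function(Q, idx) {
  # sum of squared Wasserstein distances of the given rows to their barycenter
  sq_dev <- function(rows) {
    Qr <- Q[rows, , drop = FALSE]
    sum(rowMeans(sweep(Qr, 2, colMeans(Qr))^2))
  }
  tss <- sq_dev(seq_len(nrow(Q)))                            # total deviation
  wss <- sum(sapply(split(seq_len(nrow(Q)), idx), sq_dev))   # within clusters
  1 - wss / tss
}

p <- (seq_len(500) - 0.5) / 500
Q <- rbind(qunif(p, 0, 1), qunif(p, 0, 1.2), qunif(p, 10, 11), qunif(p, 10, 11.2))
partition_quality(Q, idx = c(1, 1, 2, 2))  # close to 1: well-separated groups
partition_quality(Q, idx = c(1, 1, 1, 1))  # 0: a single cluster explains nothing
```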

References

Irpino A., Verde R., De Carvalho F.A.T. (2014). Dynamic clustering of histogram data based on adaptive squared Wasserstein distances. Expert Systems with Applications, vol. 41, pp. 3351-3366, ISSN: 0957-4174, doi: http://dx.doi.org/10.1016/j.eswa.2013.12.001

Examples


library(HistDAWass)
results <- WH_adaptive.kmeans(x = BLOOD, k = 2, rep = 10, simplify = TRUE, qua = 10, standardize = TRUE)
