The function implements k-means clustering with adaptive squared Wasserstein distances for a set of histogram-valued data.
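The adaptive distance builds on the squared L2 Wasserstein distance between histograms. For one-dimensional distributions this distance is the integral over (0, 1) of the squared difference between the two quantile functions. It splits exactly into a squared difference of means plus a distance between the centred quantile functions, which is the average/dispersion split that schemas 2 and 4 weight separately. The sketch below is a minimal illustration under that assumption. `wass2_decomp` is a hypothetical helper, not part of HistDAWass, and the integral is approximated on a midpoint grid rather than computed exactly from the histogram bins.

```r
# Hypothetical helper (not part of HistDAWass): squared L2 Wasserstein distance
# between two 1-D distributions given as quantile functions, approximated on a
# midpoint grid of m probability levels and split into mean and dispersion parts.
wass2_decomp <- function(Q1, Q2, m = 1000) {
  p  <- (seq_len(m) - 0.5) / m                 # midpoint grid on (0, 1)
  q1 <- Q1(p)
  q2 <- Q2(p)
  total     <- mean((q1 - q2)^2)               # approximates the integral of (Q1 - Q2)^2
  mean_part <- (mean(q1) - mean(q2))^2         # squared difference of the means
  disp_part <- mean(((q1 - mean(q1)) - (q2 - mean(q2)))^2)  # centred quantile functions
  c(total = total, mean = mean_part, dispersion = disp_part)
}

# Example: U(0, 1) versus U(0, 2), whose quantile functions are p and 2p
d <- wass2_decomp(function(p) p, function(p) 2 * p)
print(d)  # total is close to 1/3 = 1/4 (mean) + 1/12 (dispersion)
```

The identity total = mean + dispersion holds exactly on the grid, because it is the usual decomposition of a mean square into a squared mean plus a variance.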
WH_adaptive.kmeans(
x,
k,
schema = 1,
init,
rep,
simplify = FALSE,
qua = 10,
standardize = FALSE,
weight.sys = "PROD",
theta = 2,
init.weights = "EQUAL",
verbose = FALSE
)
x
A MatH object (a matrix of distributionH objects).
k
An integer, the number of groups.
schema
An integer from 1 to 4 that selects the weighting scheme:
1 = one weight for each variable (default);
2 = one weight for the average component and one for the dispersion component of each variable;
3 = same as 1, but with a different set of weights for each cluster;
4 = same as 2, but with a different set of weights for each cluster.
init
(Optional; do not use.) The initialization strategy for partitioning the data. The default is 'RPART'; other strategies are yet to be implemented.
rep
An integer, the maximum number of repetitions of the algorithm (default rep = 5).
simplify
A logical value (default FALSE). If TRUE, the histograms are recomputed in order to speed up the algorithm.
qua
An integer. If simplify = TRUE, the number of quantiles used to recode the histograms.
standardize
A logical value (default FALSE). If TRUE, the histogram-valued data are standardized variable by variable, using the Wasserstein-based standard deviation. Use it to give every variable a standard deviation equal to one.
weight.sys
A string. The weights either sum to one ('SUM') or have a product equal to one ('PROD', default).
theta
A number, a parameter used when weight.sys = 'SUM' (default 2).
init.weights
A string, how to initialize the weights: 'EQUAL' (default), all weights are the same; 'RANDOM', the weights are initialized at random.
verbose
A logical value (default FALSE). If TRUE, details of the computations are shown.
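In adaptive-distance clustering the weights are re-estimated at each iteration from the within-cluster dispersion of each variable (or of each average/dispersion component under schemas 2 and 4, and separately per cluster under schemas 3 and 4). The sketch below shows the standard closed-form Lagrangian solutions for the two constraints named by weight.sys. It is an assumption about how such weights are typically obtained, not the package's internal code: `adaptive_weights` and its input `D` (a vector of positive dispersions) are hypothetical, and the way theta enters (as an exponent on the weights under 'SUM') is assumed.

```r
# Hypothetical helper (not the package's internal code): closed-form weight
# update for adaptive distances, given D, the within-cluster dispersion of each
# variable (or component). Larger dispersion -> smaller weight.
adaptive_weights <- function(D, weight.sys = c("PROD", "SUM"), theta = 2) {
  weight.sys <- match.arg(weight.sys)
  stopifnot(all(D > 0))
  if (weight.sys == "PROD") {
    # Minimise sum(lambda * D) subject to prod(lambda) == 1:
    # lambda_j = geometric mean of D / D_j
    exp(mean(log(D))) / D
  } else {
    # Assumed form: minimise sum(lambda^theta * D) subject to sum(lambda) == 1,
    # with theta > 1, giving lambda_j proportional to D_j^(-1 / (theta - 1))
    stopifnot(theta > 1)
    w <- D^(-1 / (theta - 1))
    w / sum(w)
  }
}

D <- c(1, 4)                              # dispersion of two variables
adaptive_weights(D, "PROD")               # 2.0 0.5 (product equals 1)
adaptive_weights(D, "SUM", theta = 2)     # 0.8 0.2 (sum equals 1)
```

Under both constraints a variable with a smaller dispersion inside the clusters receives a larger weight; under 'SUM', a larger theta makes the weights more uniform.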
A list with the results of the k-means clustering of the histogram-valued data x into k clusters.
solution
A list. The best solution among the rep repetitions, i.e. the one with the minimum sum-of-squares criterion.
solution$IDX
A vector. The cluster to which each object is assigned.
solution$cardinality
A vector. The cardinality of each final cluster.
solution$centers
A MatH object describing the cluster centers.
solution$Crit
A number. The value of the criterion (the sum of squared deviations from the centers) at the end of the run.
quality
A number. The percentage of the sum of squared deviations explained by the model (the higher, the better).
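The quality index follows the same logic as the explained share of the sum of squares in classical k-means. The sketch below illustrates that idea on simulated numeric data with base R's `stats::kmeans`, not on histograms, assuming quality is computed as 1 minus the final criterion divided by the total sum of squares; the simulated data are purely illustrative, and the result is expressed here as a proportion rather than a percentage.

```r
# Classical (non-histogram) analogue of the quality index, using stats::kmeans
# on simulated numeric data: share of total sum of squares explained by the partition.
set.seed(1)
X <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 5), ncol = 2))
km <- kmeans(X, centers = 2, nstart = 5)

# tot.withinss plays the role of Crit; totss is the sum of squares around the overall mean
quality <- 1 - km$tot.withinss / km$totss
quality                                        # proportion in [0, 1]; multiply by 100 for a percentage
all.equal(quality, km$betweenss / km$totss)    # TRUE: explained = between / total
```

Because the total sum of squares is fixed by the data, minimising the within-cluster criterion is the same as maximising this explained share.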
Irpino A., Verde R., De Carvalho F.A.T. (2014). Dynamic clustering of histogram data based on adaptive squared Wasserstein distances. Expert Systems with Applications, vol. 41, pp. 3351-3366. ISSN: 0957-4174. doi: http://dx.doi.org/10.1016/j.eswa.2013.12.001
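library(HistDAWass)
results <- WH_adaptive.kmeans(x = BLOOD, k = 2, rep = 10, simplify = TRUE, qua = 10, standardize = TRUE)
results$solution$IDX  # cluster assigned to each object
results$quality       # quality of the partition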