Initializes the cluster prototypes matrix using a modified version of the Simple Cluster Seeking (SCS) algorithm proposed by Tou & Gonzalez (1974). While SCS uses a fixed threshold distance value T for selecting all cluster candidates, the modified SCS recomputes T as the average of the Euclidean distances between the previously determined prototypes. This adjustment makes it possible to select more cluster prototypes than SCS does.
mscseek(x, k, tv)
a numeric vector, data frame or matrix.
an integer for the number of clusters.
a number supplied directly by the user as the threshold distance value T. Alternatively, T can be computed internally by passing one of the following options as the tv argument:
With the option cd1, T is the mean of the differences between consecutive pairs of objects.
With the option cd2, T is the minimum of the differences between consecutive pairs of objects.
With the option md, T is the mean of the Euclidean distances between consecutive pairs of objects, divided by k. This is the default if tv is not supplied by the user.
With the option mm, T is the range (maximum minus minimum) of the Euclidean distances between consecutive pairs of objects, divided by k.
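As a rough illustration of the md and mm options described above, the following base R sketch computes a threshold from the Euclidean distances between consecutive rows. The helpers `consecutive_dist` and `threshold_sketch` are hypothetical names introduced here and are not part of inaparc. The sketch assumes that "consecutive pairs" means rows i and i+1 and that "divided into k" means division by k, so the package's own computation may differ.

```r
# Hypothetical helpers for illustration only; not the inaparc implementation.

# Euclidean distance between each row and the row that follows it
consecutive_dist <- function(x) {
  x <- as.matrix(x)
  d <- x[-1, , drop = FALSE] - x[-nrow(x), , drop = FALSE]
  sqrt(rowSums(d^2))
}

# Threshold T under one reading of the 'md' and 'mm' options
threshold_sketch <- function(x, k, tv = c("md", "mm")) {
  tv <- match.arg(tv)
  d <- consecutive_dist(x)
  switch(tv,
    md = mean(d) / k,              # mean consecutive distance, divided by k
    mm = (max(d) - min(d)) / k     # range of consecutive distances, divided by k
  )
}

# Toy data: four points in the plane
toy <- matrix(c(0, 0,
                3, 4,
                3, 4,
                6, 8), ncol = 2, byrow = TRUE)
threshold_sketch(toy, k = 2, tv = "md")
threshold_sketch(toy, k = 2, tv = "mm")
```

Only md and mm are sketched because they are stated in terms of Euclidean distances; the text does not specify how the "differences" used by cd1 and cd2 are measured.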
an object of class ‘inaparc’, which is a list consisting of the following items:
a numeric matrix of the initial cluster prototypes.
a string representing the type of centroid used to build the prototype matrix. Its value is ‘obj’ for this function because the cluster prototype matrix consists of objects selected from the data set.
a string containing the matched function call that generates the object ‘inaparc’.
This is a modification of the Simple Cluster Seeking (SCS) algorithm (Tou & Gonzalez, 1974). The algorithm selects the first object in the data set as the prototype of the first cluster. It then searches for the next object whose distance to the first prototype is greater than a threshold distance value T and assigns it as the second cluster prototype. Instead of keeping T fixed as SCS does, the modified SCS recomputes T as the average of the Euclidean distances between the previously determined cluster prototypes. The next object whose distance to the previously selected object is greater than the adjusted T is then found and assigned as the third cluster prototype. The selection process is repeated in the same way for the remaining clusters. Because the method is sensitive to the order of the data, it may not yield good initializations with ordered data.
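The selection procedure described above can be sketched in base R as follows. This is a minimal illustration and not the inaparc source: `mscs_sketch` and its argument `T0` are hypothetical names. It assumes that the scan moves forward through the rows once, that each candidate is compared only with the most recently selected prototype, and that fewer than k prototypes are returned if the data run out. How the package handles that last case is not stated in the text.

```r
# Hypothetical sketch of the modified SCS selection; not the inaparc implementation.
mscs_sketch <- function(x, k, T0) {
  x <- as.matrix(x)
  idx <- 1L          # the first object is the first prototype
  thr <- T0          # initial threshold distance T
  i <- 2L
  while (length(idx) < k && i <= nrow(x)) {
    last <- x[idx[length(idx)], ]            # most recently selected prototype
    if (sqrt(sum((x[i, ] - last)^2)) > thr) {
      idx <- c(idx, i)
      # Recompute T as the mean pairwise Euclidean distance among the prototypes
      thr <- mean(as.vector(dist(x[idx, , drop = FALSE])))
    }
    i <- i + 1L
  }
  x[idx, , drop = FALSE]
}

# Toy one-dimensional data
toy <- matrix(c(0, 1, 5, 6, 20), ncol = 1)
mscs_sketch(toy, k = 3, T0 = 2)

# Same data in reverse order, to show the sensitivity to row order
mscs_sketch(toy[nrow(toy):1, , drop = FALSE], k = 3, T0 = 2)
```

On this toy data the original order selects the objects 0, 5 and 20, whereas the reversed order finds only two prototypes (20 and 6) before the rows are exhausted, which illustrates the order sensitivity noted above.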
Tou, J.T. & Gonzalez, R.C. (1974). Pattern Recognition Principles. Addison-Wesley, Reading, MA. <ISBN:9780201075861>
aldaoud, ballhall, crsamp, firstk, forgy, hartiganwong, inofrep, inscsf, insdev, kkz, kmpp, ksegments, ksteps, lastk, lhsmaximin, lhsrandom, maximin, rsamp, rsegment, scseek, scseek2, spaeth, ssamp, topbottom, uniquek, ursamp
library(inaparc)
data(iris)
# Run with a user-supplied threshold distance of 0.1
res <- mscseek(x = iris[, 1:4], k = 5, tv = 0.1)
v1 <- res$v
print(v1)
# Run with the internally computed default threshold (option md)
res <- mscseek(x = iris[, 1:4], k = 5)
v2 <- res$v
print(v2)