Optimal partitioning is an iterative re-allocation algorithm to maximize the ratio of within-cluster similarity/among-cluster similarity for a given number of clusters. Optpart can operate as either a crisp (classical) partitioning, or a fuzzy partitioning algorithm.
optpart(x, dist, maxitr = 100, mininc = 0.001, maxdmu = 1)
an integer, integer vector, factor vector, or objects of class ‘clustering’, ‘partana’, ‘partition’ or ‘stride’
the maximum number of iterations to perform
the minimum increment in the within/among similarity ratio to continue iterating
the ‘maximum delta mu’. If 1, a crisp (non-fuzzy) partition results. If (0,1) a fuzzy partition results.
an object of class partana
, a list with elements:
a matrix of item mean similarity to each cluster
a matrix of mean cluster-to-cluster similarity
a matrix of membership of each item to each cluster. If maxdmu
is 1, this will be a single 1 in the appropriate cluster and 0 in all others. If
maxdmu
is (0,1) then the musubx represent fuzzy memberships in each cluster.
a vector giving the cluster each item is assigned to. If optpart is run as a fuzzy partitioning, this is determined by the maximum membership observed.
the vector of within/among similarities achieved at each iteration. The final non-zero value is the final ratio achieved.
the number of iterations performed
the names of the items clustered
optpart produces a partition, or clustering, of items into clusters by iterative reallocation of items to clusters so as to maximize the within cluster/ among cluster similarity ratio. At each iteration optpart ranks all possible re-allocations of a sample from one cluster to another. The re-allocation that maximizes the change in the within-cluster/among-cluster ratio is performed. The next best reallocation is considered, and if it does not include any clusters already modified, it is also performed, as re-allocations of independent clusters are independent and additive in effect. When no further re-allocations can be performed in that iteration, the algorithm recalculates all possible re-allocations and iterates again. When no re-allocations exist that improve the within/among ratio greater than ‘mininc’, or the maximum number of iterations is reached, the algorithm stops.
optpart is designed to run from a random start or the levels of a factor, or
preferably from existing initial partitions. Specifying a single integer gives the number of clusters
desired using a random start. Specifying an integer vector gives the initial assignments of
items to clusters. Initial assignments can also be extracted from a number of objects.
Specific
methods exist for objects of class ‘clustering’ from functions
slice
or archi
, class ‘partana’ from function
partana
, class ‘stride’ from stride
, or
class ‘partition’ from functions
pam
or diana
. optpart is deterministic from a
given initial condition. To get good results from a random start, multiple
random starts should be attempted, using function bestopt
.
Optpart is an unweighted algorithm, i.e. each of the \((n^2-n)/2\) pairwise distances or dissimilarities is included in the calculation of the ratio exactly once. Optpart somewhat penalizes small clusters, as small clusters contribute only \((n_i^2-n_i)/2\) values to the numerator; the extreme case is that a cluster with a single member does not contribute anything to the numerator.
It is an interesting characteristic of optpart that no minimum cluster size is enforced, and it is common for partitions of a large number of clusters to contain null clusters, i.e. clusters with no members. This is not a bug or error, but rather an indication that a partition with a fewer number of clusters achieves a better within/among similarity ratio than does a larger number. It is also somewhat common that for solutions with a small or intermediate number of clusters, optpart places outliers in a small ‘trash’ cluster.
When optpart is run as a fuzzy partitioning algorithm, it often achieves a surprisingly low entropy, with many items assigned completely to a single cluster.
partana
# NOT RUN {
data(shoshveg)
dis.bc <- dsvdis(shoshveg,'bray/curtis')
opt.5 <- optpart(5,dis.bc)
summary(opt.5)
# }
Run the code above in your browser using DataLab