relabel: Stephens' Relabelling Algorithm for Clusterings

Description

Given a sample of clusterings in which corresponding clusters may carry different labels, the algorithm attempts to relabel them so that all clusterings share a common labelling.

Usage

relabel(cls, print.loss = TRUE)

Arguments

cls

a matrix in which every row corresponds to a clustering of the ncol(cls) objects.

print.loss

logical; should the current value of the loss function be printed after each iteration? Defaults to TRUE.

Value

cls

the input cls with unified labelling.

P

an $n \times K$ matrix, where entry $[i,j]$ contains the estimated probability that observation $i$ belongs to cluster $j$.

loss.val

value of the loss function.

cl

vector of cluster memberships, assigning each observation $i$ to the cluster $j$ with the highest probability $\hat{p}_{ij}$.

Warning

The algorithm assumes that the number of clusters $K$ is fixed. If this is not the case, $K$ is taken to be the most common number of clusters. Clusterings with any other number of clusters are discarded and a warning is issued.
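The filtering rule described in this warning can be sketched in base R as follows. This is an illustration only, not the package's code: the helper name `keep_modal_k` is hypothetical, and both counting $K$ as the number of distinct labels in a row and breaking ties towards the smaller $K$ are assumptions.

```r
# Hypothetical helper (not part of the package): keep only the clusterings
# whose number of distinct labels equals the most common count.
keep_modal_k <- function(cls) {
  # number of distinct labels in each row (one row = one clustering)
  k_per_row <- apply(cls, 1, function(z) length(unique(z)))
  # most frequent K; which.max() returns the first maximum, i.e. the smallest K on ties
  counts <- table(k_per_row)
  k_mode <- as.integer(names(counts)[which.max(counts)])
  keep <- k_per_row == k_mode
  if (!all(keep)) {
    warning(sum(!keep), " clustering(s) with K != ", k_mode, " discarded")
  }
  list(K = k_mode, cls = cls[keep, , drop = FALSE])
}

cls_mixed <- rbind(c(1, 1, 2, 2),
                   c(1, 2, 2, 2),
                   c(1, 2, 3, 3),   # three clusters, so this row is dropped
                   c(2, 2, 1, 1))
res_k <- suppressWarnings(keep_modal_k(cls_mixed))
res_k$K          # 2
nrow(res_k$cls)  # 3
```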

Details

The algorithm minimizes the loss function $$\sum_{m=1}^M\sum_{i=1}^n\sum_{j=1}^K-\log\hat{p}_{ij} \cdot I_{\{z_i^{(m)}=j\}}$$ over the $M$ clusterings, $n$ observations and $K$ clusters, where $\hat{p}_{ij}$ is the estimated probability that observation $i$ belongs to cluster $j$ and $z_i^{(m)}$ indicates to which cluster observation $i$ belongs in clustering $m$. $I_{\{.\}}$ is an indicator function.
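To make the loss function concrete, the sketch below evaluates it in base R for a small example. The function name `stephens_loss` and the toy objects `cls_toy` and `P_toy` are hypothetical, and estimating $\hat{p}_{ij}$ by relative label frequencies is an assumption made for illustration.

```r
# Loss for an M x n label matrix `cls` (labels in 1..K) and an n x K
# probability matrix `P`: sum over m and i of -log P[i, z_i^(m)].
stephens_loss <- function(cls, P) {
  n <- ncol(cls)
  total <- 0
  for (m in seq_len(nrow(cls))) {
    # the indicator selects, for each observation i, the entry P[i, z_i^(m)]
    p_assigned <- P[cbind(seq_len(n), cls[m, ])]
    total <- total - sum(log(p_assigned))
  }
  total
}

cls_toy <- rbind(c(1, 1, 2, 2),
                 c(1, 1, 2, 2),
                 c(1, 2, 2, 2))
# assumed estimate of p_hat: relative frequency of each label per observation
P_toy <- t(sapply(seq_len(ncol(cls_toy)),
                  function(i) tabulate(cls_toy[, i], nbins = 2) / nrow(cls_toy)))
stephens_loss(cls_toy, P_toy)
```

Only observation 2 is labelled inconsistently here (twice as cluster 1, once as cluster 2), so the loss reduces to $-2\log(2/3) - \log(1/3)$; every other term is $-\log 1 = 0$.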

Minimization is achieved by alternating between two steps: estimating $\hat{p}_{ij}$ from all clusterings, and minimizing the loss function within each clustering by permuting its cluster labels. The latter step is solved by linear programming.
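The two alternating steps can be sketched in base R as below. This is a simplified illustration under stated assumptions, not the package's implementation: it replaces the linear programming step with an exhaustive search over all $K!$ permutations (feasible only for small $K$), estimates $\hat{p}_{ij}$ by relative label frequencies, and assumes $K \ge 2$. The names `all_perms`, `relabel_sketch` and `max_iter` are hypothetical.

```r
# All permutations of the vector v, one per row (identity first).
all_perms <- function(v) {
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  do.call(rbind, lapply(seq_along(v),
                        function(i) cbind(v[i], all_perms(v[-i]))))
}

relabel_sketch <- function(cls, K = max(cls), max_iter = 50) {
  M <- nrow(cls); n <- ncol(cls)
  perms <- all_perms(seq_len(K))
  loss_old <- Inf
  for (iter in seq_len(max_iter)) {
    # Step 1: estimate p_hat[i, j] as the relative frequency of label j
    P <- t(apply(cls, 2, tabulate, nbins = K)) / M
    # Step 2: for each clustering, apply the label permutation with minimal cost
    loss <- 0
    for (m in seq_len(M)) {
      costs <- apply(perms, 1, function(p) {
        -sum(log(P[cbind(seq_len(n), p[cls[m, ]])]))
      })
      best <- which.min(costs)           # ties keep the earlier (identity) row
      cls[m, ] <- perms[best, cls[m, ]]
      loss <- loss + costs[best]
    }
    if (loss >= loss_old) break          # stop when the loss no longer decreases
    loss_old <- loss
  }
  list(cls = cls, P = P, loss.val = loss)
}

cls_demo <- rbind(c(1, 1, 2, 2), c(1, 1, 2, 2), c(1, 2, 2, 2), c(2, 2, 1, 1))
fit <- relabel_sketch(cls_demo)
fit$cls   # the fourth clustering is relabelled to match the first three
```

Exhaustive search keeps the sketch dependency-free; for larger $K$ the permutation step is an assignment problem, which is why a linear programming solver is the more practical choice.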

References

Stephens, M. (2000) Dealing with label switching in mixture models. Journal of the Royal Statistical Society, Series B, 62, 795–809.

Examples

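# Assumes the mcclust package, which provides relabel(), is installed.
library(mcclust)
# Four clusterings (rows) of four objects (columns).
(cls <- rbind(c(1,1,2,2), c(1,1,2,2), c(1,2,2,2), c(2,2,1,1)))
# Group 2 in clustering 4 corresponds to group 1 in clusterings 1-3.
cls.relab <- relabel(cls)
# Clusterings after relabelling.
cls.relab$cls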