For a single cluster \(g\) and \(R\) LDA runs, the disparity is defined as
$$U(g) := \frac{1}{R} \sum_{r=1}^R \vert t_r^{(g)} - 1 \vert \cdot \sum_{r=1}^R t_r^{(g)},$$
where the vector \(\bm t^{(g)} = (t_1^{(g)}, \ldots, t_R^{(g)})^T\)
contains, for each LDA run \(r\), the number of topics from that run that
occur in cluster \(g\).
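
The disparity of a single cluster can be sketched in a few lines. This is an illustrative Python transcription of the formula above, not the package's implementation; the function name `disparity` and the plain-list input are assumptions made for the example.

```python
def disparity(t):
    """Disparity U(g) of one cluster (illustrative sketch).

    t is a sequence of length R; t[r] is the number of topics from
    LDA run r that fall into the cluster.
    """
    R = len(t)
    # Mean absolute deviation from "exactly one topic per run" ...
    deviation = sum(abs(x - 1) for x in t) / R
    # ... weighted by the total number of topics in the cluster.
    return deviation * sum(t)


# A cluster holding exactly one topic from each of four runs is ideal.
print(disparity([1, 1, 1, 1]))  # 0.0
# Two topics from run 1 and none from run 2 are penalized.
print(disparity([2, 0, 1, 1]))  # 2.0
```

A cluster contributes zero disparity only if it contains exactly one topic from every run; both missing and duplicated runs increase the value, and larger clusters are weighted more heavily.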
The function disparitySum
returns the smallest possible sum of disparities
\(U_{\Sigma}(G^*)\), attained at the best possible pruning state \(G^*\),
that is, the state \(G\) for which \(U_{\Sigma}(G) = \sum_{g \in G} U(g) \to \min\).
The value of \(U_{\Sigma}(G^*)\) is bounded above by
$$U_{\Sigma,\textsf{max}} := \sum_{g \in \tilde{G}} U(g) = N \cdot \frac{R-1}{R},$$
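where \(\tilde{G}\) denotes the corresponding worst-case pruning state, in which
each of the \(N\) topics forms its own cluster and therefore contributes a
disparity of \(\frac{R-1}{R}\). This worst-case value is used to normalize the
S-CLOP scores.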
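
The closed form for the worst case can be checked numerically. The sketch below assumes that \(N\) is the total number of topics over all runs and that the worst case puts every topic into its own cluster; the helper names `disparity` and `worst_case_sum` are hypothetical and exist only for this illustration.

```python
def disparity(t):
    """U(g) for one cluster; t[r] = number of topics from run r."""
    return sum(abs(x - 1) for x in t) / len(t) * sum(t)


def worst_case_sum(topics_per_run):
    """Sum of disparities when every topic is its own cluster.

    topics_per_run[r] is the number of topics in LDA run r
    (an assumed input format for this sketch).
    """
    R = len(topics_per_run)
    total = 0.0
    for r, k in enumerate(topics_per_run):
        # A singleton cluster from run r has count 1 at position r, else 0.
        singleton = [1 if i == r else 0 for i in range(R)]
        total += k * disparity(singleton)
    return total


topics_per_run = [3, 3, 3, 3]   # R = 4 runs with 3 topics each
N = sum(topics_per_run)         # N = 12 topics in total
R = len(topics_per_run)
print(worst_case_sum(topics_per_run))  # 9.0
print(N * (R - 1) / R)                 # 9.0, matching the closed form
```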
The function SCLOP
then calculates the value
$$\textsf{S-CLOP}(G^*) := 1 - \frac{1}{U_{\Sigma,\textsf{max}}} \cdot \sum_{g \in G^*} U(g) ~\in [0,1],$$
where \(\sum\limits_{g \in G^*} U(g) = U_{\Sigma}(G^*)\).
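
Putting the pieces together, the normalized score for a given clustering can be sketched as follows. This is a minimal Python illustration under stated assumptions: it does not search for the best pruning state \(G^*\) but takes the cluster count vectors as given, it assumes \(N\) is the total number of topics, and the names `disparity` and `sclop_score` are hypothetical rather than the package's API.

```python
def disparity(t):
    """U(g) for one cluster; t[r] = number of topics from run r."""
    return sum(abs(x - 1) for x in t) / len(t) * sum(t)


def sclop_score(clusters):
    """Normalized score 1 - U_sum / U_max for a given pruning state.

    clusters is a list of count vectors, one per cluster, each of
    length R (assumed input format; the optimal state is not searched).
    """
    R = len(clusters[0])
    if R < 2:
        raise ValueError("at least two runs are needed (U_max would be 0)")
    n_topics = sum(sum(t) for t in clusters)   # N, assumed total topic count
    u_max = n_topics * (R - 1) / R             # worst-case sum of disparities
    u_sum = sum(disparity(t) for t in clusters)
    return 1 - u_sum / u_max


# Two runs: two perfectly matched clusters and two unmatched topics.
state = [[1, 1], [1, 1], [1, 0], [0, 1]]
print(round(sclop_score(state), 4))  # 0.6667
```

A score of 1 corresponds to every cluster containing exactly one topic per run, while a score of 0 corresponds to the worst case in which no topics are matched at all.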