dlso: Draws-Based Latent Structure Optimization

Description

This function provides a point estimate for a partition distribution using the draws-based latent structure optimization (DLSO) method, also known as the least-squares clustering method (Dahl 2006). The search is restricted to the partitions supplied by the draws argument: among them, the method returns the one that minimizes either the expectation of the Binder loss or a lower bound on the expectation of the variation of information loss.
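The selection step for the Binder loss can be illustrated with a minimal sketch. It assumes equal misclassification costs, under which minimizing the expected Binder loss over the draws is equivalent to choosing the draw whose co-clustering matrix is closest, in squared distance, to the pairwise similarity matrix. The helper name least_squares_pick and the toy matrix are hypothetical and are not part of the package; this is not the package's implementation.

```r
# Hypothetical helper (not part of the package): return the draw whose
# co-clustering matrix is closest in squared distance to the estimated
# pairwise similarity matrix.
least_squares_pick <- function(draws) {
  B <- nrow(draws)
  n <- ncol(draws)

  # Estimate the pairwise similarity matrix by averaging the
  # co-clustering indicator matrices over all draws.
  psm <- matrix(0, n, n)
  for (b in seq_len(B)) {
    psm <- psm + outer(draws[b, ], draws[b, ], "==")
  }
  psm <- psm / B

  # Sum of squared deviations between each draw's co-clustering
  # matrix and the pairwise similarity matrix.
  ss <- vapply(seq_len(B), function(b) {
    sum((outer(draws[b, ], draws[b, ], "==") - psm)^2)
  }, numeric(1))

  draws[which.min(ss), ]
}

# Toy input: four clusterings of four items (illustrative only).
toy <- rbind(c(1, 1, 2, 2),
             c(1, 1, 2, 2),
             c(1, 2, 2, 2),
             c(1, 1, 1, 2))
picked <- least_squares_pick(toy)
```

On this toy input the first row is selected, since its co-clustering matrix has the smallest squared distance to the averaged matrix.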

Usage

dlso(psm, loss = c("VI.lb", "binder")[1], draws, parallel = TRUE)

Arguments

psm

A pairwise similarity matrix, i.e., n-by-n symmetric matrix whose (i,j) element gives the (estimated) probability that items i and j are in the same subset (i.e., cluster) of a partition (i.e., clustering).

loss

Either "VI.lb" or "binder", to indicate the desired loss function.

draws

A B-by-n matrix, where each of the B rows represents a clustering of n items using cluster labels. For clustering b, items i and j are in the same cluster if draws[b,i] == draws[b,j].

parallel

Should the search use all CPU cores?
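As an illustration of how the draws and psm arguments relate, the following sketch builds a small draws matrix and estimates a pairwise similarity matrix from it by averaging co-clustering indicators. The toy values and the variable name psm_hat are assumptions made for illustration; the labels themselves are arbitrary, so rows 1 and 2 encode the same partition.

```r
# Toy draws matrix: B = 3 clusterings of n = 3 items (illustrative only).
draws <- rbind(c(1, 1, 2),
               c(5, 5, 3),   # same partition as row 1, different labels
               c(1, 2, 2))

# Items i and j share a cluster in clustering b when their labels match.
draws[1, 1] == draws[1, 2]

# Estimate the pairwise similarity matrix: element (i, j) is the
# proportion of draws in which items i and j are clustered together.
indicators <- lapply(seq_len(nrow(draws)), function(b) {
  outer(draws[b, ], draws[b, ], "==")
})
psm_hat <- Reduce(`+`, indicators) / nrow(draws)
```

The result is a symmetric n-by-n matrix with ones on the diagonal, which matches the form described for the psm argument.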

Value

A list of the following elements:

estimate: An integer vector giving a partition encoded using cluster labels.
loss: A character vector equal to the loss argument.
expectedLoss: A numeric vector of length one giving the expected loss.

References

D. A. Binder (1978), Bayesian cluster analysis, Biometrika 65, 31-38.

D. B. Dahl (2006), Model-Based Clustering for Expression Data via a Dirichlet Process Mixture Model, in Bayesian Inference for Gene Expression and Proteomics, Kim-Anh Do, Peter Müller, Marina Vannucci (Eds.), Cambridge University Press.

J. W. Lau and P. J. Green (2007), Bayesian model based clustering procedures, Journal of Computational and Graphical Statistics 16, 526-558.

D. B. Dahl and M. A. Newton (2007), Multiple Hypothesis Testing by Clustering Treatment Effects, Journal of the American Statistical Association, 102, 517-526.

A. Fritsch and K. Ickstadt (2009), An improved criterion for clustering based on the posterior similarity matrix, Bayesian Analysis, 4, 367-391.

S. Wade and Z. Ghahramani (2018), Bayesian cluster analysis: Point estimation and credible balls. Bayesian Analysis, 13:2, 559-626.

Examples

# NOT RUN {
dlso(draws=iris.clusterings, parallel=FALSE)

# }