Learn R Programming

GMCM (version 1.4)

GMCM-package: Fast optimization of Gaussian Mixture Copula Models

Description

Gaussian mixture copula models (GMCM) are a flexible class of statistical models which can be used for unsupervised clustering, meta analysis, and many other things. In meta analysis, GMCMs can be used to quantify and identify which features which have been reproduced across multiple experiments. This package provides a fast and general implementation of GMCM cluster analysis and serves as an improvement and extension of the features available in the idr package.

Arguments

Details

If the meta analysis of Li et al. (2011) is to be performed, the function fit.meta.GMCM is used to identify the maximum likelihood estimate of the special Gaussian mixture copula model (GMCM) defined by Li et al. (2011). The function get.IDR computes the local and adjusted Irreproducible Discovery Rates defined by Li et al. (2011) to determine the level of reproducibility.

Tewari et. al. (2011) proposed using GMCMs as an general unsupervised clustering tool. If such a general unsupervised clustering is needed, like above, the function fit.full.GMCM computes the maximum likelihood estimate of the general GMCM. The function get.prob is used to estimate the class membership probabilities of each observation.

SimulateGMCMData provide easy simulation from the GMCMs.

References

Anders Ellern Bilgrau, Poul Svante Eriksen, Jakob Gulddahl Rasmussen, Hans Erik Johnsen, Karen Dybkaer, Martin Boegsted (2016). GMCM: Unsupervised Clustering and Meta-Analysis Using Gaussian Mixture Copula Models. Journal of Statistical Software, 70(2), 1-23. doi:10.18637/jss.v070.i02

Li, Q., Brown, J. B. J. B., Huang, H., & Bickel, P. J. (2011). Measuring reproducibility of high-throughput experiments. The Annals of Applied Statistics, 5(3), 1752-1779. doi:10.1214/11-AOAS466

Tewari, A., Giering, M. J., & Raghunathan, A. (2011). Parametric Characterization of Multimodal Distributions with Non-gaussian Modes. 2011 IEEE 11th International Conference on Data Mining Workshops, 286-292. doi:10.1109/ICDMW.2011.135

See Also

Core user functions: fit.meta.GMCM, fit.full.GMCM, get.IDR, get.prob, SimulateGMCMData, SimulateGMMData, rtheta, Uhat, choose.theta, full2meta, meta2full

Package by Li et. al. (2011): idr.

Examples

Run this code
# NOT RUN {
# Loading data
data(u133VsExon)

# Subsetting data to reduce computation time
u133VsExon <- u133VsExon[1:5000, ]

# Ranking and scaling,
# Remember large values should be critical to the null!
uhat <- Uhat(1 - u133VsExon)

# Visualizing P-values and the ranked and scaled P-values
# }
# NOT RUN {
par(mfrow = c(1,2))
plot(u133VsExon, cex = 0.5, pch = 4, col = "tomato", main = "P-values",
     xlab = "P   (U133)", ylab = "P   (Exon)")
plot(uhat, cex = 0.5, pch = 4, col = "tomato", main = "Ranked P-values",
     xlab = "rank(1-P)   (U133)", ylab = "rank(1-P)   (Exon)")
# }
# NOT RUN {
# Fitting using BFGS
fit <- fit.meta.GMCM(uhat, init.par = c(0.5, 1, 1, 0.5), pgtol = 1e-2,
                     method = "L-BFGS", positive.rho = TRUE, verbose = TRUE)

# Compute IDR values and classify
idr <- get.IDR(uhat, par = fit)
table(idr$K) # 1 = irreproducible, 2 = reproducible

# }
# NOT RUN {
# See clustering results
par(mfrow = c(1,2))
plot(u133VsExon, cex = 0.5, pch = 4, main = "Classified genes",
     col = c("tomato", "steelblue")[idr$K],
     xlab = "P-value (U133)", ylab = "P-value (Exon)")
plot(uhat, cex = 0.5, pch = 4, main = "Classified genes",
     col = c("tomato", "steelblue")[idr$K],
     xlab = "rank(1-P) (U133)", ylab = "rank(1-P) (Exon)")
# }

Run the code above in your browser using DataLab