The R package simone implements the inference of co-expression networks based on partial correlation coefficients, from either steady-state or time-course transcriptomic data. With both types of data, the package can handle samples collected under different experimental conditions, hence not identically distributed; in that case, multiple but related graphs are inferred at once.
The underlying statistical tools enter the framework of Gaussian graphical models (GGM). In short, the algorithm searches for a latent clustering of the network and uses it to drive the selection of edges through an adaptive \(\ell_1\)-penalization of the model likelihood.
The available inference methods for edge selection and/or estimation include:
neighborhood selection, as in Meinshausen and Buhlmann (2006), steady-state data only;
penalized likelihood (graphical Lasso), as in Banerjee et al. (2008) and Friedman et al. (2008), steady-state data only;
weighted-Lasso VAR(1) inference, as in Charbonnier, Chiquet and Ambroise (2010), time-course data only;
multitask learning, as in Chiquet, Grandvalet and Ambroise (preprint), both time-course and steady-state data.
All the listed methods rely on \(\ell_1\)-norm penalization, with an additional grouping effect for multitask learning (three variants: "intertwined", "group-Lasso" and "cooperative-Lasso"). The sketch below shows how a method is reached in practice.
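As a minimal sketch: the method follows from the data type and the task structure passed to the main simone function. Here X (an n x p expression matrix) and conditions (a factor splitting the samples) are hypothetical placeholders, not package objects.

    library(simone)

    ## 'X' (n x p expression matrix) and 'conditions' (factor of sample
    ## conditions) are hypothetical placeholders
    res.ns  <- simone(X, type = "steady-state")  # neighborhood selection (default)
    res.var <- simone(X, type = "time-course")   # VAR(1) inference
    res.mt  <- simone(X, type = "steady-state",
                      tasks = conditions)        # multitask learning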
The penalization of each individual edge may be weighted according to a latent clustering of the network, thus adapting the inference of the network to a particular topology. The clustering is performed by the mixer package, based upon Daudin, Picard and Robin (2008)'s mixture model for random graphs.
Since the choice of the network sparsity level remains an open issue in sparse Gaussian network inference, the algorithm provides a full path of estimates, starting from an empty network and adding edges as the penalty level progressively decreases. The Bayesian Information Criterion (BIC) and the Akaike Information Criterion (AIC) are adapted to the GGM context to help choose one particular network along this path of solutions.
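For instance, the package's getNetwork extractor can pick one network out of the path; passing "BIC" as the selection rule reflects our reading of its help page, not a guaranteed interface.

    ## 'res' is an object returned by simone() (see the sketch above);
    ## selection = "BIC" follows our reading of ?getNetwork
    net.hat <- getNetwork(res, selection = "BIC")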
Graphical tools are provided to summarize the results of a simone run and offer various representations for network plotting.
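A sketch of those tools, under the assumption that the generic plot methods apply both to the fitted object and to an extracted network (res and net.hat from the sketches above):

    plot(res)      # overview of the simone run (solution path, criteria)
    plot(net.hat)  # draw one network (plot.simone.network)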
Beyond the examples of this manual, a good starting point is to have a look at the scripts available via demo(package="simone"). They make use of simone, the main function in the package, in various contexts (steady-state or time-course data, multiple sample learning). All these scripts also illustrate the use of the different plot functions.
demo(cancer_multitask)
Example of the multitask approach on the cancer data set, with a cooperative-Lasso grouping effect across tasks. Patient responses to chemotherapy (pCR or not-pCR) split the data set into two distinct samples. Network inference is performed jointly on these samples and a graphical comparison is made between the two networks.
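A condensed sketch of what this demo does; the expr and status fields of the cancer data set follow our reading of its help page.

    library(simone)
    data(cancer)   # breast cancer microarray data shipped with simone

    ## joint inference over the two samples defined by chemotherapy
    ## response ('expr'/'status' fields: our reading of ?cancer)
    res <- simone(cancer$expr, tasks = cancer$status)
    plot(res)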
demo(cancer_pooled)
Example on the cancer data set, designed to compare network inference with and without a clustering prior. A graphical comparison between the two inferred networks (with/without clustering prior) illustrates how inference is driven towards a particular network topology when clustering is relevant (here, an affiliation structure).
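Switching the clustering prior on or off is, in our understanding, a single argument of simone; a sketch reusing the cancer expression data from above:

    ## with versus without the mixer-based clustering prior
    res.plain <- simone(cancer$expr, clustering = FALSE)
    res.clust <- simone(cancer$expr, clustering = TRUE)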
demo(check_glasso, echo=FALSE)
Example that checks the consistency between the glasso package of Friedman et al. and the simone package for solving the \(\ell_1\)-penalized Gaussian likelihood criterion suggested by Banerjee et al. in the \(n>p\) setting. In the \(n<p\) setting, simone provides sparser solutions than the glasso package, since the underlying Lasso problems are solved with an active-set algorithm instead of the shooting/pathwise coordinate algorithm.
demo(simone_multitask)
Example of multitask learning on simulated, steady-state data: two networks are generated by randomly perturbing a common ancestor with the coNetwork function. These two networks are then used to generate two multivariate Gaussian samples. Multitask learning is applied and a simple illustration of the use of the setOptions function is given.
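A sketch of that simulation pipeline; treating delta as the number of perturbed edges and expecting a tasks factor in the simulated data are our assumptions, and all numeric values are arbitrary.

    library(simone)

    ## two networks perturbing a common ancestor ('delta' assumed to be
    ## the number of perturbed edges, see ?coNetwork)
    ancestor <- rNetwork(p = 20, pi = 20)
    net1 <- coNetwork(ancestor, delta = 2)
    net2 <- coNetwork(ancestor, delta = 2)

    ## one Gaussian sample per network, then joint multitask inference
    ## (a 'tasks' factor in the output is our assumption)
    tdata <- rTranscriptData(n = c(40, 40), net1, net2)
    res <- simone(tdata$X, tasks = tdata$tasks)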
demo(simone_steadyState)
Example of how to learn a single network from steady-state data. A sample is first generated with the rNetwork and rTranscriptData functions. Then the path of solutions of the neighborhood selection method (the default for single-task steady-state data) is computed.
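A compressed sketch of that demo; the simulators' argument names follow their help pages and all values are arbitrary illustration choices.

    library(simone)

    ## simulate a network and steady-state data, then compute the path
    net   <- rNetwork(p = 30, pi = 30)      # 30 genes; 'pi' sets connectivity
    tdata <- rTranscriptData(n = 100, net)  # 100 i.i.d. steady-state samples
    res   <- simone(tdata$X)                # default: neighborhood selection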
demo(simone_timeCourse)
Example of how to learn a single network from time-course data. A sample is first generated with the rNetwork and rTranscriptData functions, and the path of solutions of the VAR(1) inference method is computed, with and without a clustering prior.
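In the same spirit, a time-course sketch; using directed = TRUE so that the network can drive VAR(1) dynamics is our assumption about rNetwork.

    library(simone)

    ## a directed network drives the VAR(1) dynamics for time-course data
    net   <- rNetwork(p = 20, pi = 20, directed = TRUE)  # assumption
    tdata <- rTranscriptData(n = 60, net)                # 60 time points
    res   <- simone(tdata$X, type = "time-course")       # VAR(1) inference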
Index:
cancer                 Microarray data set for breast cancer
coNetwork              Random perturbations of a reference network
getNetwork             Network extraction from a SIMoNe run
plot.simone            Graphical representation of SIMoNe outputs
plot.simone.network    Graphical representation of a network
rNetwork               Simulation of (clustered) Gaussian networks
rTranscriptData        Simulation of artificial transcriptomic data
setOptions             Low-level options of the 'simone' function
simone                 SIMoNe algorithm for network inference
J. Chiquet, Y. Grandvalet, and C. Ambroise (preprint). Inferring multiple graphical structures. Preprint available on arXiv. http://arxiv.org/abs/0912.4434
C. Charbonnier, J. Chiquet, and C. Ambroise (2010). Weighted-Lasso for Structured Network Inference from Time Course Data. Statistical Applications in Genetics and Molecular Biology, vol. 9, iss. 1, article 15. http://www.bepress.com/sagmb/vol9/iss1/art15/
C. Ambroise, J. Chiquet, and C. Matias (2009). Inferring sparse Gaussian graphical models with latent structure. Electronic Journal of Statistics, vol. 3, pp. 205--238. http://dx.doi.org/10.1214/08-EJS314
O. Banerjee, L. El Ghaoui and A. d'Aspremont (2008). Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data. Journal of Machine Learning Research, vol. 9, pp. 485--516. http://www.jmlr.org/papers/volume9/banerjee08a/banerjee08a.pdf
J. Friedman, T. Hastie and R. Tibshirani (2008). Sparse inverse covariance estimation with the graphical Lasso. Biostatistics, vol. 9(3), pp. 432--441. http://www-stat.stanford.edu/~tibs/ftp/graph.pdf
N. Meinshausen and P. Buhlmann (2006). High-dimensional graphs and variable selection with the Lasso. The Annals of Statistics, vol. 34(3), pp. 1436--1462. http://projecteuclid.org/DPubS/Repository/1.0/Disseminate?view=body&id=pdfview_1&handle=euclid.aos/1152540754
J.-J. Daudin, F. Picard and S. Robin (2008). Mixture model for random graphs. Statistics and Computing, vol. 18(2), pp. 173--183. http://www.springerlink.com/content/9v6846342mu82x42/fulltext.pdf