simone: SIMoNe algorithm for network inference

Description

The simone function offers an interface to infer networks based on partial correlation coefficients in various contexts and with various methods (steady-state data, time-course data, multiple-sample setup, clustering prior).

Usage

simone(X,
       type       = "steady-state",
       clustering = FALSE,
       tasks      = factor(rep(1, nrow(X))),
       control    = setOptions())

Arguments

X

a \(n\times p\) matrix of data, typically \(n\) expression levels associated with the same \(p\) genes. Can also be a data.frame with \(n\) entries, each column corresponding to a variable (a gene). Specifying column names for X is convenient when analyzing the results, since they are used to annotate the plots. Note that this is the only required argument.

type

a character string indicating the data specification (either "steady-state" or "time-course" data). Default is "steady-state".

clustering

a logical indicating whether the network inference should be performed by penalizing the edges according to a latent clustering discovered during the network structure recovery. Default is FALSE.

tasks

a factor with \(n\) entries indicating, for each observation, the task (sample) it belongs to in the multiple-sample framework. Default is factor(rep(1, nrow(X))), that is, all observations come from a single homogeneous sample.
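A minimal base-R sketch of what the tasks argument encodes: each level of the factor defines one sample, and the rows of X sharing that level are the observations of that sample. The toy matrix X and the labels "A" and "B" are hypothetical; simone performs this grouping internally, so the code only illustrates the correspondence between rows and tasks.

```r
set.seed(42)
## hypothetical toy data: 4 observations (rows) of 3 genes (columns)
X <- matrix(rnorm(12), nrow = 4,
            dimnames = list(NULL, c("g1", "g2", "g3")))

## hypothetical task labels: rows 1-2 come from sample "A", rows 3-4 from "B"
tasks <- factor(c("A", "A", "B", "B"))

## one sub-matrix per level of tasks, as in the multiple-sample setup
samples <- lapply(split(seq_len(nrow(X)), tasks),
                  function(rows) X[rows, , drop = FALSE])

sapply(samples, nrow)            # A: 2 rows, B: 2 rows

## the default, factor(rep(1, nrow(X))), puts every row in one single sample
nlevels(factor(rep(1, nrow(X))))
```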

control

A list that is used to specify low-level options for the algorithm, defined through the setOptions function.

Value

Returns an object of class simone, which is list-like and contains the following:

networks

a list with all the inferred networks stored as adjacency matrices (the successive values of \(\Theta\) controlled by the penalty level \(\lambda\)). In the multiple sample setup, each element of the list is a list with as many entries as samples or levels in tasks.

penalties

a vector of the same length as networks, containing the successive values of the penalty level.

n.edges

a vector of the same length as networks, containing the successive numbers of edges in the inferred networks. In the multiple sample setup, n.edges is a matrix with as many columns as levels in tasks.
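One way to recover an edge count from a stored adjacency matrix, as a sketch assuming that any nonzero off-diagonal entry is an edge. For a symmetric (undirected) matrix each pair is counted once; for a directed network, counting every nonzero off-diagonal entry is the analogue. The helper count_edges and the toy matrix are hypothetical and not part of simone.

```r
## hypothetical helper: number of edges in an adjacency matrix
count_edges <- function(A, directed = FALSE) {
  offdiag <- A != 0
  diag(offdiag) <- FALSE                  # self-loops are not edges
  if (directed) {
    sum(offdiag)                          # every nonzero entry is an edge
  } else {
    sum(offdiag[upper.tri(offdiag)])      # count each undirected pair once
  }
}

## toy 4-gene network with edges 1-2 and 2-3
A <- matrix(0, 4, 4)
A[1, 2] <- A[2, 1] <- 0.5
A[2, 3] <- A[3, 2] <- -0.2
count_edges(A)   # 2
```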

BIC

a vector of the same length as networks, containing the value of the BIC for the successively estimated networks.

AIC

a vector of the same length as networks, containing the value of the AIC for the successively estimated networks.

clusters

a size-\(p\) factor indicating the class of each variable.

weights

a \(p\times p\) matrix of weights used to adapt the penalty to each entry of the \(\Theta\) matrix. It is inferred by the algorithm according to the latent clustering of the network. When clustering is set to FALSE, all the weights are equal to 1, which means no adaptive penalization.
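The role of the weights can be sketched as a weighted \(\ell_1\) penalty: each entry of \(\Theta\) is multiplied by its weight before the absolute values are summed, so all-one weights give back the plain \(\ell_1\) norm. This assumes the weights enter the penalty entrywise and that diagonal entries are not penalized; the helper weighted_l1 is hypothetical and the exact form used by simone may differ.

```r
## hypothetical helper: weighted l1 penalty on the off-diagonal of Theta
weighted_l1 <- function(Theta, W = matrix(1, nrow(Theta), ncol(Theta))) {
  off <- row(Theta) != col(Theta)         # mask of off-diagonal entries
  sum(W[off] * abs(Theta[off]))
}

Theta <- matrix(c( 1.0,  0.4,  0.0,
                   0.4,  1.0, -0.3,
                   0.0, -0.3,  1.0), nrow = 3, byrow = TRUE)

weighted_l1(Theta)            # all weights equal to 1: plain l1 norm
W <- matrix(2, 3, 3)          # doubling every weight...
weighted_l1(Theta, W)         # ...doubles the penalty
```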

control

a list describing the final values of all the parameters actually used by the algorithm, to be compared with those set through the setOptions function. Indeed, many options depend on the nature of the data and can be automatically corrected during internal checks of the consistency between the desired options and the characteristics of the data.

Details

Any inference method available ("neighborhood selection", "graphical-Lasso", "VAR(1) inference" and "multitask learning"; see simone-package) relies on an optimization problem of the general form

\[\hat{\Theta}_\lambda = \arg\max_{\Theta,\mathbf{Z}} \mathcal{L}(\Theta;\mathbf{X}) - \lambda\,\mathrm{pen}_{\ell_1}(\Theta,\mathbf{Z}),\]

where \(\mathcal{L}\) is the log-likelihood of the model (pseudo log-likelihood for "neighborhood selection") and \(\lambda\) is a penalty parameter which controls the sparsity level of the network. The \(p\times p\) matrix \(\Theta\) describes the parameters (basically, the edges) of the model, while \(\mathbf{Z}\) represents a latent clustering which is also estimated when the argument clustering is set to TRUE.

The model and the penalty function differ according to the context (steady-state/time-course data, multitask learning and its associated coupling effect). For further details on the models, please check the papers listed in the reference section of simone-package.

The criterion displayed during a SIMoNe run is the value of the penalized likelihood for the current value of the estimator corresponding to a given value of the overall penalty level \(\lambda\).
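For the graphical-Lasso case, a penalized likelihood of this kind can be sketched in base R, assuming the standard Gaussian form \(\frac{n}{2}(\log\det\Theta - \mathrm{tr}(S\Theta)) - \lambda\|\Theta\|_1\), with \(S\) the empirical covariance matrix, constants dropped and only off-diagonal entries penalized. The other contexts (pseudo log-likelihood for neighborhood selection, VAR(1), multitask) use different likelihoods, and simone's exact scaling may differ, so the helper penalized_loglik is hypothetical.

```r
## hypothetical helper: penalized Gaussian log-likelihood (graphical-Lasso form)
## Theta is assumed symmetric positive definite
penalized_loglik <- function(Theta, S, n, lambda) {
  logdet <- as.numeric(determinant(Theta, logarithm = TRUE)$modulus)
  loglik <- n / 2 * (logdet - sum(diag(S %*% Theta)))
  off <- row(Theta) != col(Theta)        # penalize off-diagonal entries only
  loglik - lambda * sum(abs(Theta[off]))
}

set.seed(1)
X <- matrix(rnorm(50 * 3), nrow = 50)               # hypothetical data: n = 50, p = 3
S <- crossprod(scale(X, scale = FALSE)) / nrow(X)   # empirical covariance
Theta0 <- diag(3)                                   # empty network (no edges)
penalized_loglik(Theta0, S, n = nrow(X), lambda = 0.1)
```

Increasing \(\lambda\) lowers the criterion of any \(\Theta\) with nonzero off-diagonal entries, which is what pushes the estimate towards sparser networks.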

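The following information criteria are also computed for any value of \(\lambda\) and are part of the output of simone. The BIC (Bayesian Information Criterion)

\[\mathrm{BIC}_\lambda = \mathcal{L}(\hat{\Theta}_\lambda;\mathbf{X}) - \mathrm{df}(\hat{\Theta}_\lambda)\,\frac{\log n}{2},\]

and the AIC (Akaike Information Criterion)

\[\mathrm{AIC}_\lambda = \mathcal{L}(\hat{\Theta}_\lambda;\mathbf{X}) - \mathrm{df}(\hat{\Theta}_\lambda),\]

where \(\mathrm{df}(\hat{\Theta}_\lambda)\) denotes the degrees of freedom of the estimated model.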
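To make the trade-off concrete, both criteria can be computed from a log-likelihood, a number of degrees of freedom and the sample size. This sketch assumes the penalized-likelihood forms \(\mathrm{BIC}_\lambda = \mathcal{L} - \mathrm{df}\,\frac{\log n}{2}\) and \(\mathrm{AIC}_\lambda = \mathcal{L} - \mathrm{df}\), under which larger values are better; the log-likelihoods and degrees of freedom below are made-up numbers for illustration, not simone output.

```r
## criteria in "larger is better" form (assumed convention)
bic <- function(loglik, df, n) loglik - df * log(n) / 2
aic <- function(loglik, df) loglik - df

## hypothetical path of 4 penalty levels: more edges, higher likelihood
loglik <- c(-520, -480, -470, -468)
df     <- c(0, 5, 12, 20)
n      <- 100

crit <- data.frame(BIC = bic(loglik, df, n), AIC = aic(loglik, df))
which.max(crit$BIC)   # 2
which.max(crit$AIC)   # 3
```

With these numbers BIC selects the second penalty level and AIC the third: since \(\log n / 2 > 1\) as soon as \(n > e^2 \approx 7.4\), BIC charges each degree of freedom more than AIC and tends to favour sparser networks.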

Examples

## load the simone package and the breast cancer data set
library(simone)
data(cancer)
attach(cancer)

## launch simone with the default parameters and plot results
plot(simone(expr))

## try with clustering now (clustering is achieved on a 30-edge network)
plot(simone(expr, clustering=TRUE, control=setOptions(clusters.crit=30)))

## try the multiple-sample setup, with one task per level of status
plot(simone(expr, tasks=status))

detach(cancer)