bdgraph: Search algorithm in graphical models

Description

As the main function of the BDgraph package, bdgraph provides several sampling algorithms for Bayesian model determination in undirected graphical models. To speed up the computations, the birth-death MCMC sampling algorithms are implemented in parallel using OpenMP in C++.

Usage

bdgraph( data, n = NULL, method = "ggm", algorithm = "bdmcmc", iter = 5000, 
         burnin = iter / 2, g.start = "empty", g.space = NULL, g.prior = 0.5, 
         prior.df = 3, multi.update = NULL, save.all = FALSE, print = 1000, 
         cores = "all" )

Arguments

data

There are two options: (1) an (\(n \times p\)) matrix or a data.frame corresponding to the data, or (2) a (\(p \times p\)) covariance matrix given as \(S=X'X\), where \(X\) is the data matrix (\(n\) is the sample size and \(p\) is the number of variables). It can also be an object of class "sim" from the function bdgraph.sim. The type of input matrix is identified automatically by checking its symmetry.

n

The number of observations. It is needed if "data" is a covariance matrix.
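The covariance-matrix form of the input can be sketched as follows. This is a minimal illustration on toy data: the values of n and p and the simulated matrix X are placeholders, and the call to bdgraph is only attempted when the BDgraph package is installed.

```r
# Sketch: supply the p x p matrix S = X'X instead of the raw data.
set.seed(1)
n <- 30   # sample size (illustrative value)
p <- 4    # number of variables (illustrative value)
X <- matrix(rnorm(n * p), nrow = n, ncol = p)  # toy n x p data matrix

S <- crossprod(X)  # equivalent to t(X) %*% X, a p x p matrix

# A symmetric input is read as S, so the sample size must be passed through n.
stopifnot(isSymmetric(S), all(dim(S) == c(p, p)))

if (requireNamespace("BDgraph", quietly = TRUE)) {
  fit <- BDgraph::bdgraph(data = S, n = n, iter = 1000)
  print(fit$p_links)  # estimated posterior link probabilities
}
```

Passing the raw matrix X as data would not require n, since the sample size can then be read from the number of rows.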

method

A character string with two options: "ggm" (default) and "gcgm". Option "ggm" is for Gaussian graphical models, which assume that the data are Gaussian. Option "gcgm" is for Gaussian copula graphical models, for data that do not follow the Gaussianity assumption (e.g. continuous non-Gaussian, discrete, or mixed data).

algorithm

A character string with two options: "bdmcmc" (default) and "rjmcmc". Option "bdmcmc" uses the birth-death MCMC algorithm. Option "rjmcmc" uses the reversible jump MCMC algorithm.

iter

The number of iterations for the sampling algorithm.

burnin

The number of burn-in iterations for the sampling algorithm.

g.start

The starting point of the graph. It can be "empty" (default) or "full". Option "empty" means the initial graph is an empty graph and "full" means a full graph. It can also be an object with S3 class "bdgraph"; with this option the sampling algorithm starts from the last state of a previous run (see examples).

g.space

The sub-space of the graph to search. For the case g.space = NULL (default), the algorithm searches the whole graph space. With this option, the search algorithm can be restricted to a subspace of the graph space. The subspace should be specified as an adjacency matrix.

g.prior

Determines the prior distribution of each edge in the graph. There are two options: a single value between \(0\) and \(1\) (e.g. \(0.5\) as a noninformative prior) or a (\(p \times p\)) matrix with elements between \(0\) and \(1\).
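The matrix form of g.prior can be sketched as follows. The name g_prior, the prior values, and the toy data are all illustrative assumptions; the matrix is kept symmetric here because the graph is undirected, which is a choice made for this sketch rather than a documented requirement.

```r
# Sketch: a p x p matrix of prior link probabilities (hypothetical values).
set.seed(2)
n <- 40
p <- 4
X <- matrix(rnorm(n * p), nrow = n, ncol = p)  # toy n x p data matrix

g_prior <- matrix(0.2, nrow = p, ncol = p)  # low prior probability by default
g_prior[1, 2] <- g_prior[2, 1] <- 0.9       # favour the link between 1 and 2

# All elements must lie between 0 and 1.
stopifnot(all(g_prior >= 0 & g_prior <= 1), isSymmetric(g_prior))

if (requireNamespace("BDgraph", quietly = TRUE)) {
  fit <- BDgraph::bdgraph(data = X, g.prior = g_prior, iter = 1000)
  print(fit$p_links)
}
```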

prior.df

The degrees of freedom \(b\) of the G-Wishart distribution, \(W_G(b,D)\), which is the prior distribution of the precision matrix.

multi.update

Used only by the BDMCMC algorithm (algorithm = "bdmcmc"). It allows multiple links to be updated simultaneously in each step of the BDMCMC algorithm.

save.all

Logical: if FALSE (default), the adjacency matrices are NOT saved. If TRUE, the adjacency matrices after burn-in are saved.

print

A value that controls how often the current iteration number of the MCMC algorithm is printed.

cores

The number of cores to use for parallel execution. The default, "all", uses all CPU cores of the computer; it can also be a number, e.g. cores = 2 uses 2 CPU cores.

Value

An object with S3 class "bdgraph" is returned:

p_links

An upper triangular matrix containing the estimated posterior probabilities of all possible links.

K_hat

The posterior estimation of the precision matrix.

For the case save.all = TRUE, the following are returned:

sample_graphs

A vector of strings which includes the adjacency matrices of visited graphs after burn-in.

graph_weights

A vector which includes the waiting times of visited graphs after burn-in.

all_graphs

A vector which includes the identity of the adjacency matrices for all iterations after burn-in. It is needed for monitoring the convergence of the BD-MCMC algorithm.

all_weights

A vector which includes the waiting times for all iterations after burn-in. It is needed for monitoring the convergence of the BD-MCMC algorithm.
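The way waiting times act as weights on the visited graphs can be sketched in base R. The vectors below are made-up stand-ins for the sample_graphs and graph_weights components, not output from a real run, and the sketch assumes that each string lists the upper-triangular entries of an adjacency matrix, column by column, as 0/1 characters. That encoding is an assumption of this example.

```r
# Toy stand-ins for sample_graphs and graph_weights (hypothetical values).
p <- 3
sample_graphs <- c("000", "100", "110")  # three visited graphs on 3 nodes
graph_weights <- c(2, 5, 3)              # waiting time spent in each graph

# Split each string into 0/1 digits: one row per graph, one column per link.
links <- t(sapply(strsplit(sample_graphs, ""), as.numeric))

# Share of the total waiting time during which each link was present.
link_prob <- colSums(links * graph_weights) / sum(graph_weights)

# Arrange the result as an upper triangular matrix.
p_links <- matrix(0, nrow = p, ncol = p)
p_links[upper.tri(p_links)] <- link_prob
p_links
```

A graph visited for a long time contributes more to each link's estimate than a graph left quickly, which is why the weights are kept alongside the graphs.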

References

Mohammadi, A. and E. Wit (2015). Bayesian Structure Learning in Sparse Gaussian Graphical Models, Bayesian Analysis, 10(1):109-138

Mohammadi, A. and E. Wit (2015). BDgraph: An R Package for Bayesian Structure Learning in Graphical Models, arXiv preprint arXiv:1501.05108

Mohammadi, A. et al. (2017). Bayesian modelling of Dupuytren disease by using Gaussian copula graphical models, Journal of the Royal Statistical Society: Series C

Mohammadi, A., H. Massam, and G. Letac (2017). The ratio of normalizing constants for Bayesian graphical Gaussian model selection, arXiv preprint arXiv:1706.04416

Examples


# NOT RUN {
# Generating multivariate normal data from a 'random' graph
data.sim <- bdgraph.sim( n = 20, p = 6, size = 7, vis = TRUE )
   
bdgraph.obj <- bdgraph( data = data.sim, iter = 1000 )
  
summary( bdgraph.obj )
   
# To compare our result with true graph
compare( data.sim, bdgraph.obj, colnames = c("True graph", "BDgraph") )
   
# Running algorithm with starting points from previous run
bdgraph.obj2 <- bdgraph( data = data.sim, iter = 5000, g.start = bdgraph.obj )
    
compare( data.sim, bdgraph.obj, bdgraph.obj2, 
         colnames = c( "True graph", "First run", "Second run" ) )
   
# Generating mixed data from a 'scale-free' graph
data.sim <- bdgraph.sim( n = 50, p = 6, type = "mixed", graph = "scale-free", vis = TRUE )
   
bdgraph.obj <- bdgraph( data = data.sim, method = "gcgm", iter = 10000 )
  
summary( bdgraph.obj )
   
compare( data.sim, bdgraph.obj )	  
# }
