Implements data generation from multivariate Zero-Inflated Negative Binomial (ZINB) distributions with different graph structures, including "random", "hub", "cluster", "AR(2)" and "scale-free".
ContSim(n, p, v = NULL, u = NULL, g = NULL, prob = NULL, vis = FALSE, verbose = TRUE,
        graph.type = "AR(2)", k = 3.30, lambda = 515, omega = 0.003, lower.tail = TRUE, log.p = FALSE)
n: The number of observations (sample size).
p: The number of variables (dimension); written as d in the defaults and formulas on this page.
graph.type: The graph structure, with 5 options: "random", "hub", "cluster", "AR(2)" and "scale-free".
v: The off-diagonal elements of the precision matrix; together with u, it controls the magnitude of partial correlations. The default value is 0.3.
u: A positive number added to the diagonal elements of the precision matrix to control the magnitude of partial correlations. The default value is 0.1.
g: For the "cluster" or "hub" graph, g is the number of hubs or clusters in the graph. The default value is about d/20 if d >= 40 and 2 if d < 40. NOT applicable to the "random" and "AR(2)" graphs.
prob: For the "random" graph, it is the probability that a pair of nodes has an edge. The default value is 3/d. For the "cluster" graph, it is the probability that a pair of nodes has an edge in each cluster. The default value is 6*g/d if d/g <= 30 and 0.3 if d/g > 30. NOT applicable to the "hub" or "AR(2)" graphs.
vis: Visualize the adjacency matrix of the true graph structure, the graph pattern, the covariance matrix and the empirical covariance matrix. The default value is FALSE.
verbose: If verbose = FALSE, tracing information printing is disabled. The default value is TRUE.
k: Dispersion parameter of the ZINB distribution; the default is 3.30.
lambda: Vector of (non-negative) means of the ZINB distribution; the default is 515.
omega: Zero-inflation parameter of the ZINB distribution; the default is 0.003.
lower.tail: Logical; if TRUE (default), probabilities are P[X <= x], otherwise P[X > x].
log.p: Logical; if TRUE, probabilities p are given as log(p).
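The default rules for g and prob can be written out as a short sketch. The helper names `default_g` and `default_prob` are hypothetical and are not part of equSA, and rounding "about d/20" up with `ceiling()` is an assumption.

```r
# Hypothetical helpers that mirror the documented defaults.
# Assumption: "about d/20" is rounded up with ceiling().
default_g <- function(d) {
  if (d >= 40) ceiling(d / 20) else 2
}

default_prob <- function(d, graph.type, g = default_g(d)) {
  switch(graph.type,
    "random"  = 3 / d,
    "cluster" = if (d / g <= 30) 6 * g / d else 0.3,
    NA_real_  # prob is documented as not applicable to the other graph types
  )
}

default_g(200)                # 200 / 20 = 10 groups
default_prob(200, "random")   # 3 / 200
default_prob(200, "cluster")  # 6 * 10 / 200
```

With d = 200 the cluster rule takes the first branch, since d/g = 20 is at most 30.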
A list of two elements:
The simulated count dataset, as an \(n\) x \(p\) matrix.
The \(p\) x \(p\) adjacency matrix of the true graph structure (in sparse matrix representation) for the generated data.
This function generates datasets from multivariate Zero-Inflated Negative Binomial distributions with different graph structures, including "random", "hub", "cluster", "AR(2)" and "scale-free".
Given the adjacency matrix theta, the graph patterns are generated as below:
(I) random: Each pair of off-diagonal elements is randomly set to theta[i,j]=theta[j,i]=1 for i!=j with probability prob, and 0 otherwise. It results in about d*(d-1)*prob/2 edges in the graph.
(II) hub: The rows/columns are evenly partitioned into g disjoint groups. Each group is associated with a "center" row i in that group. Each pair of off-diagonal elements is set to theta[i,j]=theta[j,i]=1 for i!=j if j belongs to the same group as i, and 0 otherwise. It results in d - g edges in the graph.
(III) cluster: The rows/columns are evenly partitioned into g disjoint groups. Each pair of off-diagonal elements is set to theta[i,j]=theta[j,i]=1 for i!=j with probability prob if both i and j belong to the same group, and 0 otherwise. It results in about g*(d/g)*(d/g-1)*prob/2 edges in the graph.
(IV) AR(2): The off-diagonal elements are set to theta[i,j]=0.5 if |i-j|=1, theta[i,j]=0.05 if |i-j|=2, and 0 otherwise.
(V) scale-free: The graph is generated using the Barabasi-Albert (B-A) algorithm. The initial graph has two connected nodes, and each new node is connected to only one node in the existing graph, chosen with probability proportional to the degree of each node in the existing graph. It results in d edges in the graph.
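Three of the patterns above can be sketched in base R. This is an illustration of the descriptions only, not the package's code; the variable names are hypothetical, and taking the first node of each group as the hub center is an assumption.

```r
set.seed(1)
d <- 12      # dimension (p in the function signature)
g <- 3       # number of groups for the hub pattern
prob <- 0.2  # edge probability for the random pattern

# (I) random: draw each upper-triangular entry once, then symmetrize.
theta_random <- matrix(0, d, d)
upper <- upper.tri(theta_random)
theta_random[upper] <- rbinom(sum(upper), size = 1, prob = prob)
theta_random <- theta_random + t(theta_random)

# (II) hub: split 1..d into g equal groups and link each group's center
# to the other members. Assumption: the center is the first node.
theta_hub <- matrix(0, d, d)
group <- rep(seq_len(g), each = d / g)
for (grp in seq_len(g)) {
  members <- which(group == grp)
  center <- members[1]
  theta_hub[center, members[-1]] <- 1
  theta_hub[members[-1], center] <- 1
}

# (IV) AR(2): 0.5 on the first off-diagonal, 0.05 on the second.
lag <- abs(row(diag(d)) - col(diag(d)))
theta_ar2 <- 0.5 * (lag == 1) + 0.05 * (lag == 2)

sum(theta_hub) / 2  # number of hub edges, d - g
```

Each hub group of d/g nodes contributes d/g - 1 edges, which gives the d - g total stated in (II).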
The adjacency matrix theta has all diagonal elements equal to 0. To obtain a positive definite precision matrix, the smallest eigenvalue of theta*v (denoted by e) is computed. The precision matrix is then set to theta*v+(|e|+0.1+u)I. The covariance matrix is then computed to generate the multivariate ZINB dataset.
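The diagonal shift can be checked numerically with a small sketch. The chain graph used for theta is illustrative only, the variable names are hypothetical, and the covariance is taken as the plain inverse of the precision matrix; any further rescaling the package may apply is not shown.

```r
d <- 6
v <- 0.3  # documented default off-diagonal magnitude
u <- 0.1  # documented default diagonal shift

# A small chain graph stands in for theta (illustrative only).
theta <- matrix(0, d, d)
theta[abs(row(theta) - col(theta)) == 1] <- 1

# Smallest eigenvalue of theta * v, then the shift (|e| + 0.1 + u) * I.
e <- min(eigen(theta * v, symmetric = TRUE, only.values = TRUE)$values)
precision <- theta * v + (abs(e) + 0.1 + u) * diag(d)

# Every eigenvalue is shifted by |e| + 0.1 + u, so the smallest one is
# at least 0.1 + u and the matrix is positive definite.
min_eig <- min(eigen(precision, symmetric = TRUE, only.values = TRUE)$values)

# Covariance matrix as the inverse of the precision matrix.
sigma <- solve(precision)
```

Because theta has a zero diagonal, the eigenvalues of theta*v sum to zero, so e is never positive and the shift always lifts the spectrum above zero.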
The default values for the parameters k, lambda and omega of the ZINB distribution are estimated from a real TCGA dataset. See Jia et al. (2017) for more detail.
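To make the roles of k, lambda and omega concrete, here is a minimal sketch of a ZINB distribution function and quantile function built on base R's `pnbinom` and `qnbinom`, followed by a Gaussian copula (NORTA-style) step that turns correlated normals into correlated ZINB counts. The helpers `pzinb` and `qzinb` are hypothetical, the parametrization (size = k, mu = lambda, extra zero mass omega) is an assumption, and whether ContSim follows exactly these steps is also an assumption.

```r
# Hypothetical ZINB CDF: a point mass omega at zero mixed with NB(size = k, mu = lambda).
pzinb <- function(x, k = 3.30, lambda = 515, omega = 0.003,
                  lower.tail = TRUE, log.p = FALSE) {
  p <- ifelse(x < 0, 0, omega + (1 - omega) * pnbinom(x, size = k, mu = lambda))
  if (!lower.tail) p <- 1 - p
  if (log.p) log(p) else p
}

# Hypothetical ZINB quantile function: invert the mixture CDF.
# Probabilities at or below the zero mass omega map to 0.
qzinb <- function(prob, k = 3.30, lambda = 515, omega = 0.003) {
  adj <- pmax((prob - omega) / (1 - omega), 0)
  qnbinom(adj, size = k, mu = lambda)
}

set.seed(1)
n <- 500
sigma <- matrix(c(1, 0.6, 0.6, 1), 2, 2)         # illustrative correlation matrix
z <- matrix(rnorm(n * 2), n, 2) %*% chol(sigma)  # correlated standard normals
counts <- apply(pnorm(z), 2, qzinb)              # uniform scores mapped to ZINB margins
```

The copula step keeps each margin ZINB while the dependence comes from sigma; the count correlations are generally not equal to the entries of sigma.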
Jia, B., Xu, S., Xiao, G., Lamba, V., and Liang, F. (2017). Inference of Genetic Networks from Next Generation Sequencing Data. Biometrics.
Zhao, T., and Liu, H. (2012). The huge Package for High-dimensional Undirected Graph Estimation in R. Journal of Machine Learning Research.
Yahav, I., and Shmueli, G. (2012). On generating multivariate Poisson data in management science applications. Applied Stochastic Models in Business and Industry, 28(1), 91-102.
# NOT RUN {
library(equSA)
# Simulate n = 100 observations of p = 200 variables with the default "AR(2)" graph
ContSim(100, 200)
# }