simone (version 1.0-4)

rTranscriptData: Simulation of artificial transcriptomic data

Description

Simulates a Gaussian sample that mimics transcriptomic data according to a given network, producing either steady-state or time-course data. When several networks are given, multiple samples are generated.

Usage

rTranscriptData(n,
                graph,
                ...,
                mu    = rep(0, p),
                sigma = 0.1)

Arguments

n

integer or vector of integers indicating the sample size of each task

graph

a simone.network object, typically generated by either rNetwork or coNetwork

...

additional simone.network objects, in the case of multiple sample generation

mu

if the network(s) are directed, mu is the offset of the VAR(1) model used to generate the time-course data; if undirected, mu is the offset of the Gaussian vector.

sigma

standard deviation of the noise term used in the simulation process
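
As a quick illustration of how these arguments combine, the following call is a minimal sketch (not taken from the package manual; the network, offset and noise level are arbitrary):

library(simone)
g    <- rNetwork(p = 5, pi = 5, directed = TRUE)
# directed network, non-zero offset mu, larger noise level sigma
data <- rTranscriptData(n = 30, g, mu = rep(1, 5), sigma = 0.5)
dim(data$X)   # 30 time points x 5 genes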

Value

Returns a list comprising:

X

matrix of simulated gene expression data, n observations in rows, genes in columns

tasks

factor indicating, in the case of multiple networks, the task (network) from which each row of X was simulated.
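
For instance, with two networks the tasks factor can be used to split the pooled matrix X back into one data set per network (a sketch, assuming child1 and child2 are simone.network objects such as those built with coNetwork in the examples below):

data    <- rTranscriptData(c(20, 20), child1, child2)
by.task <- split(as.data.frame(data$X), data$tasks)  # one data frame per network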

Details

If the network is directed, time-course data are simulated according to a VAR(1) model. If the network is undirected, steady-state data are generated by simulating an independent, identically distributed sample of a Gaussian vector.

In both cases, samples are generated on the basis of the matrix Theta, as provided by graph$Theta.

If the network is directed, samples are generated according to the following VAR(1) process:

X^t = Theta X^{t-1} + mu + epsilon^t,   epsilon^t ~ N(0, sigma^2 I)

If the network is undirected, samples are generated according to the following Gaussian vector:

X = mu + Z + epsilon,   Z ~ N(0, Theta^+),   epsilon ~ N(0, sigma^2 I),

where Theta^+ denotes the pseudo-inverse of Theta.

Numerically, the square root of Theta^+ is computed with the Cholesky decomposition of the pseudo-inverse of Theta.
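
The following sketch illustrates the mechanics described above; it is not the package's internal code and assumes that Theta, mu, sigma, the number of observations n and a logical flag directed are already defined:

library(MASS)                        # ginv() for the pseudo-inverse
p <- ncol(Theta)
if (directed) {
  # VAR(1) recursion: X_t = Theta X_{t-1} + mu + noise
  X <- matrix(0, n, p)
  X[1, ] <- rnorm(p, sd = sigma)     # arbitrary initial state
  for (t in 2:n)
    X[t, ] <- Theta %*% X[t - 1, ] + mu + rnorm(p, sd = sigma)
} else {
  # Gaussian vector with covariance equal to the pseudo-inverse of Theta,
  # drawn through the Cholesky factor of that covariance
  S <- ginv(Theta)                   # pseudo-inverse of Theta
  R <- chol(S)                       # assumes S is numerically positive definite
  X <- matrix(rnorm(n * p), n, p) %*% R             # rows ~ N(0, S)
  X <- X + matrix(rnorm(n * p, sd = sigma), n, p)   # additive noise term
  X <- sweep(X, 2, mu, "+")                         # add the offset
}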

See Also

rNetwork, coNetwork.

Examples

## time-course data generation
##-----------------------------
# generate a directed network
n <- 20
p <- 5
g <- rNetwork(p, pi=5, directed=TRUE)
# Generate the data, data2 noisier than data1
data1  <- rTranscriptData(n,g)
data2  <- rTranscriptData(n,g,sigma=1)
matplot(1:n, data1$X,type= "l", xlab = "time points",
        ylab = "level of expression", col=rainbow(n,start=2/6,end = 3/6),
        ylim = range(c(data1$X,data2$X)),
        main="data2 (blue) generated with more noise than data1 (green)")
matlines(1:n,data2$X,type= "l",col = rainbow(n,start=4/6,end=5/6))

## steady-state data generation
##-----------------------------
# generate an undirected network
p <- 10
g <- rNetwork(p, pi=10)
data <- rTranscriptData(n=1000,g, sigma=0)
attach(data)
# Inference of Theta (here without dimension problems since p << n)
b <- sapply(1:p, function(x){
   tmp <- -solve(t(X[,-x]) %*% X[,-x]) %*% t(X[,-x]) %*% X[,x]
   res <- rep(NA, p)
   res[-x] <- tmp
   res[x] <- 1
   return(res)
})
detach(data)
# comparison of theoretical Theta and inferred Theta
par(mfrow=c(1,2))
image(g$Theta, main = "Theoretical Theta")
image(b, main = "Inferred Theta")

## time-course multitask data generation
##--------------------------------------
# start by generating the networks
ancestor <- rNetwork(p=5, pi=5, name="ancestor", directed=TRUE)
child1   <- coNetwork(ancestor, 1, name = "child 1")
child2   <- coNetwork(ancestor, 1, name = "child 2")
# generate the data
n <- c(20,20)
data  <- rTranscriptData(n,child1,child2)
attach(data)
par(mfrow=c(2,1))
matplot(1:(n[1]),X[tasks ==1,],type= "l", main="Dataset from child 1",
        xlab = "time points", ylab = "level of expression")
matplot(1:(n[2]),X[tasks == 2,], type= "l", main="Dataset from child 2",
        xlab = "time points", ylab = "level of expression")
detach(data)
