sim.dat: Simulating a Microarray Data Set

Description

This function simulates a two-group comparison microarray data set according to a hierarchical model, in which the standardized effect sizes of all genes are assumed to be independent and identically distributed. This distribution is a two-component mixture: with probability \(\pi_0\) an effect size is zero, and with probability \(1-\pi_0\) it is drawn from another distribution (specified by effdist). The observed values are then simulated independently, conditional on the standardized effect sizes.
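A minimal sketch of the effect-size model just described, assuming the default choice for the nonzero component (a normal distribution with mean 0 and variance gamma2). `draw_mixture_effects()` is a hypothetical helper, not part of the package. It draws every gene's effect independently from the mixture, whereas sim.dat itself fixes the number of nonzero effects in advance (see Details).

```r
## Sketch: i.i.d. draws from the two-component mixture
##   delta_g = 0 with probability pi0, otherwise N(0, gamma2).
## draw_mixture_effects() is a hypothetical helper for illustration only.
draw_mixture_effects <- function(G, pi0 = 0.75, gamma2 = 1) {
  nonzero <- runif(G) >= pi0          # TRUE with probability 1 - pi0
  delta <- numeric(G)                 # start with every effect equal to zero
  delta[nonzero] <- rnorm(sum(nonzero), mean = 0, sd = sqrt(gamma2))
  delta
}

set.seed(1)
delta <- draw_mixture_effects(G = 10000, pi0 = 0.75, gamma2 = 1)
mean(delta == 0)   # close to pi0 = 0.75
```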

Usage

sim.dat(G = 10000, pi0 = 0.75, gamma2 = 1, n1 = 5, n2 = n1, 
        errdist = rnorm, effdist = function(g, gamma2) 
        rnorm(g, , sqrt(gamma2)), ErrArgs, EffArgs)

Arguments

G

a positive integer, the number of genes.

pi0

a numeric value between 0 and 1, the proportion of non-differentially expressed genes.

gamma2

a positive value that is always passed as the second argument to effdist. If the nonzero standardized effect sizes follow a zero-mean normal distribution (as with the default effdist), gamma2 is the variance of that distribution: the larger it is, the larger the mean absolute effect.

n1

a positive integer, the sample size in treatment group 1.

n2

a positive integer, the sample size in treatment group 2.

errdist

a function that simulates K random errors, where K is its first argument. If ErrArgs is not missing, it is always passed as the second argument.

effdist

a function that simulates G1 standardized effect sizes, where G1 is its first argument. The second argument is always gamma2. If EffArgs is not missing, it is always passed as the third argument.

ErrArgs

a list of additional arguments used by errdist.

EffArgs

a list of additional arguments used by effdist.
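The calling conventions for errdist and effdist can be illustrated with a pair of custom generators. This is a sketch under an assumption: ErrArgs and EffArgs are taken to arrive as single list arguments, which is how the wording above reads but is not confirmed here. `t_err()`, `t_eff()` and `norm_eff()` are hypothetical names. The last lines also check the gamma2 claim for normal effects, whose mean absolute value is \(\sqrt{2\gamma^2/\pi}\).

```r
## Hypothetical custom generators written to the calling conventions above:
##   errdist(K) or errdist(K, ErrArgs)
##   effdist(G1, gamma2) or effdist(G1, gamma2, EffArgs)
## Assumption: ErrArgs and EffArgs each arrive as a single list argument.

# Heavy-tailed errors: Student t with ErrArgs$df degrees of freedom
t_err <- function(K, ErrArgs = list(df = 5)) {
  rt(K, df = ErrArgs$df)
}

# Nonzero effects: t draws rescaled so their variance equals gamma2
# (a t variable with df > 2 has variance df / (df - 2))
t_eff <- function(G1, gamma2, EffArgs = list(df = 4)) {
  df <- EffArgs$df
  stopifnot(df > 2)
  sqrt(gamma2 * (df - 2) / df) * rt(G1, df = df)
}

# Same form as the default effdist: normal with mean 0 and variance gamma2
norm_eff <- function(g, gamma2) rnorm(g, mean = 0, sd = sqrt(gamma2))

set.seed(2)
eff <- t_eff(1e5, gamma2 = 2, EffArgs = list(df = 6))
var(eff)   # close to gamma2 = 2

# For normal effects the mean absolute effect is sqrt(2 * gamma2 / pi),
# so it grows with gamma2
mae <- mean(abs(norm_eff(1e5, gamma2 = 4)))
mae        # close to sqrt(8 / pi), about 1.60
```

If the list-passing assumption holds, these could be supplied as `sim.dat(G = 1000, errdist = t_err, ErrArgs = list(df = 5), effdist = t_eff, EffArgs = list(df = 6))`; that call is illustrative only.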

Value

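a G-by-(n1+n2) numeric matrix of simulated observations with one row per gene; the first n1 columns belong to treatment group 1 and the last n2 columns to treatment group 2.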

Details

The function simulates \(G \times N\) errors according to errdist, where \(N = n_1 + n_2\), and arranges them into a G-by-N matrix. It then simulates \(G_1\) standardized effect sizes according to effdist, controlled by the parameter gamma2, where \(G_1 = \mathrm{round}(G(1-\pi_0))\) is the number of differentially expressed genes. Finally, the simulated effect sizes are added to each column of the upper-left G1-by-n1 submatrix, so the first G1 rows (genes) carry a treatment effect in group 1 and the remaining \(G - G_1\) rows do not.
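The three steps above can be sketched as a self-contained R function. This is an illustrative reimplementation under stated assumptions, not the package's source: `sim_dat_sketch()` is a hypothetical name, G1 is taken as round(G * (1 - pi0)) (the number of nonzero effects implied by pi0 being the proportion of non-differentially expressed genes), and ErrArgs/EffArgs are passed through as single list arguments.

```r
## Sketch of the steps in Details (not the package's source code).
## sim_dat_sketch() is hypothetical; G1 = round(G * (1 - pi0)) is assumed.
sim_dat_sketch <- function(G = 10000, pi0 = 0.75, gamma2 = 1, n1 = 5, n2 = n1,
                           errdist = rnorm,
                           effdist = function(g, gamma2) rnorm(g, mean = 0, sd = sqrt(gamma2)),
                           ErrArgs, EffArgs) {
  N  <- n1 + n2
  G1 <- round(G * (1 - pi0))
  # Step 1: G * N errors arranged as a G-by-N matrix (filled column by column)
  err <- if (missing(ErrArgs)) errdist(G * N) else errdist(G * N, ErrArgs)
  dat <- matrix(err, nrow = G, ncol = N)
  if (G1 > 0) {
    # Step 2: G1 standardized effect sizes, controlled by gamma2
    delta <- if (missing(EffArgs)) effdist(G1, gamma2) else effdist(G1, gamma2, EffArgs)
    # Step 3: add delta to every column of the upper-left G1-by-n1 block;
    # recycling adds delta[g] to each group-1 entry of row g
    dat[seq_len(G1), seq_len(n1)] <- dat[seq_len(G1), seq_len(n1)] + delta
  }
  dat
}

# Toy check with zero errors, so only the added effects are visible
set.seed(3)
zero_err <- function(K) numeric(K)
toy <- sim_dat_sketch(G = 8, pi0 = 0.5, n1 = 3, n2 = 2, errdist = zero_err)
dim(toy)    # 8 5
```

With zero errors, rows 1 to 4 (here G1 = 4) hold the same effect in each of the three group-1 columns, and every other entry stays zero, which makes the block structure described above easy to inspect.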

References

Qu, L., Nettleton, D., Dekkers, J.C.M. Subsampling Based Bias Reduction in Estimating the Proportion of Differentially Expressed Genes from Microarray Data. Unpublished manuscript.

Examples


set.seed(54457704)
## an unusually small data set of 20 genes and 3 samples in each of the two treatment groups. 
dat=sim.dat(G=20, n1=3,n2=3)

set.seed(9992722)
## this is how the 'simulatedDat' data set in this package was generated
simulatedDat=sim.dat(G=5000)

