mockRNASeqData: A Simulated RNA-Seq Data Set

Description

This is a simulated RNA-Seq data set generated from a negative binomial model, with 10000 genes and 8 experimental units under a balanced two-treatment comparison design.

Usage

mockRNASeqData

Format

This is a list with the following components:

counts: This is a numeric data matrix with 10000 rows and 8 columns, containing counts for each gene (row) and each experimental unit (column).
treatment: This is a factor with 2 levels, indicating the treatment group of each column of counts.
design.matrix: This is an example design matrix corresponding to treatment.
true.normalization: This is a numeric vector of normalizing factors actually used to simulate the data matrix.
estimated.normalization: This is a numeric vector of normalizing factors estimated from the data matrix, using the so-called "TMM" method.
true.nbdisp: This is a numeric vector of negative binomial over-dispersion parameters actually used to simulate the data. The parameterization is such that true.nbdisp = 1/size, where size is the parameter used in rnbinom.
estimated.nbdisp: This is a numeric vector of estimated negative binomial over-dispersion parameters, using the "TrendedDisp" method from the edgeR package.
ngenes: Integer scalar 10000, the number of rows of counts.
nsamples: Integer scalar 8, the number of columns of counts.
true.DEgenes: An integer vector of length 3500, indicating the correct row indices of differentially expressed genes, i.e., rows whose means differ across the two treatments.
true.foldChanges: A numeric vector of length 3500, indicating the true ratio of means for each differentially expressed gene.
simulation.expression: This is an R expression that was used to simulate the mockRNASeqData data set. eval(mockRNASeqData$simulation.expression) should generate an identical data set, except for the simulation.expression component itself.
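
Examples

The components above can be illustrated with a small base-R sketch of a negative binomial simulation under a balanced two-group design. It is not the code stored in simulation.expression. The scaled-down gene count, the seed, and the distributions used for the means, dispersions, normalizing factors and fold changes are all assumptions made for illustration. The one detail taken from this page is the parameterization: the dispersion is passed to rnbinom as size = 1/dispersion.

```r
# Illustrative sketch only: NOT the code stored in simulation.expression.
set.seed(1)                       # arbitrary seed, for reproducibility

ngenes   <- 200                   # scaled down from 10000 for speed
nsamples <- 8
treatment <- factor(rep(1:2, each = nsamples / 2))  # balanced two-group layout
design.matrix <- model.matrix(~ treatment)          # intercept + treatment column

# Hypothetical parameter choices (the real values are not given on this page)
nbdisp        <- rgamma(ngenes, shape = 2, rate = 10)  # per-gene over-dispersion
base.mean     <- rexp(ngenes, rate = 1 / 100)          # per-gene baseline mean
normalization <- runif(nsamples, 0.8, 1.2)             # per-sample scaling factor

# Mark the first 70 genes (35%) as differentially expressed
DEgenes     <- seq_len(70)
foldChanges <- rep(1, ngenes)
foldChanges[DEgenes] <- exp(rnorm(length(DEgenes), sd = 0.5))

# Mean matrix: gene mean times sample factor, fold change applied in group 2
mu <- outer(base.mean, normalization)
mu[, treatment == 2] <- mu[, treatment == 2] * foldChanges

# size = 1 / dispersion, so Var(count) = mu + dispersion * mu^2
counts <- matrix(rnbinom(ngenes * nsamples, mu = mu, size = 1 / nbdisp),
                 nrow = ngenes, ncol = nsamples)
```

Here counts, treatment and design.matrix play the same roles as the components of the same names, while nbdisp, normalization, DEgenes and foldChanges loosely correspond to true.nbdisp, true.normalization, true.DEgenes and true.foldChanges.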