generate_gCMAP_NChannelSet: Generate a perturbation profile library from expression sets of control/treatment pairs

Description

When provided with a list of ExpressionSet or CountDataSet objects, this function compares control and perturbation samples separately for each set. Processing RNAseq count data requires the suggested Bioconductor package 'DESeq' to be installed. For CountDataSets, a variance-stabilizing transformation is first applied globally across all CountDataSets in the list, and a moderated log2 fold change is then calculated for each set.

Usage

generate_gCMAP_NChannelSet(
    data.list,
    uids=1:length(data.list),
    sample.annotation=NULL,
    platform.annotation="",
    control_perturb_col="cmap",
    control="control",
    perturb="perturbation",
    limma=TRUE,
    limma.index=2,
    big.matrix=NULL,
    center.z="peak",
    center.log_fc="none",
    report.center=FALSE )

Arguments

data.list

List of ExpressionSet or CountDataSet objects. Each element includes all array / RNAseq data for a single instance, plus metadata on which samples are perturbation and control.

uids

Vector of unique identifiers for the instances in data.list

sample.annotation

An optional data.frame of additional annotation for the instances. Each row corresponds to one instance, in the same order as data.list. It is not used for the control/perturbation comparisons; it is simply attached to the NChannelSet for future reference.
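
A minimal sketch of what such a data.frame might look like for a three-element data.list. The identifiers, the column names (compound, dose_um) and the use of the uids as row names are assumptions made for illustration only; the source states only that there is one row per instance, in the order of data.list.

```r
## Hypothetical instance identifiers, matching a three-element data.list.
uids <- paste("Instance", 1:3, sep = ".")

## One row per instance, in the same order as data.list. The column names
## (compound, dose_um) are made up for this example.
sample.annotation <- data.frame(
  compound = c("drugA", "drugB", "drugC"),
  dose_um  = c(1, 10, 5),
  row.names = uids,
  stringsAsFactors = FALSE
)

## Sanity check before calling the function: one row per instance.
stopifnot(nrow(sample.annotation) == length(uids))
```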

platform.annotation

The name of the platform as used by the annotation package.

control_perturb_col

See pairwise_compare.

control

See pairwise_compare.

perturb

See pairwise_compare.

limma

Logical. If TRUE (default), use the limma package to perform moderated t-tests; otherwise, perform standard t-tests.

limma.index

Integer specifying the index of the parameter estimate for which to extract t and other statistics. The default corresponds to a two-class comparison with the standard parameterization. The function assumes that there were no missing data, so that the tests for all genes were performed on the same sample size.
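
As a rough illustration of why the default index is 2, the following base R sketch builds the standard treatment-contrast design matrix for a two-class comparison. The group factor and its sample layout are invented for this example, and this is not the package's internal code.

```r
## Hypothetical two-class layout: two control and two perturbation samples.
group <- factor(c("control", "control", "perturbation", "perturbation"),
                levels = c("control", "perturbation"))

## Standard parameterization: an intercept plus one coefficient for the
## non-reference level of the factor.
design <- model.matrix(~ group)

colnames(design)

## Column 1 is the intercept (the control baseline); column 2 is the
## perturbation-versus-control effect, i.e. the estimate that index 2 selects.
effect.index <- 2
colnames(design)[effect.index]
```

With a different parameterization (for example, a design without an intercept), the coefficient of interest may sit at a different position, in which case the index would need to change accordingly.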

big.matrix

Character string providing the path and filename under which to store the NChannelSet on disk instead of in memory. If 'NULL' (default), an NChannelSet is returned. If not 'NULL', the bigmemoryExtras package will create (or overwrite!) three binary files for each channel of the NChannelSet at the location provided as 'big.matrix', distinguishing the files for the different channels by their suffixes. To load the NChannelSet into a different R session, the binary files must be accessible.

center.z

One of 'none', 'peak', 'mean', 'median', selecting whether / how to center the z-scores for each experiment. Option 'peak' (default) will center on the peak of the z-score kernel density. Options 'mean' and 'median' will center on their respective values instead.

center.log_fc

One of 'none', 'peak', 'mean', 'median', selecting whether / how to center the log2 fold-change distribution for each experiment. Option 'none' (default) applies no centering. Option 'peak' will center on the peak of the log2 fold-change kernel density. Options 'mean' and 'median' will center on their respective values instead.

report.center

Logical. If TRUE, include the z-score / log2 fold change corrections and the median absolute deviation of the respective distribution about zero in the pData slot of the returned NChannelSet.
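
The centering options and the reported correction described above can be sketched in base R as follows. The helper center_scores is hypothetical and not part of gCMAP; that the shift is subtracted from the scores, and the default bandwidth of density(), are assumptions rather than the package's documented behaviour.

```r
## Illustrative helper (not part of gCMAP): shift a score vector so that the
## chosen location estimate sits at zero.
center_scores <- function(x, method = c("none", "peak", "mean", "median")) {
  method <- match.arg(method)
  shift <- switch(method,
    none   = 0,
    mean   = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    peak   = {
      ## Location of the maximum of a kernel density estimate of the scores.
      d <- density(x[!is.na(x)])
      d$x[which.max(d$y)]
    })
  centered <- x - shift
  list(scores = centered,
       shift  = shift,
       ## Median absolute deviation about zero, not about the median.
       mad0   = median(abs(centered), na.rm = TRUE))
}

set.seed(1)
z <- rnorm(1000, mean = 0.5)   # simulated scores with an offset
res <- center_scores(z, "median")
res$shift                      # the correction that was subtracted
res$mad0                       # spread of the centered scores about zero
```

The shift and mad0 elements correspond in spirit to the two quantities that report.center adds to pData: the applied correction and the median absolute deviation about zero.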

Value

The function returns an NChannelSet with one channel for each of the columns returned by pairwise_compare. This can be worked with directly (e.g., assayData(obj)$z), or specific channels can be converted to regular ExpressionSet objects (e.g., es <- channel(obj, "z")). In the latter case, the z-scores are accessed with exprs(es). If 'report.center' is TRUE, the pData slot of the NChannelSet contains columns reporting the shift applied to the z-score and / or log2 fold change columns to center the score distributions on zero, as well as the median absolute deviation of the shifted distribution about zero.

Examples


## list of ExpressionSets
data("sample.ExpressionSet") ## from Biobase

es.list <- list( sample.ExpressionSet[,1:4],
                 sample.ExpressionSet[,5:8],
                 sample.ExpressionSet[,9:12])
names(es.list) <- paste( "Instance", 1:3, sep=".")

de <- generate_gCMAP_NChannelSet(
    es.list,
    1:3,
    platform.annotation = annotation(es.list[[1]]),
    control_perturb_col="type",
    control="Control",
    perturb="Case") 

assayDataElementNames(de)
head( assayDataElement(de, "z") ) 

## Not run: 
# ## processing RNAseq data requires the suggested 'DESeq' 
# ## Bioconductor package.
# require( DESeq )
# set.seed( 123 )
# ## list of CountDataSets
# cds.list <- lapply( 1:3, function(n) {
#    cds <- makeExampleCountDataSet()
#    featureNames(cds) <- paste("gene",1:10000, sep="_")
#    cds
# })
# 
# cde <- generate_gCMAP_NChannelSet(cds.list,
#                            uids=1:3,
#                            sample.annotation=NULL,
#                            platform.annotation="Entrez",
#                            control_perturb_col="condition",
#                            control="A",
#                            perturb="B")
# 
# assayDataElementNames(cde)
# ## End(Not run)
