TranSAM: Gene Expression State Transformed Significance Analysis of Microarrays

Description

Implements TranSAM, a method that uses the GESTr-transformed representation of gene expression data to identify genes with _biologically_ significant variation: that is, statistically significant differential expression across the biologically distinct expression states observed in the reference compendium (the compendium used to calculate the GESTr models).

Usage

TranSAM(x,samples1,samples2,minChange=0.2,var_filter=0.01,maxFDR=1,changeStep=0.1,scoreFun="magChange")

Arguments

x

Numeric data array of transformed gene expression 'state' values, as output by the GESTr function. Probes/genes should be in rows, samples/conditions in columns.

samples1

Numeric vector of column indices (in x) for the first group of samples

samples2

Numeric vector of column indices (in x) for the second group of samples

minChange

Numeric value specifying the minimum observed difference between the groups, compared to the expected difference as estimated from the balanced permutations

var_filter

Numeric value specifying a minimum standard deviation in gene expression state values for a gene to be included in the analysis

maxFDR

Numeric value specifying the maximum allowed group-wise False Discovery Rate. The function iterates over successively greater minimum observed differences until the estimated FDR is below maxFDR

changeStep

Numeric value specifying the amount by which the minChange filter is increased at each iteration

scoreFun

Character specifying method of scoring. "dstat" uses a regularized t-statistic, making it an analogue of the Significance Analysis of Microarrays (SAM) approach. Any other value uses the absolute difference between the median expression state value of the gene in question across the two groups.
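
The two scoring options described for scoreFun can be illustrated with a minimal base-R sketch. The function names mag_change and d_stat and the regularization constant s0 are assumptions made for illustration; they are not taken from the package, whose exact regularization may differ.

```r
# Sketch only: mag_change, d_stat and s0 are hypothetical, not package code.

# Absolute difference between the per-gene medians of the two groups.
mag_change <- function(x, samples1, samples2) {
  med1 <- apply(x[, samples1, drop = FALSE], 1, median)
  med2 <- apply(x[, samples2, drop = FALSE], 1, median)
  abs(med1 - med2)
}

# SAM-style regularized t-statistic: a small constant s0 is added to the
# pooled standard error so low-variance genes do not dominate.
d_stat <- function(x, samples1, samples2, s0 = 0.1) {
  n1 <- length(samples1)
  n2 <- length(samples2)
  m1 <- rowMeans(x[, samples1, drop = FALSE])
  m2 <- rowMeans(x[, samples2, drop = FALSE])
  v1 <- apply(x[, samples1, drop = FALSE], 1, var)
  v2 <- apply(x[, samples2, drop = FALSE], 1, var)
  pooled_se <- sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2) *
                      (1 / n1 + 1 / n2))
  (m1 - m2) / (pooled_se + s0)
}

# Toy expression-state matrix: genes in rows, samples in columns.
x <- rbind(geneA = c(0.1, 0.2, 0.1, 0.8, 0.9, 0.8),
           geneB = c(0.5, 0.4, 0.6, 0.5, 0.6, 0.4))

mag_change(x, 1:3, 4:6)  # geneA is about 0.7, geneB is 0
d_stat(x, 1:3, 4:6)      # large in magnitude for geneA only
```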

Value

genes: The rownames of input x corresponding to the genes with significant differential expression between the specified groups
obs.exp.ratios: The calculated scoring statistic for differential expression (in terms of observed value compared to expected)
change: The difference in median gene expression state values for the gene across the two groups
FDR.estimate: Estimated Family-Wise Error Rate (FWER) across all genes at least as differentially expressed as the selected gene. This is analogous to the FDR or the q-value
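
To make the relationship between minChange, changeStep, maxFDR and the reported error-rate estimate concrete, here is a hedged base-R sketch of a permutation-based estimate with a stepped threshold. The helper estimate_threshold, its inputs and the toy numbers are all hypothetical; the package's internal calculation is not shown in this page and may differ.

```r
# Sketch only: estimate_threshold is a hypothetical helper, not package code.
# observed:    per-gene scores for the real grouping
# null_scores: genes x permutations matrix of scores from balanced permutations
estimate_threshold <- function(observed, null_scores, minChange = 0.2,
                               maxFDR = 0.05, changeStep = 0.1) {
  threshold <- minChange
  repeat {
    n_called <- sum(observed >= threshold)
    if (n_called == 0) {
      return(list(threshold = threshold, fdr = NA_real_, n = 0L))
    }
    # Expected number of false calls: mean count across permutations.
    n_false <- mean(colSums(null_scores >= threshold))
    fdr <- min(1, n_false / n_called)
    if (fdr <= maxFDR) {
      return(list(threshold = threshold, fdr = fdr, n = n_called))
    }
    # Estimate still too high: raise the threshold and try again.
    threshold <- threshold + changeStep
  }
}

# Made-up scores for five genes and four permutations.
observed <- c(0.9, 0.8, 0.45, 0.32, 0.1)
null_scores <- matrix(c(0.25, 0.10, 0.05, 0.33, 0.10,
                        0.10, 0.35, 0.10, 0.05, 0.22,
                        0.05, 0.10, 0.25, 0.10, 0.10,
                        0.28, 0.05, 0.10, 0.10, 0.25), nrow = 5)

res <- estimate_threshold(observed, null_scores)
res$n  # 3 genes remain once the threshold has been raised to about 0.4
```

With these toy numbers the estimate is 0.4375 at the starting threshold and 0.125 after one step, so the loop takes two steps before the estimate falls to 0.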

Details

The TranSAM algorithm constructs balanced permutations of the input data and uses these to estimate the false-discovery rates of identifying genes as belonging to different expression states in the two specified sample groups. The balanced permutations are constructed so that an equal number of samples from each specified group are in each partition, and thus can be used to approximate a distribution of expected variation in gene expression state across the groups if the specified grouping were to have no biological relevance (in terms of gene expression profiles).
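
The construction of balanced permutations can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: balanced_partitions is a hypothetical helper, and it assumes two groups of equal, even size so that each partition can take exactly half of each group. How the package handles odd group sizes (such as groups of three) is not described here.

```r
# Sketch only: balanced_partitions is a hypothetical helper, not package code.
# Assumes both groups have the same, even number of samples.
balanced_partitions <- function(samples1, samples2) {
  stopifnot(length(samples1) == length(samples2),
            length(samples1) %% 2 == 0)
  half <- length(samples1) / 2
  from1 <- combn(samples1, half, simplify = FALSE)
  from2 <- combn(samples2, half, simplify = FALSE)
  all_samples <- c(samples1, samples2)
  parts <- list()
  for (a in from1) {
    for (b in from2) {
      # Each partition takes half of its samples from each original group.
      part1 <- sort(c(a, b))
      parts[[length(parts) + 1]] <- list(part1 = part1,
                                         part2 = setdiff(all_samples, part1))
    }
  }
  parts
}

parts <- balanced_partitions(1:4, 5:8)
length(parts)     # 36, i.e. choose(4, 2) * choose(4, 2)
parts[[1]]$part1  # 1 2 5 6
parts[[1]]$part2  # 3 4 7 8
```

Scoring each gene across part1 versus part2 for every such partition gives an approximation of the scores expected when the grouping carries no biological signal.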

Examples

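## load the package and its example data, then run GESTr on a subset to create transformed data
library(GESTr)
data(GESTr)
target.ids <- c("GSM194513","GSM194514","GSM194515","GSM194516","GSM194517","GSM194518")
## 30 random columns plus the six samples of interest; unique() drops any column drawn twice
selected.columns <- sort(unique(c(sample(seq_len(ncol(ABIdata)),30),which(colnames(ABIdata) %in% target.ids))))
transformed.x <- GESTr(ABIdata[1:20,selected.columns])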

## choose samples for analysis
thy.adult <- which(colnames(transformed.x) %in% c("GSM194513","GSM194514","GSM194515"))
thy.fetal <- which(colnames(transformed.x) %in% c("GSM194516","GSM194517","GSM194518"))

## run TranSAM on selected samples
ts.out <- TranSAM(transformed.x[,c(thy.adult,thy.fetal)],samples1=1:3,samples2=4:6)
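
As a hedged illustration of working with the result, the sketch below filters and ranks genes using the component names listed under Value. It runs on a mock object (ts.mock) with made-up values, and it assumes the result is a list of parallel per-gene vectors; the actual return structure should be checked with str() on real output.

```r
# Mock result using the component names from the Value section.
# The values are invented and the list structure is an assumption.
ts.mock <- list(
  genes = c("probe_1", "probe_2", "probe_3"),
  obs.exp.ratios = c(4.2, 2.9, 1.5),
  change = c(0.71, -0.52, 0.33),
  FDR.estimate = c(0.01, 0.04, 0.20)
)

# Keep genes whose error-rate estimate is at most 0.05.
keep <- ts.mock$FDR.estimate <= 0.05
hits <- data.frame(gene = ts.mock$genes[keep],
                   change = ts.mock$change[keep],
                   FDR = ts.mock$FDR.estimate[keep])

# Rank the retained genes by magnitude of change, largest first.
hits[order(-abs(hits$change)), ]
```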
