CombinatorialSignificance: Compute the Statistical Significance of Each Replicate Combination

Description

If a PhyloExpressionSet or DivergenceExpressionSet stores replicates for each developmental stage or experiment, this function computes the p-values quantifying the statistical significance of the underlying pattern for all combinations of replicates.

Usage

CombinatorialSignificance(
  ExpressionSet,
  replicates,
  TestStatistic = "FlatLineTest",
  permutations = 1000,
  parallel = FALSE
)

Arguments

ExpressionSet

a standard PhyloExpressionSet or DivergenceExpressionSet object.

replicates

a numeric vector storing the number of replicates within each developmental stage or experiment. If replicates stores only one value, the function assumes that each developmental stage or experiment has the same number of replicates.

TestStatistic

a string defining the test statistic used to quantify the statistical significance of the present phylotranscriptomics pattern. Default is TestStatistic = "FlatLineTest".

permutations

a numeric value specifying the number of permutations to be performed for the FlatLineTest.

parallel

a boolean value specifying whether parallel processing (multicore processing) shall be performed.

Value

a numeric vector storing the p-values returned by the underlying test statistic for all possible replicate combinations.

Details

This analysis checks that no combination of replicates (one replicate chosen per stage or experiment) yields a non-significant pattern when the initial pattern, computed with combined replicates, was shown to be significant.

A small example:

Assume a PhyloExpressionSet stores 3 developmental stages with 3 replicates measured for each stage. The 9 replicates in total are denoted as: 1.1, 1.2, 1.3, 2.1, 2.2, 2.3, 3.1, 3.2, 3.3. The function then computes the statistical significance of each pattern derived from the corresponding combination of replicates, e.g.

1.1, 2.1, 3.1 -> p-value for combination 1
1.1, 2.2, 3.1 -> p-value for combination 2
1.1, 2.3, 3.1 -> p-value for combination 3
1.2, 2.1, 3.1 -> p-value for combination 4
1.2, 2.2, 3.1 -> p-value for combination 5
1.2, 2.3, 3.1 -> p-value for combination 6
1.3, 2.1, 3.1 -> p-value for combination 7
1.3, 2.2, 3.1 -> p-value for combination 8
1.3, 2.3, 3.1 -> p-value for combination 9

This procedure yields 27 p-values for the \(3^3\) (\(n_{replicates}^{n_{stages}}\)) replicate combinations; the listing above shows only the 9 combinations that use replicate 3.1.
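The enumeration in this example can be reproduced with base R. The following is a minimal sketch, not the package's internal code; the variable names are chosen for illustration, and the row order produced by expand.grid may differ from the listing above.

```r
# Replicate counts per stage: 3 stages with 3 replicates each
replicates <- c(3, 3, 3)

# One index vector per stage (1:3, 1:3, 1:3); expand.grid builds every
# combination that picks exactly one replicate from each stage
combos <- expand.grid(lapply(replicates, seq_len))
names(combos) <- paste0("stage", seq_along(replicates))

# Number of combinations equals prod(replicates)
nrow(combos)

# First few combinations (one row per combination, one column per stage)
head(combos, 4)
```

Each row holds the replicate index selected for stage 1, stage 2, and stage 3.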

Note that with many stages/experiments and many replicates, the number of combinations, and hence the computation time, grows as \(n_{replicates}^{n_{stages}}\). For 11 stages and 4 replicates, 4^11 = 4194304 p-values have to be computed. Each p-value computation is itself based on a permutation test running with 1000 or more permutations, so expect long run times.
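Before starting a long run, it may help to count the combinations first. The sketch below uses a hypothetical helper, count_combinations, which is not part of the package; it assumes that a single value in replicates applies to every stage, as described under Arguments.

```r
# Hypothetical helper (not part of the package): number of replicate
# combinations for a given replicates vector
count_combinations <- function(replicates, n_stages) {
  # a single value means the same replicate count in every stage
  if (length(replicates) == 1) replicates <- rep(replicates, n_stages)
  prod(replicates)
}

# 11 stages with 4 replicates each: 4^11 combinations
n_comb <- count_combinations(4, n_stages = 11)
n_comb

# Unequal replicate counts: 2 * 3 * 2 combinations
count_combinations(c(2, 3, 2), n_stages = 3)

# Total number of permutations at permutations = 1000
n_comb * 1000
```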

The p-value vector returned by this function can then be plotted to see whether a critical value \(\alpha\) is exceeded or not (e.g. \(\alpha = 0.05\)).
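One possible way to inspect the result is sketched below with base graphics. The p-values here are placeholders made up for illustration only; in practice p.vector would be the vector returned by CombinatorialSignificance().

```r
# Placeholder p-values (made up for illustration only); in practice use
# the vector returned by CombinatorialSignificance()
p.vector <- c(0.002, 0.011, 0.004, 0.030, 0.008, 0.062,
              0.001, 0.019, 0.045, 0.007, 0.003, 0.051)
alpha <- 0.05

# One vertical bar per replicate combination, with the critical value
# drawn as a dashed horizontal line
plot(p.vector, type = "h", ylim = c(0, max(p.vector, alpha)),
     xlab = "Replicate combination", ylab = "p-value")
abline(h = alpha, lty = 2, col = "red")

# Indices of the combinations whose p-value exceeds alpha
exceeds <- which(p.vector > alpha)
exceeds
```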

The function receives a standard PhyloExpressionSet or DivergenceExpressionSet object and a vector storing the number of replicates present in each stage or experiment. Based on these arguments the function computes all possible replicate combinations using the expand.grid function and performs a permutation test (a FlatLineTest by default) for each replicate combination. The permutations argument of this function specifies the number of permutations that shall be performed for each permutation test. When all p-values are computed, a numeric vector storing the corresponding p-values for each replicate combination is returned.

In other words, for each replicate combination present in the PhyloExpressionSet or DivergenceExpressionSet object, the TAI or TDI pattern of the corresponding replicate combination is tested for its statistical significance based on the underlying test statistic.
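The mapping from a replicate combination to a TAI profile can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the toy data are random, the helpers combo_columns and tai_profile are hypothetical, and the layout assumes column 1 holds the phylostratum, column 2 the gene id, and the remaining columns the samples ordered by stage. The TAI of a sample is computed as the expression-weighted mean phylostratum.

```r
# Toy expression set (random, for illustration only): 5 genes, 7 samples
# belonging to 3 stages with 2, 3 and 2 replicates
set.seed(42)
toy <- data.frame(
  Phylostratum = c(1, 1, 2, 3, 3),
  GeneID       = paste0("gene", 1:5),
  matrix(runif(5 * 7, min = 1, max = 10), nrow = 5,
         dimnames = list(NULL, paste0("S", c(1, 1, 2, 2, 2, 3, 3), ".",
                                      c(1, 2, 1, 2, 3, 1, 2))))
)

replicates <- c(2, 3, 2)
# number of sample columns preceding each stage: 0, 2, 5
offsets <- cumsum(c(0, head(replicates, -1)))

# all combinations with one replicate per stage
combos <- expand.grid(lapply(replicates, seq_len))

# hypothetical helper: column indices in toy for one combination
combo_columns <- function(combo) 2 + offsets + as.integer(combo)

# hypothetical helper: TAI per selected column
# (expression-weighted mean of the phylostratum values)
tai_profile <- function(expr_set, cols) {
  ps <- expr_set[[1]]
  sapply(cols, function(j) sum(ps * expr_set[[j]]) / sum(expr_set[[j]]))
}

# one TAI profile (row) per replicate combination
profiles <- t(apply(combos, 1,
                    function(combo) tai_profile(toy, combo_columns(combo))))
dim(profiles)
```

Each row of profiles is the pattern that would then be passed to the chosen test statistic to obtain one p-value.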

This function is also able to perform all computations in parallel using multicore processing. The underlying statistical tests are written in C++ and optimized for fast computations.

References

Drost HG et al. (2015) Mol Biol Evol. 32 (5): 1221-1231 doi:10.1093/molbev/msv012

Examples


# NOT RUN {
# load a standard PhyloExpressionSet
data(PhyloExpressionSetExample)

# we assume that the PhyloExpressionSetExample 
# consists of 3 developmental stages 
# and 2 replicates for stage 1, 3 replicates for stage 2, 
# and 2 replicates for stage 3
# FOR REAL ANALYSES PLEASE USE: permutations = 1000 or 10000
# BUT NOTE THAT THIS TAKES MUCH MORE COMPUTATION TIME
p.vector <- CombinatorialSignificance(ExpressionSet = PhyloExpressionSetExample, 
                                      replicates    = c(2,3,2), 
                                      TestStatistic = "FlatLineTest", 
                                      permutations  = 10, 
                                      parallel      = FALSE)




# }
