FDR: Compute FDR for general scenarios

Description

FDR computes the false discovery rate for comparing gene expression between two groups of subjects when the distributions of the test statistic under the null and the alternative hypothesis are both mixtures of t-distributions. CDF and CDFmix compute the cumulative distribution functions of these mixtures.

Usage

FDR(x, n1, n2, pmix, D0, p0, D1, p1, sigma)
CDF(x, n1, n2, D, p, sigma)
CDFmix(x, n1, n2, pmix, D0, p0, D1, p1, sigma)
FDR.paired(x, n, pmix, D0, p0, D1, p1, sigma)
CDF.paired(x, n, D, p, sigma)
CDFmix.paired(x, n, pmix, D0, p0, D1, p1, sigma)

Arguments

x

vector of quantiles (two-sample t-statistics)

n, n1, n2

vector of sample sizes (as subjects per group)

pmix

the proportion of non-differentially expressed genes

D0

vector of effect sizes for the null distribution

p0

vector of mixing proportions for D0; must be the same length as D0 and sum to one

D1

vector of effect sizes for the alternative distribution

p1

vector of mixing proportions for D1; must be the same length as D1 and sum to one

D, p

generic vectors of effect sizes and mixing proportions as above

sigma

the standard deviation of the underlying (log-transformed) expression data

Value

The appropriate vector of FDRs or probabilities.

Details

These functions are designed for a simple experimental setup, where we wish to compare gene expression between two groups of subjects of size n1 and n2 for an unspecified number of genes, using an equal-variance t-statistic.

A proportion pmix of the genes (i.e. 100*pmix percent) is assumed to be not differentially expressed. The corresponding t-statistics follow a mixture of t-distributions; this is more general than the usual central t-distribution, because we may want to include genes with biologically small effects under the null hypothesis (Pawitan et al., 2005). The remaining proportion 1-pmix of the genes is assumed to be differentially expressed; their t-statistics also follow a mixture of t-distributions.
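The two-component setup above leads to the FDR by Bayes' rule: among all genes declared significant, the FDR is the share that comes from the null component. The sketch below illustrates only this arithmetic. The function name fdr_from_tails and its arguments tail0 and tail1 (the rejection probabilities under the null and the alternative mixture) are hypothetical and not part of the package.

```r
# Hypothetical helper, not part of the package.
# pmix  : proportion of non-differentially expressed genes
# tail0 : probability that a null gene is declared significant
# tail1 : probability that a regulated gene is declared significant
fdr_from_tails <- function(pmix, tail0, tail1) {
  false_pos <- pmix * tail0          # expected share of false positives
  true_pos  <- (1 - pmix) * tail1    # expected share of true positives
  false_pos / (false_pos + true_pos)
}

# 90% null genes, 5% type I error, 80% power
fdr_from_tails(pmix = 0.9, tail0 = 0.05, tail1 = 0.8)  # 0.36
```

Even with a conventional 5% type I error, more than a third of the declared genes are false discoveries here, because null genes are so much more numerous.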

The mixture proportions of the t-distributions under the null and the alternative hypothesis are specified via p0 and p1, respectively. The individual t-distributions are specified via the means D0 and D1 and the standard deviation sigma of the underlying data (instead of the mathematically more obvious, but less intuitive noncentrality parameters). As the underlying data are the log-transformed expression values, D0 and D1 can be interpreted as average log-fold changes between conditions, measured in units of sigma. See Examples.
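For orientation, the standard relationship between a mean difference D, the standard deviation sigma and the noncentrality parameter of an equal-variance two-sample t-statistic can be sketched as follows. The helper t_params is hypothetical; it shows the textbook formula and is not taken from the package source.

```r
# Hypothetical helper, not part of the package: degrees of freedom and
# noncentrality parameter of an equal-variance two-sample t-statistic.
t_params <- function(n1, n2, D, sigma = 1) {
  list(df  = n1 + n2 - 2,
       ncp = D / (sigma * sqrt(1 / n1 + 1 / n2)))
}

# Ten subjects per group, log-fold changes of -1, 0 and 1, sigma = 1:
# df is 18 and the noncentrality parameters are -sqrt(5), 0 and sqrt(5)
t_params(n1 = 10, n2 = 10, D = c(-1, 0, 1))
```

The noncentrality parameter grows with the sample sizes for a fixed D, which is why specifying effects through D and sigma is easier to interpret across designs.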

CDF computes the cumulative distribution function for a mixture of t-distributions based on means D and standard deviation sigma with mixture proportions p. This function is the workhorse for CDFmix.
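A minimal sketch of such a mixture distribution function, built directly on pt, is given below. It assumes scalar n1 and n2 and the textbook noncentrality parameter D / (sigma * sqrt(1/n1 + 1/n2)), and it returns the plain probability P(T <= x). The name mix_cdf is hypothetical, and the package's own CDF may differ in details, for example by working with the absolute value of the t-statistic.

```r
# Hypothetical re-implementation for illustration, not the package's CDF.
mix_cdf <- function(x, n1, n2, D, p, sigma = 1) {
  stopifnot(length(D) == length(p), abs(sum(p) - 1) < 1e-8)
  df  <- n1 + n2 - 2
  ncp <- D / (sigma * sqrt(1 / n1 + 1 / n2))
  # one column per mixture component, one row per quantile
  comp <- sapply(ncp, function(nc) pt(x, df = df, ncp = nc))
  comp <- matrix(comp, nrow = length(x))
  # weight the component CDFs by their mixing proportions
  drop(comp %*% p)
}

# Symmetric mixture of two-fold up- and down-regulated genes
mix_cdf(-3:3, n1 = 10, n2 = 10, D = c(-1, 1), p = c(0.5, 0.5))
```

With a single component at D = 0 this reduces to the central t-distribution with n1 + n2 - 2 degrees of freedom.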

Note that the base functions (FDR, CDFmix, CDF) assume two groups of experimental units; the .paired functions provide the same functionality for one group of paired observations.
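For the paired case, the analogous textbook quantity is the one-sample t-statistic computed on the n within-pair differences. The sketch below assumes that sigma denotes the standard deviation of those differences, which this page does not state explicitly; under that assumption the statistic has n - 1 degrees of freedom and noncentrality parameter sqrt(n) * D / sigma. The name paired_cdf is hypothetical and the package's CDF.paired may be parameterized differently.

```r
# Hypothetical sketch, not the package's CDF.paired.
# Assumption: sigma is the standard deviation of the within-pair differences.
paired_cdf <- function(x, n, D, p, sigma = 1) {
  stopifnot(length(D) == length(p), abs(sum(p) - 1) < 1e-8)
  df  <- n - 1
  ncp <- sqrt(n) * D / sigma
  out <- numeric(length(x))
  # accumulate the weighted component CDFs
  for (i in seq_along(D)) {
    out <- out + p[i] * pt(x, df = df, ncp = ncp[i])
  }
  out
}

paired_cdf(-3:3, n = 10, D = c(-1, 1), p = c(0.5, 0.5))
```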

The distribution functions call pt for computation; correspondingly, the quantiles x and all arguments that define degrees of freedom and noncentrality parameters (n1, n2, D0, D1, sigma) can be vectors, and will be recycled as necessary.
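The recycling behaviour comes from pt itself, which is vectorized over its quantile, degrees-of-freedom and noncentrality arguments. The calls below use only base R and illustrate the behaviour that the distribution functions inherit; the variable names are arbitrary.

```r
# Three quantiles, one distribution
by_quantile <- pt(c(1, 2, 3), df = 18)

# One quantile, three degrees of freedom (e.g. 5, 10 and 20 subjects per group)
by_df <- pt(2, df = c(8, 18, 38))

# One quantile, three noncentrality parameters
by_ncp <- pt(2, df = 18, ncp = c(0, 1, 2))

by_quantile
by_df
by_ncp
```

Each call returns a vector of length three, because the shorter arguments are recycled to the length of the longest one.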

References

Pawitan Y, Michiels S, Koscielny S, Gusnanto A, Ploner A. (2005) False Discovery Rate, Sensitivity and Sample Size for Microarray Studies. Bioinformatics, 21, 3017-3024.

Examples


# FDR for H0: 'log fold change is zero'
#     vs. H1: 'log fold change is -1 or 1'
#             (i.e. two-fold up- or down-regulation)
FDR(1:6, n1=10, n2=10, pmix=0.90, D0=0, p0=1, 
    D1=c(-1,1), p1=c(0.5, 0.5), sigma=1)

# Include small log fold changes in the H0
# Naturally, this increases the FDR
FDR(1:6, n1=10, n2=10, pmix=0.90, D0=c(-0.25,0, 0.25), p0=c(1/3,1/3,1/3), 
    D1=c(-1,1), p1=c(0.5, 0.5), sigma=1)

# Consider an asymmetric alternative
# 10 percent of the regulated genes are assumed to be four-fold upregulated
FDR(1:6, n1=10, n2=10, pmix=0.90, D0=0, p0=1, 
    D1=c(-1,1,2), p1=c(0.45, 0.45, 0.1), sigma=1)
