fisherGOProfiles: GO Class-by-class Fisher tests in lists of genes characterized by their functional profiles

Description

Given two lists of genes, both characterized by their frequencies of annotations (or "hits") in the same set of GO nodes (also designated as GO terms or GO classes), for each node determine if the annotation frequencies depart from what is expected by chance. The annotation frequencies are specified in the "GO profiles" arguments pn, qm and pn. Both samples may share a common subsample of genes, with GO profile pqn0. The analysis is based on the Fisher's exact test, as is implemented by fisher.test R function, followed by p-value adjustment for multitesting based on function p.adjust. Usually, this function will be called after a significant result on compareGOProfiles which performs global (all GO nodes simultaneously) profile comparisons (with better type I and type II error control), to identify the more rellevant nodes.

Usage

fisherGOProfiles(pn, ...)
"fisherGOProfiles"(pn, qm=NULL, pqn0=NULL, n = ngenes(pn), m = ngenes(qm), n0 = ngenes(pqn0), method = "BH", simplify=T, expanded=F, ...)
"fisherGOProfiles"(pn, n, m, method = "BH", ...)
"fisherGOProfiles"(pn, qm=NULL, pqn0=NULL, method = "BH", goIds=T, ...)
"fisherGOProfiles"(pn, qm=NULL, pqn0=NULL, method = "BH", simplify=T, ...)

Arguments

an object of class BasicGOProfile or ExpandedGOProfile representing a "sample" GO profile for a fixed ontology, or a numeric vector interpretable as a GO profile (expanded or not), or a two-dimensional frequency matrix (see the 'Details' section). This is a required argument

similarly, an object representing a "sample" GO profiles for a fixed ontology

pqn0

an object representing a "sample" GO profile for a fixed ontology

the number of genes profiled in pn

the number of genes profiled in qm

the number of genes profiled in pqn0

method

the p-values adjusting method for multiple comparisons; the same possibilities as in standard R function p.adjust

expanded

boolean; are these numeric vectors representing expanded profiles?

simplify

should the result be simplified, if possible? See the 'Details' section

goIds

if TRUE, each node is represented by its GO identifier

...

other arguments (to be passed to p.adjust or fisher.test functions)

Value

A list containing max(ncol(pn),ncol(qm),ncol(pqn0)) p-values numeric vectors, or a single p-values vector if max(ncol(pn),ncol(qm),ncol(pqn0))==1 and simplify == T.

Details

Given a list of n genes, and a set of s GO classes or nodes X, Y, Z, ... in a given ontology (BP, MF or CC), its associated ("contracted" or "basic") "profile" is the absolute frequencies vector of annotations or hits of the n genes in each one of the s GO nodes. For a given node, say X, this frequency includes all annotations for X alone, for X and Y, for X and Z and so on. Thus, as relative frequencies, its sum is not necessarily one, or as absolute frequencies their sum is not necessarily n. On the other hand, an "expanded profile" corresponds to the relative frequencies in ALL NODE COMBINATIONS. That is, if n genes have been profiled, the expanded profile stands for the frequency of all hits EXCLUSIVELY in node X, exclusively in node Y, exclusively in Z, ..., jointly with all hits simultaneously in nodes X and Y (and only in X and Y), simultaneously in X and Z, in Y and Z, ... , in X and Y and Z (and only in X,Y,Z), and so on. Thus, their sum is one. Let n, m and n0 designate the total number of genes profiled in pn, qm and pqn0 respectively. According to these profiles, n[i], m[i] and n0[i] genes are annotated for node 'i', i = 1, ..., s. Note that the sum of all the n[i] not necessarily equals n and so on. If not NULL, pqn0 stands for the profile of the n0 genes common to the gene lists that gave rise to pn and qm. fisherGOProfiles builds a sx2 absolute frequencies matrix

GO node 1	N[1,1]
N[1,2]	GO node 2
N[2,1]	N[2,2]
...	...
...	GO node `s`
N[2,1]	N[s,2]

with column totals N1 and N2 (not necessarily equal to the column sums) and performs a Fisher's exact test over each one of the 2x2 tables

GO node i	N[i,1]
N[i,2]	All nodes except i
N1 - N[i,1]	N2 - N[i,2]

followed by a p-value correction for multiplicity in testing. If pqn0 is NULL, then both gene lists do not have any genes in common, N[i,1] = n[i] and N[i,2] = m[i], and N1 = n, N2 = m, n0 = 0. Otherwhise (if pqn0 is not NULL) N[i,1] = n[i] - n0[i], N1 = n - n0 and N[i,2] = n0[i], N2 = n0 if qm is NULL, or N[i,2] = m[i], N2 = m if qm is not NULL. In other words, this function provides a general setting for diverse, common in practice, situations where a node-by-node analysis is required. When pqn0 = NULL, two lists with no genes in common are compared. Otherwise, when qm = NULL, the genes profiled in pn are compared with a subsample of them, those profiled in pqn0 (a set of genes vs a restricted subset, e.g. those overexpressed under a disease). Finally, if both arguments qm and pqn0 are not NULL (pn is always required) two gene lists with some genes in common are analised.

If both qm and pqn0 are NULL, pn should correspond to an absolute frequencies matrix with s rows and 2 columns.

The arguments n, m or n0 are only required in case of numeric vectors or matrices specifying profiles but lacking the 'ngenes' attribute.

References

Sanchez-Pla, A., Salicru M. and Ocana, J. Statistical methods for the analysis of highthroughput data based on functional profiles derived from the gene ontology. Journal of Statistical Planning and Inference, 2007.

Examples

Run this code

require("org.Hs.eg.db")
data(prostateIds)        # "singh01EntrezIDs", "singh05EntrezIDs", "welsh01EntrezIDs", "welsh05EntrezIDs"
# To improve speed, use only the first 100 genes:
list1 <- welsh01EntrezIDs[1:100]
list2 <- singh01EntrezIDs[1:100]
prof1 <- basicProfile(list1, onto="MF", level=2, orgPackage="org.Hs.eg.db")$MF
prof2 <- basicProfile(list2, onto="MF", level=2, orgPackage="org.Hs.eg.db")$MF
commProf <- basicProfile(intersect(list1, list2), onto="MF", level=2, orgPackage="org.Hs.eg.db")$MF
fisherGOProfiles(prof1, prof2, commProf, method="holm")

Run the code above in your browser using DataLab