pn, qm and pn.
Both samples may share a common subsample of genes, with GO profile pqn0. The analysis is based on Fisher's exact test, as implemented by the R function fisher.test, followed by p-value adjustment for multiple testing based on the function p.adjust. Usually, this function will be called after a significant result from compareGOProfiles, which performs global (all GO nodes simultaneously) profile comparisons (with better type I and type II error control), to identify the most relevant nodes.

fisherGOProfiles(pn, ...)
## S3 method for class 'numeric'
fisherGOProfiles(pn, qm=NULL, pqn0=NULL, n = ngenes(pn), m = ngenes(qm), n0 = ngenes(pqn0), method = "BH", simplify=T, expanded=F, ...)
## S3 method for class 'matrix'
fisherGOProfiles(pn, n, m, method = "BH", ...)
## S3 method for class 'BasicGOProfile'
fisherGOProfiles(pn, qm=NULL, pqn0=NULL, method = "BH", goIds=T, ...)
## S3 method for class 'ExpandedGOProfile'
fisherGOProfiles(pn, qm=NULL, pqn0=NULL, method = "BH", simplify=T, ...)
pn: an object of class BasicGOProfile or ExpandedGOProfile representing a "sample" GO profile for a fixed ontology, or a numeric vector interpretable as a GO profile (expanded or not), or a two-dimensional frequency matrix (see the 'Details' section). This is a required argument.
method: the p-value adjustment method (see p.adjust).
...: other arguments (passed to the p.adjust or fisher.test functions).

Given a list of n genes and a set of s GO classes or nodes X, Y, Z, ... in a given ontology (BP, MF or CC), its associated ("contracted" or "basic") "profile" is the vector of absolute frequencies of annotations or hits of the n genes in each of the s GO nodes. For a given node, say X, this frequency includes all annotations for X alone, for X and Y, for X and Z and so on. Thus, as relative frequencies, their sum is not necessarily one, and as absolute frequencies their sum is not necessarily n.
On the other hand, an "expanded profile" corresponds to the relative frequencies in ALL NODE COMBINATIONS. That is, if n genes have been profiled, the expanded profile stands for the frequency of all hits EXCLUSIVELY in node X, exclusively in node Y, exclusively in Z, ..., jointly with all hits simultaneously in nodes X and Y (and only in X and Y), simultaneously in X and Z, in Y and Z, ..., in X and Y and Z (and only in X, Y, Z), and so on. Thus, their sum is one.
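The difference between a basic and an expanded profile can be illustrated with a small base R sketch. The annotation matrix annot below is made-up toy data (4 genes, 3 hypothetical nodes X, Y and Z), not output of the package; the code only mirrors the two definitions given above.

```r
# Toy annotation matrix (hypothetical data): rows are genes, columns are GO nodes.
annot <- matrix(
  c(1, 0, 0,   # gene1: X only
    1, 1, 0,   # gene2: X and Y
    0, 1, 0,   # gene3: Y only
    1, 1, 1),  # gene4: X, Y and Z
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), c("X", "Y", "Z"))
)
n <- nrow(annot)

# Basic (contracted) profile: hits per node, counting every gene annotated there.
basic <- colSums(annot)

# Expanded profile: relative frequency of each exact combination of nodes.
combos <- apply(annot, 1, function(r) paste(colnames(annot)[r == 1], collapse = "."))
expanded <- table(combos) / n

print(basic)          # absolute frequencies; their sum (7) exceeds n = 4
print(sum(expanded))  # the relative frequencies of the combinations sum to 1
```

Each gene falls in exactly one node combination, which is why the expanded frequencies add up to one while the basic ones do not add up to n.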
Let n, m and n0 designate the total number of genes profiled in pn, qm and pqn0 respectively. According to these profiles, n[i], m[i] and n0[i] genes are annotated for node 'i', i = 1, ..., s. Note that the sum of all the n[i] does not necessarily equal n, and so on. If not NULL, pqn0 stands for the profile of the n0 genes common to the gene lists that gave rise to pn and qm.
fisherGOProfiles builds an s x 2 absolute frequencies matrix

GO node 1 | N[1,1] | N[1,2]
GO node 2 | N[2,1] | N[2,2]
... | ... | ...
GO node s | N[s,1] | N[s,2]

and, for each node i, analyses the corresponding 2x2 table

GO node i | N[i,1] | N[i,2]
All nodes except i | N1 - N[i,1] | N2 - N[i,2]

If pqn0 is NULL, then the two gene lists have no genes in common, N[i,1] = n[i] and N[i,2] = m[i], and N1 = n, N2 = m, n0 = 0. Otherwise (if pqn0 is not NULL) N[i,1] = n[i] - n0[i], N1 = n - n0 and N[i,2] = n0[i], N2 = n0 if qm is NULL, or N[i,2] = m[i], N2 = m if qm is not NULL.
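The node-by-node analysis described above can be sketched in base R, assuming the s x 2 matrix N and the totals N1 and N2 are already available. The helper name fisherByNode and the toy counts are hypothetical; this is not the package's internal code, only an illustration built on fisher.test and p.adjust.

```r
# Hypothetical helper: one Fisher's exact test per GO node, then p-value adjustment.
# N      : s x 2 matrix of absolute frequencies (one row per GO node)
# N1, N2 : total number of genes behind column 1 and column 2
fisherByNode <- function(N, N1, N2, method = "BH") {
  pvals <- apply(N, 1, function(row) {
    # 2x2 table: node i versus all nodes except i
    tab <- rbind(node = row, others = c(N1, N2) - row)
    fisher.test(tab)$p.value
  })
  cbind(p.value = pvals, adjusted = p.adjust(pvals, method = method))
}

# Toy counts (made up for illustration)
N <- matrix(c(30, 10,
              12, 15,
               5,  4),
            ncol = 2, byrow = TRUE,
            dimnames = list(paste("GO node", 1:3), c("list1", "list2")))
res <- fisherByNode(N, N1 = 100, N2 = 80, method = "holm")
print(res)
```

The result has one row per node, with the raw and the adjusted p-value side by side, so the nodes driving a difference between the two lists can be read off directly.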
In other words, this function provides a general setting for diverse situations, common in practice, where a node-by-node analysis is required. When pqn0 = NULL, two lists with no genes in common are compared. Otherwise, when qm = NULL, the genes profiled in pn are compared with a subsample of them, those profiled in pqn0 (a set of genes vs a restricted subset, e.g. those overexpressed under a disease). Finally, if both arguments qm and pqn0 are not NULL (pn is always required) two gene lists with some genes in common are analysed. If both qm and pqn0 are NULL, pn should correspond to an absolute frequencies matrix with s rows and 2 columns.
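As a sketch of how these situations map onto the s x 2 matrix and its column totals, the following base R helper applies the rules stated above for N[i,1], N[i,2], N1 and N2. The name buildCounts, its arguments and the toy counts are hypothetical and are not part of the package.

```r
# Hypothetical helper: assemble the s x 2 matrix and column totals for each situation.
# ni, mi, n0i : per-node annotation counts from pn, qm and pqn0 (NULL if absent)
# n, m, n0    : total numbers of genes profiled in pn, qm and pqn0
buildCounts <- function(ni, mi = NULL, n0i = NULL, n, m = NULL, n0 = NULL) {
  if (is.null(mi) && is.null(n0i)) {
    stop("with neither qm nor pqn0, pn must already be an s x 2 matrix")
  }
  if (is.null(n0i)) {
    # Two lists with no genes in common
    list(N = cbind(ni, mi), N1 = n, N2 = m)
  } else if (is.null(mi)) {
    # A list versus a subset of it: genes outside the subset vs. genes inside it
    list(N = cbind(ni - n0i, n0i), N1 = n - n0, N2 = n0)
  } else {
    # Two lists sharing n0 genes
    list(N = cbind(ni - n0i, mi), N1 = n - n0, N2 = m)
  }
}

# Toy per-node counts (made up for illustration)
ni  <- c(30, 12, 5)
mi  <- c(10, 15, 4)
n0i <- c(6, 3, 1)

subsetCase  <- buildCounts(ni, n0i = n0i, n = 100, n0 = 20)
overlapCase <- buildCounts(ni, mi, n0i, n = 100, m = 80, n0 = 20)
print(subsetCase$N)
print(overlapCase$N)
```

In both cases with a non-NULL pqn0, the first column is reduced by the common genes, which is the adjustment the text above describes.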
The arguments n, m or n0 are only required in case of numeric vectors or matrices specifying profiles but lacking the 'ngenes' attribute.
library(goProfiles)
require("org.Hs.eg.db")
data(prostateIds) # "singh01EntrezIDs", "singh05EntrezIDs", "welsh01EntrezIDs", "welsh05EntrezIDs"
# To improve speed, use only the first 100 genes:
list1 <- welsh01EntrezIDs[1:100]
list2 <- singh01EntrezIDs[1:100]
prof1 <- basicProfile(list1, onto="MF", level=2, orgPackage="org.Hs.eg.db")$MF
prof2 <- basicProfile(list2, onto="MF", level=2, orgPackage="org.Hs.eg.db")$MF
commProf <- basicProfile(intersect(list1, list2), onto="MF", level=2, orgPackage="org.Hs.eg.db")$MF
fisherGOProfiles(prof1, prof2, commProf, method="holm")