"goana"(de, coef = ncol(de), geneid = rownames(de), FDR = 0.05, trend = FALSE, ...)
"goana"(de, universe = NULL, species = "Hs", prior.prob = NULL, covariate=NULL, plot=FALSE, ...)
"kegga"(de, coef = ncol(de), geneid = rownames(de), FDR = 0.05, trend = FALSE, ...)
"kegga"(de, universe = NULL, species = "Hs", species.KEGG = NULL, convert = FALSE, gene.pathway = NULL, pathway.names = NULL, prior.prob = NULL, covariate=NULL, plot=FALSE, ...)
getGeneKEGGLinks(species.KEGG = "hsa", convert = FALSE)
getKEGGPathwayNames(species.KEGG = NULL, remove.qualifier = FALSE)
MArrayLM
fit object.nrow(de)
or the name of the column of de$genes
containing the Entrez Gene IDs."Hs"
(human), "Mm"
(mouse), "Rn"
(rat), "Dm"
(fly) or "Pt"
(chimpanzee), but other values are possible if the corresponding organism package is available.
See alias2Symbol
for other possible values.
Ignored if species.KEGG
or is not NULL
or if gene.pathway
and pathway.names
are not NULL
.gene.pathway
and pathway.names
are not NULL
.TRUE
then KEGG gene identifiers will be converted to NCBI Entrez Gene identifiers. Note that KEGG IDs are the same as Entrez Gene IDs for most species anyway.getGeneKEGGLinks(species.KEGG)
.TRUE
, the species qualifier will be removed from the pathway names.getKEGGPathwayNames(species.KEGG, remove=TRUE)
.de$genes
containing the covariate values.
If TRUE
, then de$Amean
is used as the covariate.NULL
then all Entrez Gene IDs associated with any gene ontology term will be used as the universe.universe
giving the prior probability that each gene in the universe appears in a gene set.
Will be computed from covariate
if the latter is provided.
Ignored if universe
is NULL
.universe
giving a covariate against which prior.prob
should be computed.
Ignored if universe
is NULL
.prior.prob
vs covariate
trend be plotted?MArrayLM
method are passed to the default method.goana
default method produces a data frame with a row for each GO term and the following columns:
"BP"
, "CC"
and "MF"
.DE
set.DE
.
In general, there will be a pair of such columns for each gene set and the name of the set will appear in place of "DE"
.The goana
method for MArrayLM
objects produces a data frame with a row for each GO term and the following columns:
"BP"
, "CC"
and "MF"
.kegga
is the same except that row names become KEGG pathway IDs, Term
becomes Pathway
and there is no Ont
column.
MArrayLM
method extracts the gene lists automatically from a linear model fit object.goana
uses annotation from the appropriate Bioconductor organism package.
The species
can be any character string XX for which an organism package org.XX.eg.db exists and is installed.
See alias2Symbol
for other possible values for species
.
kegga
reads KEGG pathway annotation from the KEGG website.
Note that the species name can be provided in either Bioconductor or KEGG format.
kegga
can be used for any species supported by KEGG, of which there are more than 14,000 possibilities.
By default, kegga
obtains the KEGG annotation for the specified species from the http://rest.kegg.jp website.
Alternatively one can supply the required pathway annotation to kegga
in the form of two data.frames.
If this is done, then an internet connection is not required.
The ability to supply data.frame annotation to kegga
means that kegga
can in principle be used to analyze any user-supplied gene sets.
The default goana
and kegga
methods accept a vector prior.prob
giving the prior probability that each gene in the universe appears in a gene set.
This vector can be used to correct for unwanted trends in the differential expression analysis associated with gene length, gene abundance or any other covariate.
The MArrayLM
object computes the prior.prob
vector automatically when trend
is non-NULL
.
If prior.prob=NULL
, the function computes one-sided hypergeometric tests equivalent to Fisher's exact test.
If prior probabilities are specified, then a test based on the Wallenius' noncentral hypergeometric distribution is used to adjust for the relative probability that each gene will appear in a gene set, following the approach of Young et al (2010).
The MArrayLM
methods performs over-representation analyses for the up and down differentially expressed genes from a linear model analysis.
In this case, the universe is all the genes found in the fit object.
trend=FALSE
is equivalent to prior.prob=NULL
.
If trend=TRUE
or a covariate is supplied, then a trend is fitted to the differential expression results and this is used to set prior.prob
.
The statistical approach provided here is the same as that provided by the goseq package, with one methodological difference and a few restrictions.
Unlike the goseq package, the gene identifiers here must be Entrez Gene IDs and the user is assumed to be able to supply gene lengths if necessary.
The goseq package has additional functionality to convert gene identifiers and to provide gene lengths.
The only methodological difference is that goana
and kegga
computes gene length or abundance bias using tricubeMovingAverage
instead of monotonic regression.
While tricubeMovingAverage
does not enforce monotonicity, it has the advantage of numerical stability when de
contains only a small number of genes.
topGO
, topKEGG
The goseq package provides an alternative implementation of methods from Young et al (2010). Unlike the limma functions documented here, goseq will work with a variety of gene identifiers and includes a database of gene length information for various species.
The gostats package also does GO analyses without adjustment for bias but with some other options.
See 10.GeneSetTests for a description of other functions used for gene set testing.
## Not run:
# ## Linear model usage:
#
# fit <- lmFit(y, design)
# fit <- eBayes(fit)
#
# # Standard GO analysis
# go.fisher <- goana(fit, species="Hs")
# topGO(go.fisher, sort = "up")
# topGO(go.fisher, sort = "down")
#
# # GO analysis adjusting for gene abundance
# go.abund <- goana(fit, geneid = "GeneID", trend = TRUE)
# topGO(go.abund, sort = "up")
# topGO(go.abund, sort = "down")
#
# # GO analysis adjusting for gene length bias
# # (assuming that y$genes$Length contains gene lengths)
# go.len <- goana(fit, geneid = "GeneID", trend = "Length")
# topGO(go.len, sort = "up")
# topGO(go.len, sort = "down")
#
# ## Default usage with a gene list:
#
# go.de <- goana(list(DE1 = EG.DE1, DE2 = EG.DE2, DE3 = EG.DE3))
# topGO(go.de, sort = "DE1")
# topGO(go.de, sort = "DE2")
# topGO(go.de, ontology = "BP", sort = "DE3")
# topGO(go.de, ontology = "CC", sort = "DE3")
# topGO(go.de, ontology = "MF", sort = "DE3")
#
# ## Standard KEGG analysis
# k <- kegga(fit, species="Hs")
# k <- kegga(fit, species.KEGG="hsa") # equivalent to previous
# ## End(Not run)
Run the code above in your browser using DataLab