Run Gene Ontology analysis based on hypergeometric test from a probeset list
runHyperGO(list, pack.annot, categorySize = 1, verbose = TRUE,
name = "hyperGO", htmlreport = TRUE, txtreport = TRUE,
tabResult = FALSE, pvalue = 0.05)
list: character vector of probe set names
pack.annot: annotation package to use
categorySize: integer, minimum size of a category, by default 1
verbose: logical, if TRUE, results are displayed, by default TRUE
name: character, name for the output files, by default "hyperGO"
htmlreport: logical, if TRUE, an HTML report is created, by default TRUE
txtreport: logical, if TRUE, a txt report is created, by default TRUE
tabResult: logical, if TRUE, a list with the results is created, by default FALSE
pvalue: numeric, cutoff for the hypergeometric test p-value, by default 0.05
The R objects, or the txt and HTML reports
Data frame with the results for Biological Process: GO ID, p-value, odds ratio, expected count, size and GO term
Same for Molecular Function
Same for Cellular Component
The choice of the universe can have a significant impact on the results; this is discussed in detail in the vignette of the GOstats package. Here, we apply a non-specific filtering procedure different from the one proposed by Falcon and Gentleman. Since not all genes are expressed under all conditions in our data, the universe could be defined either with the expressed genes only or with all the genes on the array. In practice, we cannot distinguish genes that are biologically non-expressed from probe sets of low quality. We therefore consider that non-expressed probe sets, as well as those with little variation across samples, may be biologically relevant, and we first define the universe with all the genes on the array. We then remove only the probe sets that have no Entrez Gene identifier in the annotation data or no GO annotation. Finally, the hypergeometric test is performed on the unique Entrez IDs of the gene list and the unique Entrez IDs of the universe.

The p-values in the output are not corrected for multiple testing. Because of the existing dependence structure (among genes and among GO terms), such a correction is difficult to apply. Moreover, the most interesting gene sets are not necessarily those with the smallest p-values: interesting nodes are typically those with a reasonable number of genes (10 or more) and small p-values.
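The procedure described above can be illustrated with a minimal base R sketch. It is not the code used inside runHyperGO: the annotation table, the probe names and the GO category below are made-up toy data, and the test is computed directly with phyper() from the stats package rather than through GOstats.

```r
# Toy annotation table, one row per probe set (all values are hypothetical).
annot <- data.frame(
  probe  = paste0("p", 1:12),
  entrez = c("1", "1", "2", "3", NA, "4", "5", "6", "7", "8", "9", "10"),
  has_go = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE,
             TRUE, TRUE, TRUE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Universe: all probe sets of the array, minus those without an Entrez Gene
# identifier or without GO annotation, collapsed to unique Entrez IDs.
keep     <- !is.na(annot$entrez) & annot$has_go
universe <- unique(annot$entrez[keep])

# Gene list: unique Entrez IDs of the selected probe sets, after the same filter.
probe_list <- c("p1", "p2", "p3", "p5", "p7")
selected   <- unique(annot$entrez[annot$probe %in% probe_list & keep])

# Hypothetical GO category, restricted to the universe.
category <- intersect(c("1", "2", "3", "5"), universe)

N <- length(universe)                     # genes in the universe
K <- length(category)                     # universe genes in the category
n <- length(selected)                     # genes in the list
k <- length(intersect(selected, category))  # list genes in the category

# Over-representation p-value: P(X >= k) under the hypergeometric distribution.
p_value   <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)
# Expected number of list genes in the category under the null hypothesis.
exp_count <- n * K / N
```

Note that probe sets p1 and p2 map to the same Entrez ID and are counted once, which is why the test works on unique Entrez IDs rather than on probe sets.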
runHyperGO needs the packages GOstats and GO.db from Bioconductor.
# NOT RUN {
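library(EMA)             # provides runHyperGO() and the marty data set
library(hgu133plus2.db)  # annotation package matching the array
data(marty)

## Probe list: the first 50 probe sets in marty
probeList <- rownames(marty)[1:50]

## Hypergeometric test for GO terms; results are returned as R objects only
res <- runHyperGO(probeList, pack.annot = "hgu133plus2.db",
                  htmlreport = FALSE, txtreport = FALSE, tabResult = TRUE)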
# }
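When the results are returned as data frames (tabResult = TRUE), they can be screened for nodes with a reasonable number of genes and a small p-value. The sketch below uses a mock table rather than real output: the column names (GOID, Pvalue, Size, Term), the identifiers and the values are assumptions made for illustration and may not match what runHyperGO returns.

```r
# Mock result table standing in for one of the returned data frames.
# Column names, identifiers and values are hypothetical.
bp <- data.frame(
  GOID   = c("GO:example1", "GO:example2", "GO:example3", "GO:example4"),
  Pvalue = c(0.0004, 0.0300, 0.0020, 0.2000),
  Size   = c(3, 45, 120, 60),
  Term   = c("term A", "term B", "term C", "term D"),
  stringsAsFactors = FALSE
)

min_size <- 10    # keep nodes with at least 10 genes
cutoff   <- 0.05  # uncorrected p-value cutoff

# Keep nodes that are both large enough and significant, sorted by p-value.
interesting <- bp[bp$Size >= min_size & bp$Pvalue <= cutoff, ]
interesting <- interesting[order(interesting$Pvalue), ]
print(interesting)
```

The node with the smallest p-value in this toy table is dropped because it contains only 3 genes, in line with the remark that the smallest p-values do not necessarily mark the most interesting gene sets.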