runHyperGO: Run Gene Ontology analysis based on hypergeometric test from a probeset list

Description

Run Gene Ontology analysis based on hypergeometric test from a probeset list

Usage

runHyperGO(list, pack.annot, categorySize = 1, verbose = TRUE,
name = "hyperGO", htmlreport = TRUE, txtreport = TRUE,
tabResult = FALSE, pvalue = 0.05)

Arguments

list

vector of character with probeset names

pack.annot

annotation package to use

categorySize

integer, minimum size for category, by default = 1

verbose

logical, if TRUE, results are displayed, by default TRUE

name

character, name for output files, by default "hyperGO"

htmlreport

logical, if TRUE, a html report is created, by default TRUE

txtreport

logical, if TRUE, a txt report is created, by default TRUE

tabResult

logical, if TRUE, a list with the results is created, by default FALSE

pvalue

numeric, a cutoff for the hypergeometric test pvalue, by default 0.05

Value

The R objects or the Txt and html reports

Data.frame with results for Biological Process with GO Id, pvalue, Odd Ratio, Expected count, Size and GO Term

Idem for Molecular Function

Idem for Cellular Component

Details

The choice of the universe could have a significant impact on the results. It is well discussed in the vignette of the GOstats package. Here, we decided to apply a non-specific filtering procedure different from the one proposed by Falcon and Gentleman. Since not all genes will be expressed under all conditions in our data, we can ask the question of defining the universe only with the expressed genes or with all the genes of the array. Actually, we are not able to distinguish the genes which are biologically non expressed, from the ones of low quality. That's why we think that the non-expressed probesets could be biologically relevant, as well as the ones with a little variation accross samples, and we decided to first defined the universe with all the genes of the array. Then, we just remove probe sets that have no Entrez Gene identifier in our annotation data or no GO annotation. Finally, the Hypergeometric test is performed on the unique EntrezId of the gene list, and the unique EntrezId of the universe. The pvalues in output are not corrected from multiple testing. Note that because of the existing dependence structure (between genes, and GO terms) it is difficult to do any multiple testing correction. Moreover the most insteresting genesets are not necessarily the ones with the smallest pvalues. Nodes that are interesting are typically those with a reasonable number of genes (10 or more) and small pvalues.

runHyperGO needs packages GOstats and GO.db from Bioconductor.

Examples

Run this code

# NOT RUN {
require(hgu133plus2.db)
data(marty)

## Probe list
probeList <- rownames(marty)[1:50]

## Hypergeometric test for GO pathway
res <- runHyperGO(probeList, htmlreport = FALSE, txtreport = FALSE,
    tabResult = TRUE, pack.annot = "hgu133plus2.db")
# }