Using Bioconductor's annotation packages, this function calculates enrichments and returns terms with best enrichment values.
GOenrichmentAnalysis(labels,
entrezCodes,
yeastORFs = NULL,
organism = "human",
ontologies = c("BP", "CC", "MF"),
evidence = "all",
includeOffspring = TRUE,
backgroundType = "givenInGO",
removeDuplicates = TRUE,
leaveOutLabel = NULL,
nBestP = 10, pCut = NULL,
nBiggest = 0,
getTermDetails = TRUE,
verbose = 2, indent = 0)
entrezCodes: [...] labels. A single vector; the i-th entry corresponds to row i of the matrix
labels (or to the i-th entry if labels is a vector).
yeastORFs: if organism=="yeast" (below), this argument can be used to input yeast open
reading frame (ORF) identifiers instead of Entrez codes. Since the GO mappings for yeast are provided in
terms of ORF identifiers, this may lead to a more accurate GO [...]
organism: "human", "mouse", "rat", "malaria", "yeast", "fly", "bovine",
"worm", "canine", "zebrafish", "chicken".
ontologies: "BP", "CC", "MF". The result will contain the terms with highest enrichment
in each specified category, plus a separate list of terms with [...]
backgroundType: "allGiven", "allInGO", "givenInGO", meaning that the function will take all
genes given in labels as background ("allGiven"), all [...]
removeDuplicates: should duplicate entries in entrezCodes be removed? If
TRUE, only the first occurrence of each unique Entrez code will be kept. The cluster labels
labels will be adjusted accordingly.
pCut: [...] pCut will be returned. If pCut is given, nBestP is ignored.
TRUE if the entry was
used for enrichment analysis. Depending on the setting of removeDuplicates above, only a single
entry per gene may be used.
TRUE if the gene belongs to any GO
term, FALSE otherwise. Also FALSE for genes not used for the analysis because of
duplication.
If labels contained only one vector of labels, the following components:
bestPTerms: [...] ontologies in input, plus one component corresponding to all given ontologies combined.
The name of each component is set appropriately. Each inner list contains two components:
enrichment is
a data frame containing the highest enriched terms for each module; and forModule is a list of
lists with one inner list per module, appropriately named. Each inner list contains one component per term.
If input getTermDetails is TRUE,
this component is yet another list and contains components termName (term name),
enrichmentP (enrichment P value), termDefinition (GO term definition),
termOntology (GO term ontology), geneCodes (Entrez codes of module genes in this term),
genePositions (indices of the genes listed in geneCodes within the given labels).
Thus, to obtain information on, say, the second term of the 5th module in ontology BP,
one can look at the appropriate row of bestPTerms$BP$enrichment, or one can reference
bestPTerms$BP$forModule[[5]][[2]]. The author of the function apologizes for any confusion this
structure of the output may cause.
[...] bestPTerms, containing information about the
terms with most genes in the module for each supplied ontology.
If labels contained more than one vector, instead of the above components the return value
contains a list named setResults that has one component per given set; each component is a list
containing the above components for the corresponding set.
For best results, the newest annotation libraries should be used. Because of the way Bioconductor is set up, to get the newest annotation libraries you may have to use the current version of R.
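The nested layout of bestPTerms described above can be hard to picture, so the following minimal sketch builds a small mock object with the same shape and indexes into it. Nothing here calls GOenrichmentAnalysis or loads annotation packages. All values are invented placeholders, and the column names of the mock enrichment data frame (module, termName, enrichmentP) are assumptions, since the columns of the real data frame are not described in this text. Only the list component names are taken from the description.

```r
# Mock of one term entry, using the component names listed in the text.
# All values are placeholders invented for illustration.
makeTerm <- function(name, p) {
  list(termName = name,
       enrichmentP = p,
       termDefinition = "placeholder definition",
       termOntology = "BP",
       geneCodes = c(101, 102),
       genePositions = c(3L, 7L))
}

# forModule: one inner list per module, each holding one component per term.
forModule <- lapply(1:5, function(m) {
  list(makeTerm(paste0("module", m, "_term1"), 1e-4),
       makeTerm(paste0("module", m, "_term2"), 1e-3))
})
names(forModule) <- paste0("module", 1:5)

# enrichment: a data frame with one row per (module, term) pair.
# The column names are assumptions made for this mock.
enrichment <- data.frame(
  module = rep(1:5, each = 2),
  termName = unlist(lapply(forModule,
                           function(m) sapply(m, `[[`, "termName")),
                    use.names = FALSE),
  enrichmentP = rep(c(1e-4, 1e-3), times = 5),
  stringsAsFactors = FALSE
)

bestPTerms <- list(BP = list(enrichment = enrichment, forModule = forModule))

# Second term of the 5th module in ontology BP, via the nested list.
term <- bestPTerms$BP$forModule[[5]][[2]]
print(term$termName)

# The corresponding rows of the enrichment data frame.
module5Rows <- subset(bestPTerms$BP$enrichment, module == 5)
print(module5Rows)
```

With a real result, the same two access paths apply: the data frame is convenient for a tabular overview, while the nested list gives per-term details such as geneCodes and genePositions.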
According to http://www.geneontology.org/GO.evidence.shtml, the following codes are used by GO:
Experimental Evidence Codes
  EXP: Inferred from Experiment
  IDA: Inferred from Direct Assay
  IPI: Inferred from Physical Interaction
  IMP: Inferred from Mutant Phenotype
  IGI: Inferred from Genetic Interaction
  IEP: Inferred from Expression Pattern
Computational Analysis Evidence Codes
  ISS: Inferred from Sequence or Structural Similarity
  ISO: Inferred from Sequence Orthology
  ISA: Inferred from Sequence Alignment
  ISM: Inferred from Sequence Model
  IGC: Inferred from Genomic Context
  IBA: Inferred from Biological aspect of Ancestor
  IBD: Inferred from Biological aspect of Descendant
  IKR: Inferred from Key Residues
  IRD: Inferred from Rapid Divergence
  RCA: Inferred from Reviewed Computational Analysis
Author Statement Evidence Codes
  TAS: Traceable Author Statement
  NAS: Non-traceable Author Statement
Curator Statement Evidence Codes
  IC: Inferred by Curator
  ND: No biological Data available
Automatically-assigned Evidence Codes
  IEA: Inferred from Electronic Annotation
Obsolete Evidence Codes
  NR: Not Recorded
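As a rough illustration of how such codes can be used to restrict annotations, the sketch below groups the codes listed above by category and filters a made-up annotation table down to experimentally supported entries. The data frame annot, its column names, and the GO identifiers in it are hypothetical placeholders. This is not a description of how GOenrichmentAnalysis applies its evidence argument internally; it only shows the general idea of selecting annotations by evidence code.

```r
# Evidence codes grouped by the categories given in the text.
evidenceCodes <- list(
  experimental  = c("EXP", "IDA", "IPI", "IMP", "IGI", "IEP"),
  computational = c("ISS", "ISO", "ISA", "ISM", "IGC",
                    "IBA", "IBD", "IKR", "IRD", "RCA"),
  author        = c("TAS", "NAS"),
  curator       = c("IC", "ND"),
  automatic     = "IEA",
  obsolete      = "NR"
)

# Hypothetical gene-to-term annotation table (placeholder values only).
annot <- data.frame(
  entrez   = c(1, 2, 3, 4),
  goTerm   = c("GO:0000001", "GO:0000002", "GO:0000001", "GO:0000003"),
  evidence = c("IDA", "IEA", "ISS", "IMP"),
  stringsAsFactors = FALSE
)

# Keep only annotations backed by experimental evidence.
keep <- annot$evidence %in% evidenceCodes$experimental
experimentalOnly <- annot[keep, ]
print(experimentalOnly)
```

Restricting to experimental codes typically shrinks the annotation set considerably, so a choice like this trades coverage for reliability.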