fea_david: FEA - DAVID

Description

Performs the functional enrichment analysis and clustering through DAVID [1] (requires internet connection).

Usage

fea_david(geneList, geneIdType = "ENSEMBL_GENE_ID", geneLabels=NULL, 
  annotations = c("GOTERM_BP_ALL", "GOTERM_MF_ALL", "GOTERM_CC_ALL", 
  "KEGG_PATHWAY", "INTERPRO"), 
  email = NULL, 
  argsWS = c(overlap = 4L, initialSeed = 4L, finalSeed = 4L, linkage = 0.5, 
  kappa = 35L), jobName = NULL, downloadFile=TRUE)

Arguments

geneList

character vector. List of genes to analyze.

geneIdType

character vector. Type of gene identifier. Web API: ENSEMBL_GENE_ID, ENTREZ_GENE_ID, GENE_SYMBOL, UNIPROT_ID... For more, check DAVID's API documentation. Web Service: run getIdTypes(DAVIDWebService$new(email=...))

geneLabels

named character vector. Gene name or label to use in the report/plots instead of the original gene ID. The vector names should be the gene ID and the content of the vector the gene label. The resulting geneTermSets table will contain the original gene ID column (geneIDs) and the label column (Genes).

annotations

character vector. Annotation spaces for the functional analysis. Web API: check DAVID's API documentation. Web Service: run getAllAnnotationCategoryNames(DAVIDWebService$new(email=...)).

email

character. If provided, the query will be performed through DAVID's Web Service (recommended). Requires registration (see details).

argsWS

named numeric vector. Additional arguments for the clustering (overlap, initialSeed, finalSeed, linkage, kappa). Only available when using the web service.

jobName

character. Folder name and prefix for the files.

downloadFile

logical. If TRUE, the result files are saved in the current directory (required to generate report).
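As an illustration of the geneLabels format described above, the following is a minimal sketch in base R. The gene IDs are taken from the Examples section; the labels ("labelA", ...) are placeholders invented for this sketch, not real gene names.

```r
# Gene IDs from the Examples section; the labels are hypothetical placeholders.
geneIds <- c("YBL084C", "YDL008W", "YDR118W")
labels  <- c("labelA", "labelB", "labelC")

# geneLabels: names are the gene IDs, values are the labels to display.
geneLabels <- setNames(labels, geneIds)

# Basic sanity checks before passing the vector to fea_david().
stopifnot(is.character(geneLabels),
          !is.null(names(geneLabels)),
          !anyDuplicated(names(geneLabels)))

# Look up the label for one gene ID.
geneLabels[["YDL008W"]]
```

A vector built this way can be passed as geneLabels=geneLabels, as in the Examples section.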

Value

Invisible list with the following fields:
queryArgs list with the arguments for the query.
clusters data.frame containing the clusters and their information:
- Cluster: Cluster ID.
- nGenes: Number of genes in the cluster.
- ClusterEnrichmentScore: Score for the cluster.
- Genes: Genes in the cluster.
- Terms: Terms in the cluster.
- keyWordsTerm: Term that is most representative of the terms in the cluster, based on keywords.
geneTermSets data.frame containing the gene-term sets that support each cluster.
- Cluster: Number (id) of the cluster the gene-term set belongs to.
- ClusterEnrichmentScore: Score for the cluster. Same value for all terms in each cluster.
- Category: Type of annotation of the term (e.g. GO, KEGG...)
- Terms: Term in the gene-term set.
- Genes: Genes in the gene-term set.
- GenesIDs: Original gene ID, in case geneLabels was provided.
- Other stats: Count, PValue, List.Total, Pop.Hits, Pop.Total, Fold.Enrichment, Bonferroni, Benjamini, FDR.
fileName .txt file with the formatted FEA results.

Warning

The web service and the API have different default arguments. To obtain the same results with both methods use:

API_defaults <- c(overlap=3L, initialSeed=3L, finalSeed=3L, linkage=0.5, kappa=50L)

fea_david(genesYeast, email="example@email.com", argsWS=API_defaults)
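To see which clustering arguments differ between the two sets of defaults, the vectors can be compared directly. This is a small base R sketch; the variable names wsDefaults and differing are chosen here for illustration, and the values are copied from the Usage section and the warning above.

```r
# Web service defaults (from the Usage section) and API defaults (from the warning).
wsDefaults  <- c(overlap = 4L, initialSeed = 4L, finalSeed = 4L,
                 linkage = 0.5, kappa = 35L)
apiDefaults <- c(overlap = 3L, initialSeed = 3L, finalSeed = 3L,
                 linkage = 0.5, kappa = 50L)

# Names of the arguments whose values differ between the two methods.
differing <- names(wsDefaults)[wsDefaults != apiDefaults[names(wsDefaults)]]
print(differing)

# Side-by-side view of both sets of defaults.
print(rbind(webService = wsDefaults, API = apiDefaults))
```

Only linkage is shared; the other four values need to be overridden through argsWS to reproduce the API behaviour with the web service.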

code

vignette("FGNet-vignette")

Details

To perform the queries, please register at http://david.abcc.ncifcrf.gov/webservice/register.htm.

NOTE: Since August 2015, DAVID requires https. This causes errors in some systems. A (hopefully) temporary solution requires installing some certificates locally. See RDAVIDWebService help: https://support.bioconductor.org/p/70090/#72226

As an alternative, the web API allows small queries without registering. Note that this option is not available in some systems, and the maximum number of genes is limited to 400 (it can be less, depending on the ID types and the length of the resulting URL). More details and the full list of gene ID types and annotations are available at: http://david.abcc.ncifcrf.gov/content.jsp?file=DAVID_API.html.

If the functional annotation and clustering have been performed directly at DAVID's website (http://david.abcc.ncifcrf.gov/summary.jsp), fea_david() is not required. Instead, provide the file (or the URL of the file) containing the results of the analysis to format_david().
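The choice between the two query methods described above can be sketched as a small pre-check. The helper choose_david_method() below is hypothetical (it is not part of FGNet); it only encodes the rules stated in this section: an email selects the web service, and without one the web API is limited to at most 400 genes.

```r
# Hypothetical helper, not part of FGNet: decides which DAVID query method applies.
choose_david_method <- function(geneList, email = NULL, maxApiGenes = 400L) {
  # With a registered email, fea_david() queries the web service.
  if (!is.null(email)) return("webservice")
  # Without an email, the web API is used; it accepts at most maxApiGenes genes
  # (the real limit can be lower, depending on the length of the resulting URL).
  if (length(geneList) <= maxApiGenes) return("api")
  stop("Too many genes for the web API; register and provide an email.")
}

smallList <- paste0("gene", 1:15)
largeList <- paste0("gene", 1:401)

choose_david_method(smallList)                              # web API
choose_david_method(largeList, email = "example@email.com") # web service
```

Calling the helper with largeList and no email raises an error, which mirrors the 400-gene limit of the unregistered API.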

References

[1] Huang DW, Sherman BT, Lempicki RA (2009) Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 37(1):1-13.

Examples

# Load/format gene list:
geneList <- c("YBL084C", "YDL008W", "YDR118W", "YDR301W", "YDR448W", "YFR036W",
    "YGL240W", "YHR166C", "YKL022C", "YLR102C", "YLR115W", "YLR127C", "YNL172W", 
    "YOL149W", "YOR249C")

library(org.Sc.sgd.db)
geneLabels <- unlist(as.list(org.Sc.sgdGENENAME)[geneList])

geneExpr <- setNames(c(rep(1,10),rep(-1,5)), geneLabels) 

# DAVID
results_David <- fea_david(geneList, geneLabels=geneLabels, email="example@email.com")

# Available IDs and annotations:
getIdTypes(DAVIDWebService$new(email="example@email.com"))
getAllAnnotationCategoryNames(DAVIDWebService$new(email="example@email.com"))

results <- fea_david(geneList, geneIdType="ENSEMBL_GENE_ID",
    annotations="GOTERM_BP_ALL", email="example@email.com", jobName="yeastDavid")


# To continue the workflow... (see help for further details)
FGNet_report(results, geneExpr=geneExpr)

incidMat <- fea2incidMat(results)
functionalNetwork(incidMat, geneExpr=geneExpr)
