ddcorGO: Gene ontology of differential correlation-classified genes.

Description

Extracts a data frame of the top enriched gene sets in gene ontology databases using the hypergeometric test for gene synmols that are members of gene pairs in each of the classes specified in the differentially correlated gene pairs input table. Default parameter settings are to take in a result table with HGNC symbols and convert them to Ensembl symbols for gene ontology testing.

Usage

ddcorGO(ddcor_res, universe, pval_gene_thresh = 0.05, classes = FALSE,
  geneNameCol = c("Gene1", "Gene2"), pval_GO_cutoff = 1,
  HGNC_clean = TRUE, HGNC_switch = TRUE, gene_ontology = "all",
  adjusted = FALSE, annotation = "org.Hs.eg.db", conditional = FALSE,
  calculateVariance = FALSE, unique_genes = FALSE, regcor = FALSE,
  ddcor_find_significant = TRUE, ddcorGO_res = NULL)

Value

A list of data frames corresponding to the gene ontology enrichment analysis results for the extracted gene sets from each of the differential correlation classes.

Arguments

ddcor_res: The table of differential correlations outputted from ddcor. Expected to have pValDiff or pValDiff_adj columns as well as zScoreDiff, Gene1, +/- Classes columns.
universe: Character vector of gene symbols which should be used as the background in the hypergeomtric test. If using this in the context of a DGCA experiment, this gene list most likely should be the gene set post-filtering, but prior to differential correlation analysis.
pval_gene_thresh: p-value threshold to call a gene as having significant differential correlation or not.
classes: Logical indicator specifying whether individual differential correlation gene classes should be extracted from the table or not. If not, only the zScoreDiff column is used to specify positively or negatively differentially correlated genes between the two conditions.
geneNameCol: Character vector specifying the name of the columns that are used to extract the gene symbols. Note that the default is c("Gene1", "Gene2"), but this only makes sense in the context of a full DGCA experiment. In the case of a splitSet, you may want to use "Gene1" to avoid counting the splitSet names in all of the categories.
pval_GO_cutoff: Cutoff for the unadjusted p-values of gene ontology terms in the enrichment tests that should be displayed in the resulting table.
HGNC_clean: Logical indicating whether the input gene symbols should be switched to clean HGNC symbols using the checkGeneSymbols function from the R package HGNChelper. Only applies if HGNC symbols are inputted.
HGNC_switch: Logical indicating whether or not the input gene symbols need to be switched from HGNC to Ensembl, the latter of which is required for GOstats enrichment test. Note that this is done by selecting the first Enembl symbol that maps to a particular HGNC symbol, which is not always unique. If you need more precision on the conversion, you should do this outside of the function and insert the Ensembl list to the function.
gene_ontology: A string specifying the branch of GO that should be used for enrichment analysis. One of "BP" (Biological Process), "MF" (Molecular Function), "CC" (Cellular Component), or "all". If "all" is chosen, then this function finds the enrichment for all of the terms and combines them into one table. Default = "all"
adjusted: Logical indicating whether adjusted p-values from the differential correlation table (i.e., column "pValDiff_adj", when adjusted = TRUE) or unadjusted p-values (i.e., column "pValDiff", when adjusted = FALSE) should be used to subset the table into significant and non-significant portions.
annotation: The library indicating the GO annotation database from which the Go terms should be mapped to gene symbols. Default = "org.Hs.eg.db", which is the table for Homo sapiens. Other common choices include "org.Mm.eg.db", "org.Rn.eg.db". The corresponding annotation library needs to be installed.
conditional: Logical specifying whether the GO analysis should be done conditionally to take into account the hierarchical structure of the GO database in making sense of the gene set enrichments.
calculateVariance: Optionally, find the variance of the odds ratio for each enrichment test. In particular, this finds the standard error of the log odds ratio, which converges to a normal distribution much more quickly than the non-log OR.
unique_genes: Logical, if TRUE indicates that unique gene symbols within gene pairs from each category compared to the other groups should be chosen prior to GO enrichment analysis.
regcor: Logical specifying whether the ddcorGO analysis should be performed on the results of a regcor data analysis. Note that the classes option is not available in this case.
ddcor_find_significant: Logical specifying whether this enrichment analysis should be performed on the result of a ddcor analysis. If FALSE, then a ddcorGO_res object, which is a named list of gene vectors, must be defined instead.
ddcorGO_res: Optional named list of gene vectors to find the enrichment of if ddcor_find_signficiant is FALSE.

References

Agresti A: Categorical Data Analysis. 2012:70-77.