Learn R Programming

DGCA (version 1.0.3)

ddcorGO: Gene ontology of differential correlation-classified genes.

Description

Extracts a data frame of the top enriched gene sets in gene ontology databases using the hypergeometric test for gene synmols that are members of gene pairs in each of the classes specified in the differentially correlated gene pairs input table. Default parameter settings are to take in a result table with HGNC symbols and convert them to Ensembl symbols for gene ontology testing.

Usage

ddcorGO(ddcor_res, universe, pval_gene_thresh = 0.05, classes = FALSE,
  geneNameCol = c("Gene1", "Gene2"), pval_GO_cutoff = 1,
  HGNC_clean = TRUE, HGNC_switch = TRUE, gene_ontology = "all",
  adjusted = FALSE, annotation = "org.Hs.eg.db", conditional = FALSE,
  calculateVariance = FALSE, unique_genes = FALSE, regcor = FALSE,
  ddcor_find_significant = TRUE, ddcorGO_res = NULL)

Value

A list of data frames corresponding to the gene ontology enrichment analysis results for the extracted gene sets from each of the differential correlation classes.

Arguments

ddcor_res

The table of differential correlations outputted from ddcor. Expected to have pValDiff or pValDiff_adj columns as well as zScoreDiff, Gene1, +/- Classes columns.

universe

Character vector of gene symbols which should be used as the background in the hypergeomtric test. If using this in the context of a DGCA experiment, this gene list most likely should be the gene set post-filtering, but prior to differential correlation analysis.

pval_gene_thresh

p-value threshold to call a gene as having significant differential correlation or not.

classes

Logical indicator specifying whether individual differential correlation gene classes should be extracted from the table or not. If not, only the zScoreDiff column is used to specify positively or negatively differentially correlated genes between the two conditions.

geneNameCol

Character vector specifying the name of the columns that are used to extract the gene symbols. Note that the default is c("Gene1", "Gene2"), but this only makes sense in the context of a full DGCA experiment. In the case of a splitSet, you may want to use "Gene1" to avoid counting the splitSet names in all of the categories.

pval_GO_cutoff

Cutoff for the unadjusted p-values of gene ontology terms in the enrichment tests that should be displayed in the resulting table.

HGNC_clean

Logical indicating whether the input gene symbols should be switched to clean HGNC symbols using the checkGeneSymbols function from the R package HGNChelper. Only applies if HGNC symbols are inputted.

HGNC_switch

Logical indicating whether or not the input gene symbols need to be switched from HGNC to Ensembl, the latter of which is required for GOstats enrichment test. Note that this is done by selecting the first Enembl symbol that maps to a particular HGNC symbol, which is not always unique. If you need more precision on the conversion, you should do this outside of the function and insert the Ensembl list to the function.

gene_ontology

A string specifying the branch of GO that should be used for enrichment analysis. One of "BP" (Biological Process), "MF" (Molecular Function), "CC" (Cellular Component), or "all". If "all" is chosen, then this function finds the enrichment for all of the terms and combines them into one table. Default = "all"

adjusted

Logical indicating whether adjusted p-values from the differential correlation table (i.e., column "pValDiff_adj", when adjusted = TRUE) or unadjusted p-values (i.e., column "pValDiff", when adjusted = FALSE) should be used to subset the table into significant and non-significant portions.

annotation

The library indicating the GO annotation database from which the Go terms should be mapped to gene symbols. Default = "org.Hs.eg.db", which is the table for Homo sapiens. Other common choices include "org.Mm.eg.db", "org.Rn.eg.db". The corresponding annotation library needs to be installed.

conditional

Logical specifying whether the GO analysis should be done conditionally to take into account the hierarchical structure of the GO database in making sense of the gene set enrichments.

calculateVariance

Optionally, find the variance of the odds ratio for each enrichment test. In particular, this finds the standard error of the log odds ratio, which converges to a normal distribution much more quickly than the non-log OR.

unique_genes

Logical, if TRUE indicates that unique gene symbols within gene pairs from each category compared to the other groups should be chosen prior to GO enrichment analysis.

regcor

Logical specifying whether the ddcorGO analysis should be performed on the results of a regcor data analysis. Note that the classes option is not available in this case.

ddcor_find_significant

Logical specifying whether this enrichment analysis should be performed on the result of a ddcor analysis. If FALSE, then a ddcorGO_res object, which is a named list of gene vectors, must be defined instead.

ddcorGO_res

Optional named list of gene vectors to find the enrichment of if ddcor_find_signficiant is FALSE.

References

Agresti A: Categorical Data Analysis. 2012:70-77.