score_terms: Calculate Agglomerated Scores of Enriched Terms for Each Subject

Description

Calculate Agglomerated Scores of Enriched Terms for Each Subject

Usage

score_terms(
  enrichment_table,
  exp_mat,
  cases = NULL,
  use_description = FALSE,
  plot_hmap = TRUE,
  ...
)

Value

Matrix of agglomerated scores of each enriched term per sample. Columns are samples, rows are enriched terms. If plot_hmap = TRUE, a heatmap of this matrix is also displayed.

Arguments

enrichment_table

a data frame that must contain the columns below (Term_Description or ID, depending on use_description, plus Up_regulated and Down_regulated):

Term_Description

Description of the enriched term (necessary if use_description = TRUE)

ID

ID of the enriched term (necessary if use_description = FALSE)

Up_regulated

the up-regulated genes in the input involved in the given term's gene set, comma-separated

Down_regulated

the down-regulated genes in the input involved in the given term's gene set, comma-separated

exp_mat

the experiment (e.g., gene expression/methylation) matrix. Columns are samples and rows are genes. Column names must contain sample names and row names must contain the gene symbols.

cases

(Optional) A vector of sample names that are cases in the case/control experiment. (default = NULL)

use_description

Boolean argument to indicate whether term descriptions (in the 'Term_Description' column) should be used. (default = FALSE)

plot_hmap

Boolean value to indicate whether or not to draw the heatmap plot of the scores. (default = TRUE)

...

Additional arguments for plot_scores for aesthetics of the heatmap plot
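
As a rough illustration of the input shapes described above, the sketch below builds a toy enrichment_table and exp_mat by hand. All term IDs, gene symbols and values are made up, and parse_genes is a hypothetical helper: it assumes the gene lists are split on commas with surrounding whitespace trimmed. The sketch does not call score_terms itself.

```r
# Toy enrichment table (all IDs, descriptions and gene symbols are hypothetical)
toy_enrichment <- data.frame(
  ID = c("term_1", "term_2"),
  Term_Description = c("Toy term one", "Toy term two"),
  Up_regulated = c("GENE_A, GENE_B", "GENE_C"),
  Down_regulated = c("GENE_C", ""),
  stringsAsFactors = FALSE
)

# Toy experiment matrix: genes in rows, samples in columns,
# with gene symbols as row names and sample names as column names
toy_exp_mat <- matrix(
  c(1, 2, 3, 4,
    2, 4, 6, 8,
    5, 5, 6, 8),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("GENE_A", "GENE_B", "GENE_C"),
    c("case_1", "case_2", "ctrl_1", "ctrl_2")
  )
)

# Hypothetical helper (an assumption, not part of the package):
# split comma-separated gene strings and drop empty entries
parse_genes <- function(x) {
  genes <- trimws(unlist(strsplit(x, ",")))
  genes[genes != ""]
}

# Collect the up- and down-regulated genes of each term
term_genes <- lapply(seq_len(nrow(toy_enrichment)), function(i) {
  parse_genes(c(toy_enrichment$Up_regulated[i],
                toy_enrichment$Down_regulated[i]))
})
names(term_genes) <- toy_enrichment$ID

# A vector like this could be passed as the cases argument
toy_cases <- c("case_1", "case_2")
```

Every gene named in the toy table also appears among the row names of the toy matrix, which is the correspondence the argument descriptions imply.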

Conceptual Background

For an experiment matrix (containing expression, methylation, etc. values), the rows of which are genes and the columns of which are samples, we denote:

E as a matrix of size m x n
G as the set of all genes in the experiment G = E_i., i ∈ [1, m]
S as the set of all samples in the experiment S = E_.j, j ∈ [1, n]

We next define the gene score matrix GS (the standardized experiment matrix, also of size m x n) as:

GS_gs = (E_gs - ē_g) / s_g

where g ∈ G, s ∈ S, ē_g is the mean of all values for gene g and s_g is the standard deviation of all values for gene g.

We next denote T to be a set of terms (where each t ∈ T is a set of term-related genes, i.e., t = {g_x, ..., g_y} ⊂ G) and finally define the agglomerated term scores matrix TS (where rows correspond to terms and columns correspond to samples, s.t. the matrix has size |T| x n) as:

TS_ts = 1/|t| ∑ _{g ∈ t} GS_gs, where t ∈ T and s ∈ S.
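
The definitions above can be traced on a small example. The following is a minimal base R sketch, not the package's implementation: the matrix values, gene names and term memberships are made up, and it assumes s_g is the sample standard deviation as returned by sd().

```r
# Toy experiment matrix E: m = 3 genes (rows) x n = 4 samples (columns)
E <- matrix(
  c(1, 2, 3, 4,
    2, 4, 6, 8,
    5, 5, 6, 8),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3"), c("s1", "s2", "s3", "s4"))
)

# Gene score matrix GS: standardize each gene across samples.
# The per-gene vectors have length m, so they recycle down each
# column and line up with the rows of E.
gene_means <- rowMeans(E)
gene_sds <- apply(E, 1, sd)  # assumption: sample standard deviation
GS <- (E - gene_means) / gene_sds

# Toy term set T (hypothetical term names and gene memberships)
terms <- list(t1 = c("g1", "g2"), t2 = c("g2", "g3"))

# Term score matrix TS: for each term, average GS over its genes per sample.
# sapply() returns samples x terms, so transpose to get |T| x n.
TS <- t(sapply(terms, function(genes) {
  colMeans(GS[genes, , drop = FALSE])
}))
```

Because g2 is an exact multiple of g1 in this toy matrix, both genes standardize to the same row of GS, so the t1 row of TS equals the g1 row of GS.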

Examples


score_matrix <- score_terms(
  example_pathfindR_output,
  example_experiment_matrix,
  plot_hmap = FALSE
)
