
MOFA (version 1.3.1)

runEnrichmentAnalysis: Feature Set Enrichment Analysis

Description

Method to perform feature set enrichment analysis on the feature loadings. The input is a data structure containing the feature set membership, usually relating biological pathways to genes. The output is a list of results, including matrices of dimensions (number_gene_sets, number_factors) with p-values and other statistics.

Usage

runEnrichmentAnalysis(object, view, feature.sets, factors = "all",
  local.statistic = c("loading", "cor", "z"),
  global.statistic = c("mean.diff", "rank.sum"),
  statistical.test = c("parametric", "cor.adj.parametric",
  "permutation"), transformation = c("abs.value", "none"),
  min.size = 10, nperm = 1000, cores = 1, p.adj.method = "BH",
  alpha = 0.1)

Arguments

object

a MOFAmodel object.

view

name of the view to perform enrichment on. Make sure that the feature names in feature.sets match the feature names in the MOFA model.

feature.sets

data structure that holds feature set membership information. Must be either a binary membership matrix (rows are feature sets and columns are features) or a list of feature set indexes (see vignette for details).

factors

character vector with the factor names to perform enrichment on. Alternatively, a numeric vector with the index of the factors. Default is all factors.

local.statistic

the feature statistic used to quantify the association between each feature and each factor. Must be one of the following: "loading" (the output from MOFA, default), "cor" (the correlation coefficient between the factor and each feature) or "z" (a z-score derived from the correlation coefficient).

global.statistic

the feature set statistic computed from the feature statistics. Must be one of the following: "mean.diff" (difference in means between the foreground set and the background set, default) or "rank.sum" (difference in rank sums between the foreground set and the background set).

statistical.test

the statistical test used to compute the significance of the feature set statistics under a competitive null hypothesis. Must be one of the following: "parametric" (very liberal, default), "cor.adj.parametric" (very conservative; adjusts for the inter-gene correlation) or "permutation" (non-parametric; recommended if you can run a sufficient number of permutations).

transformation

optional transformation to apply to the feature-level statistics. Must be one of the following: "none" or "abs.value" (default).

min.size

Minimum size of a feature set (default is 10).

nperm

number of permutations. Only relevant if statistical.test is set to "permutation". Default is 1000.

cores

number of cores to run the permutation analysis in parallel. Only relevant if statistical.test is set to "permutation". Default is 1.

p.adj.method

Method to adjust p-values factor-wise for multiple testing. Can be any method in p.adjust.methods. Default is "BH" (Benjamini-Hochberg procedure).

alpha

FDR threshold used to generate the lists of significant pathways. Default is 0.1.
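As a sketch of the binary membership format accepted by feature.sets, the base R code below builds such a matrix (rows are feature sets, columns are features) from a named list of gene-name vectors, then drops small sets in the spirit of the min.size argument. The gene names, set names and the input list are hypothetical toy data. The list-of-indexes format is described in the vignette and is not reproduced here, and counting set size as the number of member features is an assumption about how min.size is applied.

```r
# Hypothetical toy gene sets, given as character vectors of feature names
gene_sets <- list(
  setX = c("geneA", "geneB", "geneE"),
  setY = c("geneC", "geneD"),
  setZ = c("geneA")
)

# Hypothetical feature names of the view (must match those in the MOFA model)
features <- c("geneA", "geneB", "geneC", "geneD", "geneE")

# Binary membership matrix: rows = feature sets, columns = features (1 = member)
membership <- t(sapply(gene_sets, function(s) as.numeric(features %in% s)))
colnames(membership) <- features

# Keep only sets with at least `min_size` members (assumed analogue of min.size)
min_size <- 2
membership_filtered <- membership[rowSums(membership) >= min_size, , drop = FALSE]
print(membership_filtered)
```

The pre-built gene set matrices in MOFAdata, such as reactomeGS, already come in this binary matrix form, so a conversion like this is only needed for custom gene sets.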

Value

a list with the following elements:

feature.statistics

feature statistics

set.statistics

feature-set statistics

pval

raw p-values

pval.adj

adjusted p-values

sigPathways

a list with enriched pathways
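The factor-wise correction described under p.adj.method and alpha can be illustrated with base R's p.adjust on a toy matrix of raw p-values (feature sets in rows, factors in columns). The p-values and set names are made up. Organising sigPathways as one character vector per factor, and using a "<= alpha" cut-off, are assumptions about the layout rather than a description of MOFA's internals.

```r
# Hypothetical raw p-values: 4 feature sets (rows) x 2 factors (columns)
pval <- matrix(
  c(0.001, 0.020, 0.300, 0.040,
    0.500, 0.010, 0.800, 0.002),
  nrow = 4,
  dimnames = list(paste0("set", 1:4), c("Factor1", "Factor2"))
)

# Adjust factor-wise, i.e. separately within each column (cf. p.adj.method = "BH")
pval.adj <- apply(pval, 2, p.adjust, method = "BH")

# Significant feature sets per factor at the chosen FDR threshold (cf. alpha)
alpha <- 0.1
sigPathways <- lapply(colnames(pval.adj), function(f) {
  rownames(pval.adj)[pval.adj[, f] <= alpha]
})
names(sigPathways) <- colnames(pval.adj)
print(sigPathways)
```

Adjusting within each column means that the number of feature sets, not the number of factor-set pairs, determines the multiple-testing burden for every factor.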

Details

This function relates the factors to pre-defined biological pathways by performing a gene set enrichment analysis on the loadings. The general idea is to compute an activity score for every pathway in each factor based on the loadings of its genes. This function is particularly useful when a factor is difficult to characterise based only on the genes with the highest loadings. We provide several pre-built gene set matrices in the MOFAdata package; see https://github.com/bioFAM/MOFAdata for details. Our implementation is based on the pcgse function, with some modifications. For details on the underlying statistics, see https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4543476.
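To make the steps in the Details concrete (local statistic, transformation, global statistic, competitive test), the sketch below scores one toy feature set on simulated loadings using the "abs.value" transformation and the "mean.diff" statistic. It then assesses the set with a feature-label permutation test and, for comparison, a one-sided Welch t-test. This is a simplified illustration under stated assumptions, not MOFA's or PCGSE's implementation: the simulated loadings and the set are hypothetical, the permutation scheme may differ from the one MOFA uses, and the correlation adjustment of "cor.adj.parametric" is not shown.

```r
set.seed(1)

# Hypothetical loadings of 200 features on a single factor
n_features <- 200
loadings <- rnorm(n_features, sd = 0.5)
names(loadings) <- paste0("gene", seq_len(n_features))

# Hypothetical feature set: the first 15 features, with shifted loadings
in_set <- seq_len(n_features) <= 15
loadings[in_set] <- loadings[in_set] + 1.5

# Local statistic = loading, with the "abs.value" transformation
stat <- abs(loadings)

# Global statistic "mean.diff": mean in the set minus mean in the background
mean_diff <- function(s, members) mean(s[members]) - mean(s[!members])
observed <- mean_diff(stat, in_set)

# Competitive null by shuffling set membership across features (simplified)
nperm <- 1000
null_stats <- replicate(nperm, mean_diff(stat, sample(in_set)))
p_perm <- (1 + sum(null_stats >= observed)) / (1 + nperm)

# Parametric analogue for comparison: one-sided Welch t-test, set vs background
p_param <- t.test(stat[in_set], stat[!in_set], alternative = "greater")$p.value

c(observed = observed, p_perm = p_perm, p_param = p_param)
```

With only nperm permutations the smallest attainable permutation p-value is 1 / (nperm + 1), which is one reason the permutation test needs a sufficient number of permutations to detect strong enrichment after multiple-testing correction.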

Examples

# NOT RUN {
# Example on the CLL data
filepath <- system.file("extdata", "CLL_model.hdf5", package = "MOFAdata")
MOFAobject <- loadModel(filepath)

# Perform enrichment analysis on the mRNA view using the pre-built Reactome gene sets
data("reactomeGS", package = "MOFAdata")
fsea.results <- runEnrichmentAnalysis(MOFAobject, view="mRNA", feature.sets=reactomeGS)

# heatmap of enriched pathways per factor at 1% FDR
plotEnrichmentHeatmap(fsea.results, alpha=0.01)

# plot number of enriched pathways per factor at 1% FDR
plotEnrichmentBars(fsea.results, alpha=0.01)

# plot top 10 enriched pathways on factor 5:
plotEnrichment(MOFAobject, fsea.results, factor=5, max.pathways=10)
# }
