netgsa

Network-based Gene Set Analysis

This package carries out Network-based Gene Set Analysis by incorporating external information about interactions among genes, as well as novel interactions learned from data.

Package Installation

You can install it directly from GitHub through devtools:

library(devtools)
devtools::install_github("mikehellstern/netgsa", build_vignettes = TRUE)

Updates

The most recent implementation has optimized the NetGSA computation in the following aspects:

  • Variance component estimation: Residuals are needed to estimate the variance components. This is done directly without evaluating the fixed effect coefficients.
  • Contrast vector/matrix: This is done more efficiently by leveraging the lapply function; products of the contrast vectors are first computed and reused to calculate the degrees of freedom and test statistics.
  • In the main function NetGSA, the default input A is a list of adjacency matrices, one per tested group. Each group's adjacency matrix is in turn coded as a list of smaller matrices (or, in the extreme case, one matrix of size p). The adjacency matrices are not assumed to share the same block diagonal structure across groups. For this structure, I removed the check on variable compatibility between the adjacency matrices and the input data, but this should be added later.
  • Note that the use of adj2inf should not change when the adjacency matrix is block diagonal, because the list of eigenvalues remains the same.
  • The fixed effect coefficients beta are currently an output of the main function NetGSA, but do we need them? Is there a better way of estimating beta given the block diagonal structure of D?
  • Although it was mentioned in the notes that we may not need to assemble the entire adjacency matrix to get the test statistic for a given pathway, in practice we may be working with more pathways. Would it be better if we assemble the entire adjacency matrix anyway to avoid repeatedly subsetting variables? In either case, before running NetGSA, we should make sure to filter variables in the input data matrix to keep only those that belong to at least one tested pathway.
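The block structure and the filtering step described above can be illustrated with a small base R sketch. This is a toy example under stated assumptions, not the package's implementation: the helpers make_block and assemble_blocks, the gene names, and the objects x_toy and pathways_toy are all hypothetical, and the pathway indicator matrix is assumed to have one row per pathway and one column per gene. It shows a per-group list of blocks, checks that the assembled block diagonal matrix has the same eigenvalues as its blocks pooled together, and filters the data matrix to genes in at least one tested pathway.

```r
# Toy illustration only; none of these helpers are part of netgsa.
set.seed(1)

# Hypothetical helper: random symmetric block with no self-loops.
make_block <- function(genes) {
  k <- length(genes)
  m <- matrix(runif(k * k), k, k)
  m <- (m + t(m)) / 2
  diag(m) <- 0
  dimnames(m) <- list(genes, genes)
  m
}

# Hypothetical helper: assemble named blocks into one block diagonal matrix.
assemble_blocks <- function(blocks) {
  genes <- unlist(lapply(blocks, rownames))
  full <- matrix(0, length(genes), length(genes),
                 dimnames = list(genes, genes))
  for (b in blocks) full[rownames(b), colnames(b)] <- b
  full
}

# One entry per group; the block structure may differ between groups.
A <- list(
  group1 = list(make_block(c("g1", "g2", "g3")), make_block(c("g4", "g5"))),
  group2 = list(make_block(c("g1", "g2")), make_block(c("g3", "g4", "g5")))
)

# Eigenvalues computed block by block versus on the assembled matrix.
eig_blocks <- sort(unlist(lapply(
  A$group1, function(b) eigen(b, symmetric = TRUE)$values
)))
eig_full <- sort(eigen(assemble_blocks(A$group1), symmetric = TRUE)$values)
same_spectrum <- isTRUE(all.equal(eig_blocks, eig_full))

# Data matrix, p genes by n samples; g6 and g7 belong to no tested pathway.
x_toy <- matrix(rnorm(7 * 4), nrow = 7,
                dimnames = list(paste0("g", 1:7), paste0("s", 1:4)))

# Assumed layout: one row per pathway, one column per gene.
pathways_toy <- rbind(
  pathwayA = c(1, 1, 1, 0, 0, 0, 0),
  pathwayB = c(0, 0, 1, 1, 1, 0, 0)
)
colnames(pathways_toy) <- rownames(x_toy)

# Keep only genes that belong to at least one tested pathway.
keep <- colSums(pathways_toy) > 0
x_filtered <- x_toy[keep, , drop = FALSE]
pathways_filtered <- pathways_toy[, keep, drop = FALSE]
```

Working block by block avoids an eigendecomposition of the full p by p matrix, while filtering up front keeps the data matrix and the pathway indicators aligned on the same set of genes.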

References

Ma, Jing, Shojaie, Ali and Michailidis, George. (2016) Network-based pathway enrichment analysis with incomplete network information. Bioinformatics https://doi.org/10.1093/bioinformatics/btw410

Install

install.packages('netgsa')

Monthly Downloads: 299
Version: 4.0.5
License: GPL (>= 3)
Maintainer: Michael Hellstern
Last Published: November 14th, 2023

Functions in netgsa (4.0.5)

  • pathways_mat: Matrix with pathway indicators
  • stackDatabases: Combine edges from databases into a data.table
  • obtainClusters: Estimate optimal gene clustering structure
  • bic.netEst.undir: Bayesian information criterion to select the tuning parameters for netEst.undir
  • group: The vector of class indicators
  • formatPathways: Format Cytoscape nested networks
  • netEst.dir: Constrained estimation of directed networks
  • netEst.undir: Constrained estimation of undirected networks
  • NetGSAq: "Quick" Network-based Gene Set Analysis
  • NetGSA: Network-based Gene Set Analysis
  • prepareAdjMat: Construct adjacency matrices from graphite databases and/or user-provided network information
  • obtainEdgeList: Obtain edgelist from graphite databases. To be used within prepareAdjMat
  • pathways: A list of KEGG pathways
  • netgsa-package: Network-Based Gene Set Analysis
  • breastcancer2012_subset: Breast cancer data from TCGA (2012). This is a 750 gene subset
  • edgelist: A data frame of edges, each row corresponding to one edge
  • nonedgelist: A data frame of nonedges, each row corresponding to one negative edge
  • x: Data matrix p by n
  • plot.NetGSA: Generates NetGSA plots
  • zoomPathway: Zoom in on pathway in igraph