Learn R Programming

netgsa (version 4.0.5)

netEst.undir: Constrained estimation of undirected networks

Description

Estimates a sparse inverse covariance matrix using a lasso (L1) penalty.

Usage

netEst.undir(x, zero = NULL, one = NULL, lambda, rho=NULL, 
             penalize_diag = TRUE, weight = NULL, 
             eta = 0, verbose = FALSE, eps = 1e-08)

Value

A list with components

Adj

List of weighted adjacency matrices (partial correlations) of dimension \(p \times p\), with diagonal entries set to 0. Each element in the list is the weighted adjacency matric corresponding to each value in lambda. Each element is a matrix that will be used in NetGSA.

invcov

List of estimated inverse covariance matrix of dimension \(p \times p\).

lambda

List of values of tuning parameters used.

Arguments

x

The \(p \times n\) data matrix with rows referring to genes and columns to samples.

zero

(Optional) indices of entries of the weighted adjacency matrix to be constrained to be zero. The input should be a matrix of \(p \times p\), with 1 at entries to be constrained to be zero and 0 elsewhere. The matrix must be symmetric.

one

(Optional) indices of entries of the weighted adjacency matrix to be kept regardless of the regularization parameter for lasso. The input is similar to that of zero and needs to be symmetric.

lambda

(Non-negative) numeric vector representing the regularization parameters for lasso. Can choose best based on BIC using bic.netEst.undir

rho

(Non-negative) numeric scalar or symmetric \(p \times p\) matrix representing the regularization parameter for estimating the weights in the inverse covariance matrix. This is the same as rho in the graphical lasso algorithm glassoFast.

penalize_diag

Logical. Whether or not to penalize diagonal entries when estimating weighted adjacency matrix. If TRUE a small penalty is used, otherwise no penalty is used.

weight

(Optional) whether to add penalty to known edges. If NULL (default), then the known edges are assumed to be true. If nonzero, then a penalty equal to lambda * weight is added to penalize the known edges to account for possible uncertainty. Only non-negative values are accepted for the weight parameter.

eta

(Non-negative) a small constant added to the diagonal of the empirical covariance matrix of X to ensure it is well conditioned. By default, eta is set to 0.

verbose

Whether to print out information as estimation proceeds. Default = FALSE.

eps

(Non-negative) numeric scalar indicating the tolerance level for differentiating zero and non-zero edges: entries with magnitude \(<\) eps will be set to 0.

Author

Jing Ma & Michael Hellstern

Details

The function netEst.undir performs constrained estimation of sparse inverse covariance (concentration) matrices using a lasso (L1) penalty, as described in Ma, Shojaie and Michailidis (2016). Two sets of constraints determine subsets of entries of the inverse covariance matrix that should be exactly zero (the option zero argument), or should take non-zero values (option one argument). The remaining entries will be estimated from data.

The arguments one and/or zero can come from external knowledge on the 0-1 structure of underlying concentration matrix, such as a list of edges and/or non-edges learned from available databases.

netEst.undir estimates both the support (0-1 structure) of the concentration matrix, or equivalently, the adjacency matrix of the corresponding Gaussian graphical model, for a given tuning parameter, lambda; and the concentration matrix with diagonal entries set to 0, or equivalently, the weighted adjacency matrix. The weighted adjacency matrix is estimated using maximum likelihood based on the estimated support. The parameter rho controls the amount of regularization used in the maximum likelihood step. A small rho is recommended, as a large value of rho may result in too much regularization in the maximum likelihood estimation, thus further penalizing the support of the weighted adjacency matrix. Note this function is suitable only for estimating the adjacency matrix of a undirected graph. The weight parameter allows one to specify whether to penalize the known edges. If known edges obtained from external information contain uncertainty such that some of them are spurious, then it is recommended to use a small positive weight parameter to select the most probable edges from the collection of known ones.

This function is closely related to NetGSA, which requires the weighted adjacency matrix as input. When the user does not have complete information on the weighted adjacency matrix, but has data (x, not necessarily the same as the x in NetGSA) and external information (one and/or zero) on the adjacency matrix, then netEst.undir can be used to estimate the remaining interactions in the adjacency matrix using the data. Further, when it is anticipated that the adjacency matrices under different conditions are different, and data from different conditions are available, the user needs to run netEst.undir separately to obtain estimates of the adjacency matrices under each condition.

The algorithm used in netEst.undir is based on glmnet and glasso. Please refer to glmnet and glasso for computational details.

References

Ma, J., Shojaie, A. & Michailidis, G. (2016) Network-based pathway enrichment analysis with incomplete network information. Bioinformatics 32(20):165--3174. tools:::Rd_expr_doi("10.1093/bioinformatics/btw410")

See Also

prepareAdjMat, bic.netEst.undir, glmnet

Examples

Run this code
library(glassoFast)
library(graphite)
library(igraph)

set.seed(1)

## load the data
data(breastcancer2012_subset)

## consider genes from the "Estrogen signaling pathway" and "Jak-STAT signaling pathway"
genenames <- unique(c(pathways[[25]], pathways[[52]]))
sx <- x[match(genenames, rownames(x)),]
if (sum(is.na(rownames(sx)))>0){
  sx <- sx[-which(is.na(rownames(sx))),]
}
p <- length(genenames)

## zero/one matrices should be based on known non-edges/known edges. Random used as an example
one <- matrix(sample(c(0,1), length(rownames(sx))**2, 
              replace = TRUE, prob = c(0.9, 0.1)), length(rownames(sx)), 
              dimnames = list(rownames(sx), rownames(sx)))

ncond <- length(unique(group))
Amat <- vector("list",ncond)
for (k in 1:ncond){
    data_c <- sx[,(group==k)]
    fitBIC <- bic.netEst.undir(data_c,one=one,
                               lambda=seq(1,10)*sqrt(log(p)/ncol(data_c)),eta=0.1)
    fit <- netEst.undir(data_c,one=one,
                        lambda=which.min(fitBIC$BIC)*sqrt(log(p)/ncol(data_c)),eta=0.1)
    Amat[[k]] <- fit$Adj
}

Run the code above in your browser using DataLab