This function calculates various network concepts (topological properties, network indices) of a network constructed from expression data. See details for a detailed description.
networkConcepts(datExpr, power = 1, trait = NULL, networkType = "unsigned")
datExpr: a data frame containing the expression data, with rows corresponding to samples and columns to genes (nodes).
power: soft thresholding power.
trait: optional specification of a sample trait. A vector of length equal to the number of samples (rows) in datExpr.
networkType: network type. Recognized values are (unique abbreviations of) "unsigned", "signed", and "signed hybrid".
A list with the following components:
Summary: a data frame whose rows report network concepts that depend only on the adjacency matrix: Density (mean adjacency), Centralization, Heterogeneity (coefficient of variation of the connectivity), Mean ClusterCoef, and Mean Connectivity. The columns of the data frame correspond to the 4 types of network concepts described in the details: fundamental concepts, eigengene-based concepts, conformity-based concepts, and approximate conformity-based concepts.
Size: the network size, i.e. the number of nodes, which equals the number of columns of the input data frame datExpr.
Factorizability: a number between 0 and 1. The closer it is to 1, the better the off-diagonal elements of the conformity-based network A.CF approximate those of A (according to the Frobenius norm).
Eigengene: the first principal component of the standardized columns of datExpr. The number of components of this vector equals the number of rows of datExpr.
VarExplained: the proportion of variance explained by the first principal component (the Eigengene). It is numerically different from the eigengene-based factorizability: while VarExplained is based on the squares of the singular values of datExpr, the eigengene-based factorizability is based on the fourth powers of the singular values.
Conformity: a numerical vector giving the conformity. The number of components of the conformity vector equals the number of columns in datExpr. The conformity is often highly correlated with the vector of node connectivities. It is computed using an iterative algorithm for maximizing the factorizability measure. The algorithm and related network concepts are described in Dong and Horvath 2007.
ClusterCoef: a numerical vector that reports the cluster coefficient for each node. This fundamental network concept measures the cliquishness of each node.
Connectivity: a numerical vector that reports the connectivity (also known as degree) of each node. This fundamental network concept is also known as whole network connectivity. One can also define the scaled connectivity K=Connectivity/max(Connectivity), which is used for computing the hub gene significance.
MAR: a numerical vector that reports the maximum adjacency ratio for each node. MAR[i] equals 1 if all non-zero adjacencies between node i and the remaining network nodes equal 1. This fundamental network concept is always 1 for nodes of an unweighted network. It is a useful measure for weighted networks since it allows one to determine whether a node has high connectivity because of many weak connections (small MAR) or because of few but strong connections (high MAR); see Horvath and Dong 2008.
ConformityE: a numerical vector that reports the eigengene-based (also known as eigennode-based) conformity for the correlation network. The number of components equals the number of columns of datExpr.
GS: a numerical vector that encodes the node (gene) significance. If a sample trait was supplied to the function (input trait), the i-th component equals the node significance of the i-th column of datExpr: GS[i]=abs(cor(datExpr[,i], trait, use="p"))^power.
GSE: a numerical vector that reports the eigengene-based gene significance measure. Its i-th component is given by GSE[i]=ConformityE[i]*EigengeneSignificance, where the eigengene significance abs(cor(Eigengene,trait))^power is defined as a power of the absolute value of the correlation between eigengene and trait.
Significance: a data frame whose rows report network concepts that also depend on the trait-based node significance measure. The rows correspond to network concepts and the columns correspond to the type of network concept (fundamental versus eigengene-based). The first row of the data frame reports the network significance. The fundamental version of this network concept is the average gene significance, mean(GS). The eigengene-based analog of this concept is defined as mean(GSE). The second row reports the hub gene significance, which is defined as the slope of the regression model without an intercept term (regression through the origin) that regresses the gene significance on the scaled network connectivity K. The third row reports the eigengene significance abs(cor(Eigengene,trait))^power. More details can be found in Horvath and Dong (2008).
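The trait-based concepts above can be sketched in base R as follows. This is an illustration on simulated data, not the function's own code: the toy data and variable names are made up, and the hub gene significance is computed as the slope of a regression through the origin, which is an assumption based on the definition in Horvath and Dong (2008).

```r
# Toy data (illustrative only): 20 samples in rows, 6 genes in columns.
set.seed(1)
n_samples <- 20
trait <- rnorm(n_samples)
datExpr <- as.data.frame(sapply(1:6, function(j) 0.5 * trait + rnorm(n_samples)))
power <- 2

# Gene significance: power of the absolute correlation between each gene and the trait.
GS <- as.numeric(abs(cor(datExpr, trait, use = "p"))^power)

# Unsigned adjacency; the diagonal is zeroed so that only off-diagonal elements count.
A <- abs(cor(datExpr, use = "p"))^power
diag(A) <- 0
Connectivity <- rowSums(A)
K <- Connectivity / max(Connectivity)   # scaled connectivity

# Network significance: average gene significance.
NetworkSignificance <- mean(GS)

# Hub gene significance: slope of the regression of GS on K without intercept
# (assumption: regression through the origin).
HubGeneSignificance <- sum(GS * K) / sum(K^2)
```

The closed form sum(GS*K)/sum(K^2) is the least squares slope of a no-intercept fit, so it should agree with coef(lm(GS ~ 0 + K)).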
This function computes various network concepts (also known as network statistics, topological properties, or network indices) for a weighted correlation network. The weighted correlation network is constructed between the columns (interpreted as nodes) of the input datExpr.
If networkType="unsigned", the adjacency between nodes i and j is defined as A[i,j]=abs(cor(datExpr[,i],datExpr[,j]))^power.
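A minimal sketch of the adjacency construction follows. The helper adjacency_sketch is hypothetical (it is not part of any package). The "unsigned" branch follows the definition above; the "signed" and "signed hybrid" branches are not defined in this text and are filled in from the conventions commonly used for weighted correlation networks, so treat them as assumptions.

```r
# Hypothetical helper, for illustration only.
adjacency_sketch <- function(datExpr, power = 1,
                             networkType = c("unsigned", "signed", "signed hybrid")) {
  networkType <- match.arg(networkType)   # accepts unique abbreviations
  r <- cor(datExpr, use = "p")            # pairwise correlations between columns
  switch(networkType,
    "unsigned" = abs(r)^power,
    # Assumption: negative correlations map to low adjacency.
    "signed" = ((1 + r) / 2)^power,
    # Assumption: negative correlations map to zero adjacency.
    "signed hybrid" = ifelse(r > 0, r, 0)^power
  )
}

# Toy data (illustrative only): 10 samples, 6 genes.
set.seed(2)
datExpr <- as.data.frame(matrix(rnorm(60), nrow = 10))
A <- adjacency_sketch(datExpr, power = 2, networkType = "unsigned")
```

Raising the power pushes weak correlations toward zero while keeping strong ones, which is what makes the thresholding "soft".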
In the following, we use the terms gene and node interchangeably, since these methods were originally developed for gene networks. The function computes the following 4 types of network concepts (introduced in Horvath and Dong 2008):
Type I: fundamental network concepts are defined as a function of the off-diagonal elements of an adjacency matrix A and/or a node significance measure GS. These network concepts can be defined for any network (not just correlation networks). The adjacency matrix of an unsigned weighted correlation network is given by A=abs(cor(datExpr,use="p"))^power and the trait-based gene significance measure is given by GS=abs(cor(datExpr,trait,use="p"))^power, where datExpr, trait, and power are input parameters.
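The fundamental concepts can be illustrated in base R. The formulas below are the usual definitions for weighted networks (Zhang and Horvath 2005; Dong and Horvath 2007) as I understand them; they are a sketch on simulated data and are not taken from the function's source code.

```r
# Toy data (illustrative only): 30 samples, 8 genes.
set.seed(3)
datExpr <- as.data.frame(matrix(rnorm(30 * 8), nrow = 30))
power <- 2

A <- abs(cor(datExpr, use = "p"))^power
diag(A) <- 0                      # only off-diagonal elements enter the concepts
Size <- ncol(A)

# Connectivity (degree): sum of adjacencies to all other nodes.
Connectivity <- rowSums(A)

# Density: mean off-diagonal adjacency.
Density <- sum(Connectivity) / (Size * (Size - 1))

# Centralization: how far the most connected node stands out from the average.
Centralization <- Size * (max(Connectivity) - mean(Connectivity)) /
  ((Size - 1) * (Size - 2))

# Heterogeneity: coefficient of variation of the connectivity.
Heterogeneity <- sqrt(Size * sum(Connectivity^2) / sum(Connectivity)^2 - 1)

# Weighted cluster coefficient: weighted triangles through node i divided by
# the maximum possible given the node's adjacencies.
triangles <- diag(A %*% A %*% A)
max_triangles <- rowSums(A)^2 - rowSums(A^2)
ClusterCoef <- ifelse(max_triangles == 0, 0, triangles / max_triangles)

# Maximum adjacency ratio: equals 1 when all non-zero adjacencies equal 1.
MAR <- rowSums(A^2) / rowSums(A)
```

Because the diagonal is set to zero first, every quantity depends only on the off-diagonal elements of A, as required for Type I concepts.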
Type II: conformity-based network concepts are functions of the off-diagonal elements of the conformity-based adjacency matrix A.CF=CF*t(CF) and/or the node significance measure. These network concepts are defined for any network for which a conformity vector can be defined. Details: For any adjacency matrix A, the conformity vector CF is calculated by requiring that A[i,j] is approximately equal to CF[i]*CF[j]. Using the conformity one can define the matrix A.CF=CF*t(CF), which is the outer product of the conformity vector with itself. In general, A.CF is not an adjacency matrix since its diagonal elements are different from 1. If the off-diagonal elements of A.CF are similar to those of A according to the Frobenius matrix norm, then A is approximately factorizable. To measure the factorizability of a network, one can calculate the Factorizability, which is a number between 0 and 1 (Dong and Horvath 2007). The conformity is defined using a monotonic, iterative algorithm that maximizes the factorizability measure.
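One way to realize such an iterative fit is sketched below. This is not the package's implementation: the helper conformity_sketch and its arguments are hypothetical, and the scheme used here (impute the diagonal with the current CF^2, then take the leading eigenvector) is a standard rank-one fitting technique that I am assuming is close in spirit to the algorithm of Dong and Horvath (2007).

```r
# Hypothetical helper, for illustration only.
conformity_sketch <- function(A, max_iter = 1000, tol = 1e-10) {
  A <- as.matrix(A)
  diag(A) <- 0                        # the diagonal carries no information
  K <- rowSums(A)
  CF <- K / sqrt(sum(K))              # starting value derived from connectivity
  for (iter in seq_len(max_iter)) {
    B <- A
    diag(B) <- CF^2                   # impute the diagonal with the current fit
    e <- eigen(B, symmetric = TRUE)
    # Best rank-one approximation of B is lambda1 * v1 v1^T.
    CF_new <- sqrt(e$values[1]) * abs(e$vectors[, 1])
    converged <- max(abs(CF_new - CF)) < tol
    CF <- CF_new
    if (converged) break
  }
  A_CF <- outer(CF, CF)               # conformity-based adjacency
  diag(A_CF) <- 0                     # compare off-diagonal elements only
  Factorizability <- 1 - sum((A - A_CF)^2) / sum(A^2)
  list(Conformity = CF, Factorizability = Factorizability)
}

# An exactly factorizable toy network: the fit should recover cf_true.
cf_true <- c(0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2)
A <- outer(cf_true, cf_true)
diag(A) <- 1
fit <- conformity_sketch(A)
```

Each step can only reduce the off-diagonal squared error, since the imputed diagonal contributes no error before the update and a non-negative error after it. That is the sense in which the sketch is monotonic.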
Type III: approximate conformity-based network concepts are functions of all elements of the conformity-based adjacency matrix A.CF (including the diagonal) and/or the node significance measure GS. These network concepts are very useful for deriving relationships between network concepts in networks that are approximately factorizable.
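Keeping the diagonal of A.CF is what makes simple closed forms possible. The sketch below works these out for a made-up conformity vector; the variable names are illustrative and the formulas are derived directly from A.CF being an outer product, not copied from the function.

```r
# Illustrative conformity vector (made up).
CF <- c(0.9, 0.8, 0.7, 0.6, 0.5)
A_CF <- outer(CF, CF)          # diagonal retained on purpose

# Row sums of an outer product factor into CF[i] * sum(CF).
Connectivity_CF_App <- CF * sum(CF)

# The mean over all elements is the squared mean conformity.
Density_CF_App <- mean(CF)^2

# Connectivity is proportional to CF, so both share a coefficient of variation.
Heterogeneity_CF_App <- sqrt(mean(CF^2) / mean(CF)^2 - 1)

# Maximum adjacency ratio: sum of squared adjacencies over sum of adjacencies.
MAR_CF_App <- CF * sum(CF^2) / sum(CF)
```

These identities show, for example, that in an approximately factorizable network the connectivity of a node is roughly proportional to its conformity.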
Type IV: eigengene-based (also known as eigennode-based) network concepts are functions of the eigengene-based adjacency matrix A.E=ConformityE*t(ConformityE) (diagonal included) and/or the corresponding eigengene-based gene significance measure GSE. These network concepts can only be defined for correlation networks. Details: The columns (nodes) of datExpr can be summarized with the first principal component, which is referred to as the Eigengene in coexpression network analysis. In general correlation networks, it is called the eigennode. The eigengene-based conformity ConformityE[i] is defined as abs(cor(datExpr[,i], Eigengene))^power, where the power corresponds to the power used for defining the weighted adjacency matrix A. The eigengene-based conformity can also be used to define an eigengene-based adjacency matrix A.E=ConformityE*t(ConformityE).
The eigengene-based factorizability EF(datExpr) is a number between 0 and 1 that measures how well A.E approximates A when the power parameter equals 1. EF(datExpr) is defined with respect to the singular values of datExpr. For a trait-based node significance measure GS=abs(cor(datExpr,trait))^power, one can also define an eigengene-based node significance measure GSE[i]=ConformityE[i]*EigengeneSignificance, where the eigengene significance abs(cor(Eigengene,trait))^power is defined as a power of the absolute value of the correlation between eigengene and trait.
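The eigengene-based quantities can be sketched with a singular value decomposition in base R. The simulated data and variable names are illustrative. The ratio used for the eigengene-based factorizability (fourth power of the first singular value over the sum of fourth powers) is an assumption consistent with the description above, not a quotation of the function's code.

```r
# Toy data (illustrative only): 25 samples, 8 genes driven by a common signal.
set.seed(4)
n_samples <- 25
common <- rnorm(n_samples)
trait <- common + rnorm(n_samples)
datExpr <- as.data.frame(sapply(1:8, function(j) common + rnorm(n_samples, sd = 0.7)))
power <- 2

# Eigengene: first left singular vector of the standardized columns.
X <- scale(as.matrix(datExpr))
sv <- svd(X)
Eigengene <- sv$u[, 1]

# Proportion of variance explained: squares of the singular values.
VarExplained <- sv$d[1]^2 / sum(sv$d^2)

# Eigengene-based factorizability: fourth powers (assumed form of the ratio).
Factorizability_E <- sv$d[1]^4 / sum(sv$d^4)

# Eigengene-based conformity and adjacency (diagonal included).
ConformityE <- as.numeric(abs(cor(datExpr, Eigengene, use = "p"))^power)
A_E <- outer(ConformityE, ConformityE)

# Eigengene significance and eigengene-based gene significance.
EigengeneSignificance <- abs(cor(Eigengene, trait))^power
GSE <- ConformityE * EigengeneSignificance
```

Since the first singular value is the largest, the fourth-power ratio is never smaller than the squared ratio, which illustrates why the two measures are numerically different.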
Eigengene-based network concepts are very useful for providing a geometric interpretation of network
concepts and for deriving relationships between network concepts. For example, the hub gene significance
measure and its eigengene-based analog have been used to characterize networks where highly connected hub
genes are important with regard to a trait based gene significance measure (Horvath and Dong 2008).
Zhang B, Horvath S (2005) A General Framework for Weighted Gene Co-Expression Network Analysis. Statistical Applications in Genetics and Molecular Biology 4(1): Article 17
Dong J, Horvath S (2007) Understanding Network Concepts in Modules. BMC Systems Biology 1:24
Horvath S, Dong J (2008) Geometric Interpretation of Gene Coexpression Network Analysis. PLoS Comput Biol 4(8): e1000117
conformityBasedNetworkConcepts for approximate conformity-based network concepts.
fundamentalNetworkConcepts for calculation of fundamental network concepts only.
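A minimal usage sketch on simulated data follows. It assumes that networkConcepts is provided by the WGCNA package, which this text does not state explicitly, so the call is guarded and only runs if that package is installed. The data, gene names, and the object name nc are made up for illustration.

```r
# Toy data (illustrative only): 30 samples in rows, 10 genes in columns.
set.seed(5)
n_samples <- 30
trait <- rnorm(n_samples)
datExpr <- as.data.frame(sapply(1:10, function(j) 0.6 * trait + rnorm(n_samples)))
names(datExpr) <- paste0("gene", 1:10)

# Assumption: the function lives in the WGCNA package.
if (requireNamespace("WGCNA", quietly = TRUE)) {
  nc <- WGCNA::networkConcepts(datExpr, power = 2, trait = trait,
                               networkType = "unsigned")
  # Inspect the top-level components of the returned list.
  str(nc, max.level = 1)
}
```

Supplying trait is optional; the trait-based components described above are only meaningful when it is given.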