The function calculates different types of weighted adjacency matrices based on the mutual information between vectors (corresponding to the columns of the input data frame datE). The mutual information between pairs of vectors is divided by an upper bound so that the resulting normalized measure lies between 0 and 1.
mutualInfoAdjacency(
datE,
discretizeColumns = TRUE,
entropyEstimationMethod = "MM",
numberBins = NULL)
The function outputs a list with the following components:
Entropy is a vector whose components report entropy estimates of each column of datE. The natural logarithm (base e) is used in the definition. Using the notation from the Wikipedia entry (http://en.wikipedia.org/wiki/Mutual_information), this vector contains the values H(X), where X corresponds to a column of datE.
MutualInformation is a symmetric matrix whose entries contain the pairwise mutual information measures between the columns of datE. The diagonal of the matrix MutualInformation equals Entropy. In general, the entries of this matrix can be larger than 1, i.e. this is not an adjacency matrix. Using the notation from the Wikipedia entry, this matrix contains the mutual information estimates I(X;Y).
AdjacencySymmetricUncertainty is a weighted adjacency matrix whose entries are based on the mutual information. Using the notation from the Wikipedia entry, its entries are defined as AdjacencySymmetricUncertainty = 2*I(X;Y)/(H(X)+H(Y)). Since I(X;X)=H(X), the diagonal elements of AdjacencySymmetricUncertainty equal 1. In general, the entries of this symmetric matrix lie between 0 and 1.
AdjacencyUniversalVersion1 is a weighted adjacency matrix that is a simple function of AdjacencySymmetricUncertainty. Specifically, AdjacencyUniversalVersion1 = AdjacencySymmetricUncertainty/(2 - AdjacencySymmetricUncertainty). Note that f(x) = x/(2-x) is a monotonically increasing function on the unit interval [0,1] whose values lie between 0 and 1. We call it the universal adjacency because dissUA = 1 - AdjacencyUniversalVersion1 turns out to be a universal distance function, i.e. it satisfies the properties of a distance (including the triangle inequality) and it takes on a small value if any other distance measure takes on a small value (Kraskov et al 2003).
AdjacencyUniversalVersion2 is a weighted adjacency matrix for which dissUAversion2 = 1 - AdjacencyUniversalVersion2 is also a universal distance measure. Using the notation from Wikipedia, the entries of the symmetric matrix AdjacencyUniversalVersion2 are defined as AdjacencyUniversalVersion2 = I(X;Y)/max(H(X),H(Y)).
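The relationships between the list components can be checked by hand. The following base R sketch applies the formulas above to a small made-up example; the entropy vector H and the matrix MI are hypothetical values chosen for illustration, not output of mutualInfoAdjacency.

```r
# Hypothetical entropies (in nats) and mutual information for three variables.
# The diagonal of MI equals H because I(X;X) = H(X).
H <- c(x1 = 1.0, x2 = 1.5, x3 = 2.0)
MI <- matrix(c(1.0, 0.4, 0.1,
               0.4, 1.5, 0.3,
               0.1, 0.3, 2.0),
             nrow = 3, byrow = TRUE,
             dimnames = list(names(H), names(H)))

# Symmetric uncertainty: 2*I(X;Y) / (H(X) + H(Y))
adjSU <- 2 * MI / outer(H, H, "+")

# Universal version 1: f(x) = x / (2 - x) applied to the symmetric uncertainty
adjUV1 <- adjSU / (2 - adjSU)

# Universal version 2: I(X;Y) / max(H(X), H(Y))
adjUV2 <- MI / outer(H, H, pmax)

# Corresponding dissimilarities
dissUA <- 1 - adjUV1
dissUAversion2 <- 1 - adjUV2
```

Substituting the symmetric uncertainty into f(x) shows that version 1 equals I(X;Y)/(H(X)+H(Y)-I(X;Y)), i.e. the mutual information divided by the joint entropy.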
datE is a data frame or matrix whose columns correspond to variables and whose rows correspond to measurements. For example, the columns may correspond to genes while the rows correspond to microarrays. The number of nodes in the mutual information network equals the number of columns of datE.
discretizeColumns is a logical variable. If it is set to TRUE, the columns of datE will be discretized into a user-defined number of bins (see numberBins).
entropyEstimationMethod takes a text string specifying the entropy and mutual information estimation method. If entropyEstimationMethod="MM", the Miller-Madow asymptotic bias corrected empirical estimator is used. If entropyEstimationMethod="ML", the maximum likelihood estimator (also known as plug-in or empirical estimator) is used. If entropyEstimationMethod="shrink", the shrinkage estimator of a Dirichlet probability distribution is used. If entropyEstimationMethod="SG", the Schurmann-Grassberger estimator of the entropy of a Dirichlet probability distribution is used.
numberBins is an integer larger than 0 that specifies how many bins are used for the discretization step. This argument is only relevant if discretizeColumns has been set to TRUE. By default, numberBins is set to sqrt(m), where m is the number of samples, i.e. the number of rows of datE. Thus the default is numberBins = sqrt(nrow(datE)).
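To give a feel for what the discretization step does, here is a minimal base R sketch. It rests on two assumptions that the text above does not state: the default bin count is rounded to a whole number, and the bins have equal width. The binning actually performed inside mutualInfoAdjacency may differ.

```r
set.seed(1)                      # reproducible toy data
m <- 50                          # number of samples (rows of datE)
x <- rnorm(m)                    # one numeric column

# Default described above is sqrt(m); rounding is an assumption of this sketch
numberBins <- max(1L, as.integer(round(sqrt(m))))

# Equal-width binning with base R (illustrative only)
xBinned <- cut(x, breaks = numberBins, labels = FALSE)

table(xBinned)                   # number of observations per bin
```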
Steve Horvath, Lin Song, Peter Langfelder
The function inputs a data frame datE and outputs a list whose components correspond to different weighted network adjacency measures defined between the columns of datE. Make sure to install the R packages entropy, minet, and infotheo, since the function mutualInfoAdjacency makes use of the entropy function from the R package entropy (Hausser and Strimmer 2008) and of functions from the minet and infotheo packages (Meyer et al 2008).
A weighted network adjacency matrix is a symmetric matrix whose entries take on values between 0 and 1. Each weighted adjacency matrix contains scaled versions of the mutual information between the columns of the input data frame datE.
We assume that datE contains numeric values, which will be discretized unless the user chooses the option discretizeColumns=FALSE.
The raw (unscaled) mutual information and entropy measures have units "nat", i.e. natural logarithms are used in their definition (base e=2.71..).
Several mutual information estimation methods have been proposed in the literature (reviewed in Hausser and Strimmer 2008, Meyer et al 2008).
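As a rough illustration of how two of these estimators differ, the sketch below implements the maximum likelihood (plug-in) entropy estimate and the Miller-Madow correction in base R, together with a plug-in mutual information estimate built from them. The function names entropyML, entropyMM and mutualInfoML and the bin counts are hypothetical and written for this example; they are not the routines that mutualInfoAdjacency calls.

```r
# Maximum likelihood ("plug-in") entropy in nats, from a vector of bin counts
entropyML <- function(counts) {
  p <- counts[counts > 0] / sum(counts)   # empirical frequencies of non-empty bins
  -sum(p * log(p))
}

# Miller-Madow: plug-in estimate plus the bias correction (K - 1) / (2 * n),
# where K is the number of non-empty bins and n the number of observations
entropyMM <- function(counts) {
  K <- sum(counts > 0)
  n <- sum(counts)
  entropyML(counts) + (K - 1) / (2 * n)
}

# Plug-in mutual information of two discrete vectors:
# I(X;Y) = H(X) + H(Y) - H(X,Y)
mutualInfoML <- function(x, y) {
  entropyML(table(x)) + entropyML(table(y)) - entropyML(table(x, y))
}

counts <- c(4, 2, 3, 0, 2, 4, 0, 0, 2, 1)   # hypothetical bin counts
entropyML(counts)
entropyMM(counts)                           # larger than the plug-in estimate
```

The correction term shrinks as the number of observations grows, so the two estimates agree for large samples and differ most when many bins are filled from few observations.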
While mutual information networks allow one to detect non-linear relationships between the columns of datE, they may overfit the data if relatively few observations are available. Thus, if the number of rows of datE is smaller than, say, 200, it may be better to fit a correlation network using the function adjacency.
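For comparison, a correlation-based weighted adjacency can be sketched in base R as the absolute correlation raised to a power. This mimics an unsigned correlation network and does not call the adjacency function itself; the simulated columns and the choice power = 6 are assumptions made for illustration.

```r
set.seed(1)                      # reproducible toy data
m <- 50                          # fewer than 200 observations
x1 <- rnorm(m)
x2 <- 0.5 * x1 + sqrt(1 - 0.5^2) * rnorm(m)   # linearly related to x1
x3 <- rnorm(m)                                # unrelated to x1 and x2
datE <- data.frame(x1, x2, x3)

# Unsigned correlation adjacency: |cor|^power (power is an assumed value)
power <- 6
adjCor <- abs(cor(datE))^power
```

Like the mutual information adjacencies, adjCor is symmetric with entries between 0 and 1 and a diagonal of 1, but it only captures linear (Pearson) association.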
Hausser J, Strimmer K (2008) Entropy inference and the James-Stein estimator, with application to nonlinear gene association networks. See http://arxiv.org/abs/0811.3579
Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor Package for Inferring Large Transcriptional Networks Using Mutual Information. BMC Bioinformatics, Vol 9
Kraskov A, Stoegbauer H, Andrzejak RG, Grassberger P (2003) Hierarchical Clustering Based on Mutual Information. ArXiv q-bio/0311039
adjacency
# Load requisite packages. These packages are considered "optional",
# so WGCNA does not load them automatically.
if (require(infotheo, quietly = TRUE) &&
    require(minet, quietly = TRUE) &&
    require(entropy, quietly = TRUE))
{
  # Example can be executed.
  # Simulate a data frame datE which contains 5 columns and 50 observations
  m = 50
  x1 = rnorm(m)
  r = .5; x2 = r*x1 + sqrt(1-r^2)*rnorm(m)
  r = .3; x3 = r*(x1-.5)^2 + sqrt(1-r^2)*rnorm(m)
  x4 = rnorm(m)
  r = .3; x5 = r*x4 + sqrt(1-r^2)*rnorm(m)
  datE = data.frame(x1, x2, x3, x4, x5)
  # Calculate entropy, mutual information matrix and weighted adjacency
  # matrices based on mutual information.
  MIadj = mutualInfoAdjacency(datE = datE)
} else
  printFlush(paste("Please install packages infotheo, minet and entropy",
                   "before running this example."))