plotModule: Plot the topology of a network module

Description

Plot the correlation structure, network edges, scaled weighted degree, node contribution, module data, and module summary vectors of one or more network modules.

Individual components of the module plot can be plotted using plotCorrelation, plotNetwork, plotDegree, plotContribution, plotData, and plotSummary.

Usage

plotModule(
  network,
  data,
  correlation,
  moduleAssignments = NULL,
  modules = NULL,
  backgroundLabel = "0",
  discovery = NULL,
  test = NULL,
  verbose = TRUE,
  orderSamplesBy = NULL,
  orderNodesBy = NULL,
  orderModules = TRUE,
  plotNodeNames = TRUE,
  plotSampleNames = TRUE,
  plotModuleNames = NULL,
  main = "Module Topology",
  main.line = 1,
  drawBorders = FALSE,
  lwd = 1,
  naxt.line = -0.5,
  saxt.line = -0.5,
  maxt.line = NULL,
  xaxt.line = -0.5,
  xaxt.tck = -0.025,
  xlab.line = 2.5,
  yaxt.line = 0,
  yaxt.tck = -0.15,
  ylab.line = 2.5,
  laxt.line = 2.5,
  laxt.tck = 0.04,
  cex.axis = 0.8,
  legend.main.line = 1.5,
  cex.lab = 1.2,
  cex.main = 2,
  dataCols = NULL,
  dataRange = NULL,
  corCols = correlation.palette(),
  corRange = c(-1, 1),
  netCols = network.palette(),
  netRange = c(0, 1),
  degreeCol = "#feb24c",
  contribCols = c("#A50026", "#313695"),
  summaryCols = c("#1B7837", "#762A83"),
  naCol = "#bdbdbd",
  dryRun = FALSE
)

Arguments

network: a list of interaction networks, one for each dataset. Each entry of the list should be an \(n \times n\) matrix where each element contains the edge weight between nodes \(i\) and \(j\) in the inferred network for that dataset.
data: a list of matrices, one for each dataset. Each entry of the list should be the data used to infer the interaction network for that dataset. The columns should correspond to variables in the data (nodes in the network) and rows to samples in that dataset.
correlation: a list of matrices, one for each dataset. Each entry of the list should be an \(n \times n\) matrix where each element contains the correlation coefficient between nodes \(i\) and \(j\) in the data used to infer the interaction network for that dataset.
moduleAssignments: a list of vectors, one for each discovery dataset, containing the module assignments for each node in that dataset.
modules: a list of vectors, one for each discovery dataset, of modules to perform the analysis on. If unspecified, all modules in each discovery dataset will be analysed, with the exception of those specified in the backgroundLabel argument.
backgroundLabel: a single label given to nodes that do not belong to any module in the moduleAssignments argument. Defaults to "0". Set to NULL if you do not want to skip the network background module.
discovery: a vector of names or indices denoting the discovery dataset(s) in the data, correlation, network, moduleAssignments, modules, and test lists.
test: a list of vectors, one for each discovery dataset, of names or indices denoting the test dataset(s) in the data, correlation, and network lists.
verbose: logical; should progress be reported? Default is TRUE.
orderSamplesBy: NULL (default), NA, or a vector containing a single dataset name or index. Controls how samples are ordered on the plot (see details).
orderNodesBy: NULL (default), NA, or a vector of dataset names or indices. Controls how nodes are ordered on the plot (see details).
orderModules: logical; if TRUE, modules are ordered by clustering their summary vectors. If FALSE, modules are drawn in the order provided.
plotNodeNames: logical; controls whether the node names are drawn on the bottom axis.
plotSampleNames: logical; controls whether the sample names are drawn on the left axis.
plotModuleNames: logical; controls whether module names are drawn. By default (NULL), module names are shown only when multiple modules are plotted.
main: title for the plot.
main.line: the number of lines into the top margin at which the plot title will be drawn.
drawBorders: logical; if TRUE, borders are drawn around the weighted degree, node contribution, and module summary bar plots.
lwd: line width for borders and axes.
naxt.line: the number of lines into the bottom margin at which the node names will be drawn.
saxt.line: the number of lines into the left margin at which the sample names will be drawn.
maxt.line: the number of lines into the bottom margin at which the module names will be drawn.
xaxt.line: the number of lines into the bottom margin at which the x-axis tick labels will be drawn on the module summary bar plot.
xaxt.tck: the size of the x-axis ticks for the module summary bar plot.
xlab.line: the number of lines into the bottom margin at which the x axis label on the module summary bar plot(s) will be drawn.
yaxt.line: the number of lines into the left margin at which the y-axis tick labels will be drawn on the weighted degree and node contribution bar plots.
yaxt.tck: the size of the y-axis ticks for the weighted degree and node contribution bar plots.
ylab.line: the number of lines into the left margin at which the y axis labels on the weighted degree and node contribution bar plots will be drawn.
laxt.line: the distance from the legend at which to draw the legend axis labels, as a multiple of laxt.tck.
laxt.tck: size of the ticks on each legend axis relative to the size of the correlation, edge weight, and data matrix heatmaps.
cex.axis: relative size of the node and sample names.
legend.main.line: the distance from the legend to draw the legend title.
cex.lab: relative size of the module names and legend titles.
cex.main: relative size of the plot titles.
dataCols: a character vector of colors to create a gradient from for the data heatmap (see details). Automatically determined if NA or NULL.
dataRange: the range of values to map to the dataCols gradient (see details). Automatically determined if NA or NULL.
corCols: a character vector of colors to create a gradient from for the correlation structure heatmap (see details).
corRange: the range of values to map to the corCols gradient (see details).
netCols: a character vector of colors to create a gradient from for the network edge weight heatmap (see details).
netRange: the range of values to map to the netCols gradient (see details). Automatically determined if NA or NULL.
degreeCol: color to use for the weighted degree bar plot.
contribCols: color(s) to use for the node contribution bar plot (see details).
summaryCols: color(s) to use for the module summary bar plot (see details).
naCol: color to use for missing nodes and samples on the data, correlation structure, and network edge weight heat maps.
dryRun: logical; if TRUE, only the axes, legends, labels, and title will be drawn.

Details

Input data structures:

The preservation of network modules in a second dataset is quantified by measuring the preservation of topological properties between the discovery and test datasets. These properties are calculated not only from the interaction networks inferred in each dataset, but also from the data used to infer those networks (e.g. gene expression data) as well as the correlation structure between variables/nodes. Thus, all functions in the NetRep package have the following arguments:

network: a list of interaction networks, one for each dataset.
data: a list of data matrices used to infer those networks, one for each dataset.
correlation: a list of matrices containing the pairwise correlation coefficients between variables/nodes in each dataset.
moduleAssignments: a list of vectors, one for each discovery dataset, containing the module assignments for each node in that dataset.
modules: a list of vectors, one for each discovery dataset, containing the names of the modules from that dataset to analyse.
discovery: a vector indicating the names or indices of the previous arguments' lists to use as the discovery dataset(s) for the analyses.
test: a list of vectors, one vector for each discovery dataset, containing the names or indices of the network, data, and correlation argument lists to use as the test dataset(s) for the analysis of each discovery dataset.

The formatting of these arguments is not strict: each function will attempt to make sense of the user input. For example, if there is only one discovery dataset, then input to the moduleAssignments and test arguments may be vectors, rather than lists. If the node and sample ordering is being calculated within the same dataset being visualised, then the discovery and test arguments do not need to be specified, and the input matrices for the network, data, and correlation arguments do not need to be wrapped in a list.
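For a single dataset, the inputs described above can be sketched in base R as follows. This is an illustration under stated assumptions, not part of NetRep: the expression matrix is random, and the network is built as a soft-thresholded adjacency (absolute correlation raised to the power 6), which is one common convention for co-expression networks. Any n x n edge-weight matrix with matching node names should serve as the network input.

```r
# Illustrative inputs for ONE dataset (random data; not the NetRep example data).
set.seed(1)
n_samples <- 30
n_nodes <- 6
expr <- matrix(rnorm(n_samples * n_nodes), nrow = n_samples,
               dimnames = list(paste0("S", seq_len(n_samples)),   # rows = samples
                               paste0("N", seq_len(n_nodes))))    # cols = nodes

# Correlation structure: n x n Pearson correlations between nodes (columns)
corr <- cor(expr)

# Network: assumption, a soft-thresholded adjacency |cor|^6 (edge weights in [0, 1])
net <- abs(corr)^6

# Module assignments: one label per node, named by node; "0" marks the background
labels <- setNames(c("1", "1", "1", "2", "2", "0"), colnames(expr))

# With a single dataset these need not be wrapped in lists
inputs <- list(network = net, data = expr, correlation = corr,
               moduleAssignments = labels)
str(inputs, max.level = 1)
```

Because node names are shared by the columns of expr, the dimnames of corr and net, and the names of labels, the matrices could then be passed directly, for example plotModule(network=net, data=expr, correlation=corr, moduleAssignments=labels).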

Analysing large datasets:

Matrices in the network, data, and correlation lists can be supplied as disk.matrix objects. This class allows matrix data to be kept on disk and loaded as required by NetRep. This dramatically decreases memory usage: the matrices for only one dataset will be kept in RAM at any point in time.

Node, sample, and module ordering:

By default, nodes are ordered in decreasing order of weighted degree in the discovery dataset (see nodeOrder). Missing nodes are colored in grey. This facilitates the visual comparison of modules across datasets, as the node ordering will be preserved.

Alternatively, a vector containing the names or indices of one or more datasets can be provided to the orderNodesBy argument.

If a single dataset is provided, then nodes will be ordered in decreasing order of weighted degree in that dataset. Only nodes that are present in this dataset will be drawn when ordering nodes by a dataset that is not the discovery dataset for the requested module(s).

If multiple datasets are provided then the weighted degree will be averaged across these datasets (see nodeOrder for more details). This is useful for obtaining a robust ordering of nodes by relative importance, assuming the modules displayed are preserved in those datasets.

Ordering of nodes by weighted degree can be suppressed by setting orderNodesBy to NA, in which case nodes will be ordered as in the matrices provided in the network, data, and correlation arguments.
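The ordering rule can be sketched in base R as below. The helper names are hypothetical, and two details are assumptions rather than documented NetRep behaviour: weighted degree is taken as the sum of a node's edge weights to the other nodes in its module, and when several datasets are given, each dataset's degrees are scaled to their maximum before averaging. See nodeOrder for NetRep's exact procedure.

```r
# Hypothetical helpers illustrating ordering by weighted degree (not NetRep code).
weighted_degree <- function(net, nodes) {
  sub <- net[nodes, nodes, drop = FALSE]
  diag(sub) <- 0                 # ignore self-edges
  rowSums(sub)                   # sum of edge weights within the module
}

order_nodes <- function(net_list, nodes) {
  # One column per dataset: degree scaled to that dataset's maximum,
  # NA for nodes absent from that dataset (assumption).
  scaled <- vapply(net_list, function(net) {
    present <- intersect(nodes, rownames(net))
    deg <- setNames(rep(NA_real_, length(nodes)), nodes)
    deg[present] <- weighted_degree(net, present)
    deg / max(deg, na.rm = TRUE)
  }, numeric(length(nodes)))
  # Average across datasets, then sort in decreasing order
  avg <- rowMeans(matrix(scaled, nrow = length(nodes)), na.rm = TRUE)
  nodes[order(avg, decreasing = TRUE)]
}

# Toy network for three nodes in one module
net_a <- matrix(c(0.0, 0.9, 0.1,
                  0.9, 0.0, 0.5,
                  0.1, 0.5, 0.0), nrow = 3,
                dimnames = list(c("A", "B", "C"), c("A", "B", "C")))
order_nodes(list(net_a), c("A", "B", "C"))   # degrees: A = 1.0, B = 1.4, C = 0.6
```

Setting orderNodesBy to NA corresponds to skipping the final order() step and keeping nodes in their input order.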

When multiple modules are drawn, modules are ordered by the similarity of their summary vectors in the dataset(s) specified in the orderNodesBy argument. If multiple datasets are provided to the orderNodesBy argument, then the module summary vectors are concatenated across datasets.
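Clustering modules by their summary vectors could look like the following base R sketch. The distance (one minus the correlation between summary vectors) and hclust's default complete linkage are assumptions for illustration; this page does not state NetRep's exact clustering settings. Each dataset contributes a samples-by-modules matrix of summary vectors, and stacking these by rows concatenates each module's summary vector across datasets.

```r
# Hypothetical sketch: order modules by clustering their summary vectors.
order_modules <- function(summary_list) {
  summaries <- do.call(rbind, summary_list)  # concatenate summaries across datasets
  d <- as.dist(1 - cor(summaries))           # assumption: correlation distance
  hc <- hclust(d)                            # assumption: default complete linkage
  colnames(summaries)[hc$order]              # module names in dendrogram order
}

# Random summary vectors for 4 modules in two datasets (different sample counts)
set.seed(2)
s_disc <- matrix(rnorm(40), ncol = 4, dimnames = list(NULL, c("1", "2", "3", "4")))
s_test <- matrix(rnorm(24), ncol = 4, dimnames = list(NULL, c("1", "2", "3", "4")))
order_modules(list(s_disc, s_test))
```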

By default, samples in the data heatmap and accompanying module summary bar plot are ordered in descending order of module summary in the drawn dataset (specified by the test argument). If multiple modules are drawn, samples are ordered as per the left-most module on the plot.

Alternatively, a vector containing the name or index of another dataset may be provided to the orderSamplesBy argument. In this case, samples will be ordered in descending order of module summary in the specified dataset. This is useful when comparing different measurements across samples, for example, gene expression data obtained from multiple tissue samples across the same individuals. If the dataset specified is the discovery dataset, then missing samples will be displayed as horizontal grey bars. If the dataset specified is one of the other datasets, samples present in both the specified dataset and the test dataset will be displayed first, in the order of the specified dataset; samples present only in the test dataset will then be displayed underneath a horizontal black line, ordered by their module summary vector in the test dataset.

Ordering of samples by module summary can be suppressed by setting orderSamplesBy to NA, in which case samples will be ordered as in the matrix provided to the data argument for the drawn dataset.
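The sketch below illustrates these sample-ordering rules in base R. It assumes the module summary is the first principal component of the module's scaled data, with its arbitrary sign oriented to correlate positively with the average of the module's nodes. Treat both choices as assumptions and the helper names as hypothetical. The second helper covers ordering by another, non-discovery dataset: shared samples first in that dataset's order, then test-only samples in their own descending order. The grey-bar handling for the discovery dataset is not shown.

```r
# Hypothetical sketch of module summaries and sample ordering (not NetRep code).
module_summary <- function(dat, nodes) {
  x <- scale(dat[, nodes, drop = FALSE])       # center and scale each node
  pc1 <- prcomp(x, center = FALSE)$x[, 1]      # assumption: summary = first PC
  # PC signs are arbitrary; orient to agree with the mean of the module's nodes
  if (cor(pc1, rowMeans(x)) < 0) pc1 <- -pc1
  pc1                                          # named by sample
}

order_samples <- function(summ_test, summ_ref = NULL) {
  by_test <- names(sort(summ_test, decreasing = TRUE))
  if (is.null(summ_ref)) return(by_test)       # default: descending in drawn dataset
  shared <- intersect(names(sort(summ_ref, decreasing = TRUE)), names(summ_test))
  c(shared, setdiff(by_test, shared))          # plot separates these with a black line
}

set.seed(3)
dat <- matrix(rnorm(50), nrow = 10,
              dimnames = list(paste0("S", 1:10), paste0("N", 1:5)))
summ <- module_summary(dat, c("N1", "N2", "N3"))
order_samples(summ)
```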

Weighted degree scaling:

When drawn on a plot, the weighted degree of each node is scaled to the maximum weighted degree within its module. The scaled weighted degree is a measure of the relative importance of each node to its module. This makes visualisation of multiple modules with different sizes and densities possible. However, the scaled weighted degree should only be interpreted for groups of nodes that have an apparent module structure.
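Scaling weighted degree within each module amounts to dividing each node's degree by the largest degree in its module. A minimal base R sketch, with a hypothetical helper name and made-up degree values and module labels:

```r
# Hypothetical sketch: scale weighted degree to the maximum within each module.
scale_degree_by_module <- function(degree, labels) {
  stopifnot(identical(names(degree), names(labels)))  # one label per node
  ave(degree, labels, FUN = function(d) d / max(d))   # per-module division
}

deg  <- c(a = 2, b = 4, c = 10, d = 5)
mods <- c(a = "1", b = "1", c = "2", d = "2")
scale_degree_by_module(deg, mods)
```

Module 2 has much larger raw degrees than module 1, yet after scaling both modules span the same 0 to 1 range, which is what allows them to share one bar plot axis.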

Plot layout and device size:

For optimal results, we recommend viewing single modules on a PNG device with a width of 1500 pixels, a height of 2700 pixels, and a nominal resolution of 300 ppi (png(filename, width=5*300, height=9*300, res=300)).

Warning: PDF and other vectorized devices should not be used when plotting more than a hundred nodes. Large files will be generated which may cause image editing programs such as Inkscape or Illustrator to crash when polishing figures for publication.

When dryRun is TRUE only the axes, legends, labels, and title will be drawn, allowing for quick iteration of customisable parameters to get the plot layout correct.

If axis labels or legends are drawn off screen then the margins of the plot should be adjusted prior to plotting using the par command to increase the margin size (see the "mar" option in the par help page).

The size of text labels can be modified by increasing or decreasing the cex.main, cex.lab, and cex.axis arguments:

cex.main: controls the size of the plot title (specified in the main argument).
cex.lab: controls the size of the axis labels on the weighted degree, node contribution, and module summary bar plots as well as the size of the module labels and the heatmap legend titles.
cex.axis: controls the size of the axis tick labels, including the node and sample labels.

The position of these labels can be changed through the following arguments:

xaxt.line: controls the distance from the plot the x-axis tick labels are drawn on the module summary bar plot.
xlab.line: controls the distance from the plot the x-axis label is drawn on the module summary bar plot.
yaxt.line: controls the distance from the plot the y-axis tick labels are drawn on the weighted degree and node contribution bar plots.
ylab.line: controls the distance from the plot the y-axis label is drawn on the weighted degree and node contribution bar plots.
main.line: controls the distance from the plot the title is drawn.
naxt.line: controls the distance from the plot the node labels are drawn.
saxt.line: controls the distance from the plot the sample labels are drawn.
maxt.line: controls the distance from the plot the module labels are drawn.
laxt.line: controls the distance from the heatmap legends that the gradient legend labels are drawn.
legend.main.line: controls the distance from the heatmap legends that the legend title is drawn.

The rendering of node, sample, and module names can be disabled by setting plotNodeNames, plotSampleNames, and plotModuleNames to FALSE.

The size of the axis ticks can be changed by increasing or decreasing the following arguments:

xaxt.tck: size of the x-axis ticks as a multiple of the height of the module summary bar plot.
yaxt.tck: size of the y-axis ticks as a multiple of the width of the weighted degree or node contribution bar plots.
laxt.tck: size of the heatmap legend axis ticks as a multiple of the width of the data, correlation structure, or network edge weight heatmaps.

The drawBorders argument controls whether borders are drawn around the weighted degree, node contribution, or module summary bar plots. The lwd argument controls the thickness of these borders, as well as the thickness of axes and axis ticks.

Modifying the color palettes:

The dataCols and dataRange arguments control the appearance of the data heatmap (see plotData). The gradient of colors used on the heatmap can be changed by specifying a vector of colors to interpolate between in dataCols and dataRange specifies the range of values that maps to this gradient. Values outside of the specified dataRange will be rendered with the colors used at either extreme of the gradient. The default gradient is determined based on the data shown on the plot. If all values in the data matrix are positive, then the gradient is interpolated between white and green, where white is used for the smallest value and green for the largest. If all values are negative, then the gradient is interpolated between purple and white, where purple is used for the smallest value and white for the value closest to zero. If the data contains both positive and negative values, then the gradient is interpolated between purple, white, and green, where white is used for values of zero. In this case the range shown is always centered at zero, with the values at either extreme determined by the value in the rendered data with the strongest magnitude (the maximum of the absolute value).
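The default gradient rules above can be sketched as a small base R function. The helper name is hypothetical, and the green and purple hex codes are borrowed from the summaryCols defaults as an assumption; NetRep's actual default data colors may differ. The function returns the colors to interpolate between and the range they map to.

```r
# Hypothetical sketch of the default dataCols / dataRange rules.
default_data_palette <- function(x) {
  rng <- range(x, na.rm = TRUE)
  green <- "#1B7837"; purple <- "#762A83"      # assumption: reused summaryCols colors
  if (rng[1] >= 0) {
    # all non-negative: white (smallest) to green (largest)
    list(cols = c("white", green), range = rng)
  } else if (rng[2] <= 0) {
    # all non-positive: purple (smallest) to white (closest to zero)
    list(cols = c(purple, "white"), range = rng)
  } else {
    # mixed signs: centered at zero, extent set by the strongest magnitude
    m <- max(abs(rng))
    list(cols = c(purple, "white", green), range = c(-m, m))
  }
}

default_data_palette(c(-2, 0.5, 1))   # range becomes c(-2, 2)
```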

The corCols and corRange arguments control the appearance of the correlation structure heatmap (see plotCorrelation). The gradient of colors used on the heatmap can be changed by specifying a vector of colors to interpolate between in corCols. By default, strong negative correlations are shown in blue, strong positive correlations in red, and weak correlations in white. corRange controls the range of values that this gradient maps to (by default, -1 to 1). Changing this may be useful for showing differences where the range of correlation coefficients is small.

The netCols and netRange arguments control the appearance of the network edge weight heatmap (see plotNetwork). The gradient of colors used on the heatmap can be changed by specifying a vector of colors to interpolate between in netCols. By default, weak or non-edges are shown in white, while strong edges are shown in red. The netRange controls the range of values this gradient maps to, by default, 0 to 1. If netRange is set to NA, then the gradient will be mapped to values between 0 and the maximum edge weight of the shown network.
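Mapping values onto a gradient, including the clamping of out-of-range values described for dataRange and the netRange = NA behaviour described above, can be sketched in base R as follows. The helper name, the number of interpolated colors (100), and rounding to the nearest color are assumptions; NetRep's own binning may differ. Missing values map to NA here, where the plots would instead use naCol.

```r
# Hypothetical sketch: map values to a color gradient over fixed limits.
map_to_gradient <- function(x, cols, lims, n = 100) {
  # lims = NA mimics netRange = NA: map from 0 to the maximum value shown
  if (length(lims) == 1 && is.na(lims)) lims <- c(0, max(x, na.rm = TRUE))
  ramp <- colorRampPalette(cols)(n)                # n interpolated colors
  clamped <- pmin(pmax(x, lims[1]), lims[2])       # out-of-range -> extreme colors
  idx <- 1 + round((clamped - lims[1]) / diff(lims) * (n - 1))
  ramp[idx]                                        # NA inputs give NA colors
}

# Edge weights on a white-to-red gradient over 0 to 1
map_to_gradient(c(-0.5, 0.25, 2), c("white", "red"), c(0, 1))
```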

The degreeCol argument controls the color of the weighted degree bar plot (see plotDegree).

The contribCols argument controls the color of the node contribution bar plot (see plotContribution). This can be specified as a single value to be used for all nodes, or as two colors: one to use for nodes with positive contributions and one to use for nodes with negative contributions.

The summaryCols argument controls the color of the module summary bar plot (see plotSummary). This can be specified as a single value to be used for all samples, or as two colors: one to use for samples with a positive module summary value and one for samples with a negative module summary value.
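Choosing bar colors by sign, as described for contribCols and summaryCols, can be sketched as below. The helper name is hypothetical, and two details are assumptions: the first color is used for positive values and the second for negative values, and a value of exactly zero is treated as positive.

```r
# Hypothetical sketch: one color for all bars, or one color per sign.
sign_colors <- function(values, cols) {
  if (length(cols) == 1) return(rep(cols, length(values)))  # single color for all
  ifelse(values >= 0, cols[1], cols[2])  # assumption: cols[1] positive, cols[2] negative
}

# Using the contribCols defaults
sign_colors(c(0.8, -0.3, 0), c("#A50026", "#313695"))
```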

The naCol argument controls the color of missing nodes and samples on the data, correlation structure, and network edge weight heatmaps.

Embedding in Rmarkdown documents:

The chunk option fig.keep="last" should be set to avoid an empty plot being embedded above the plot generated by plotModule. This empty plot is generated so that an error is thrown as early as possible if the margins are too small for the plot to be drawn. On other graphical devices, it is normally drawn over by the actual plot components.

Examples


# load in example data, correlation, and network matrices for a discovery 
# and test dataset:
data("NetRep")

# Set up input lists for each input matrix type across datasets. The list
# elements can have any names, so long as they are consistent between the
# inputs.
network_list <- list(discovery=discovery_network, test=test_network)
data_list <- list(discovery=discovery_data, test=test_data)
correlation_list <- list(discovery=discovery_correlation, test=test_correlation)
labels_list <- list(discovery=module_labels)

# Plot modules 1, 2, and 4 in the discovery dataset
plotModule(
  network=network_list, data=data_list, correlation=correlation_list, 
  moduleAssignments=labels_list, modules=c(1, 2, 4)
)

# Now plot them in the test dataset (module 2 does not replicate)
plotModule(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list, modules=c(1, 2, 4), discovery="discovery",
  test="test"
)

# Plot modules 1 and 4, which replicate, in the test dataset, ordering nodes
# by weighted degree averaged across the two datasets
plotModule(
  network=network_list, data=data_list, correlation=correlation_list, 
  moduleAssignments=labels_list, modules=c(1, 4), discovery="discovery",
  test="test", orderNodesBy=c("discovery", "test")
)
