reachability: Find the Reachability Matrix of a Graph

Description

reachability takes one or more (possibly directed) graphs as input, producing the associated reachability matrices.

Usage

reachability(dat, geodist.precomp=NULL, return.as.edgelist=FALSE, na.omit=TRUE)

Value

A reachability matrix, or the equivalent edgelist representation

Arguments

dat: one or more graphs (directed or otherwise).
geodist.precomp: optionally, a precomputed geodist object.
return.as.edgelist: logical; return the result as an sna edgelist?
na.omit: logical; omit missing edges when computing reach?

Author

Carter T. Butts buttsc@uci.edu

Details

For a digraph $G=(V,E)$ with vertices $i$ and $j$, let $P_{ij}$ represent a directed $ij$ path. Then the (di)graph

$$ R = \left(V\left(G\right),\left\{\left(i,j\right):i,j \in V\left(G\right), P_{ij} \in G\right\}\right)$$

is said to be the reachability graph of $G$, and the adjacency matrix of $R$ is said to be $G$'s reachability matrix. (Note that when $G$ is undirected, we simply take each undirected edge to be bidirectional.) Vertices which are adjacent in the reachability graph are connected by one or more directed paths in the original graph; thus, structural equivalence classes in the reachability graph are synonymous with strongly connected components in the original structure.

Bear in mind that -- as with all matters involving connectedness -- reachability is strongly related to size and density. Since, for any given density, almost all structures of sufficiently large size are connected, reachability graphs associated with large structures will generally be complete. Measures based on the reachability graph, then, will tend to become degenerate in the large $|V(G)|$ limit (assuming constant positive density).

By default, reachability will try to build the reachability graph using an internal sparse graph approximation; this is no help on fully connected graphs (but not a lot worse than using an adjacency matrix), but will result in considerable savings for large graphs that are heavily fragmented. (The intended design tradeoff is thus that one pays a small cost on the usually cheap cases, in exchange for much greater efficiency on the cases that would otherwise be prohibitively expensive.) If geodist.precomp is given, however, the $O(N^2)$ cost of an adjacency matrix representation has already been paid, and we simply employ what we are given -- so, if you want to force the internal use of adjacency matrices, just pass a geodist object. Because the internal representation used is otherwise list based, using return.as.edgelist=TRUE will save resources; if you are using reachability as part of a more complex series of calls, it is thus recommended that you both pass and return sna edgelists unless you have a good reason not to do so.

When set, na.omit results in missing edges (i.e., edges with NA values) being removed prior to computation. Since paths are not recomputed when geodist.precomp is passed, this option is only active when geodist.precomp==NULL; if this behavior is desired and precomputed distances are being used, such edges should be removed prior to the geodist call.

References

Wasserman, S., and Faust, K. (1994). Social Network Analysis: Methods and Applications. Cambridge: Cambridge University Press.

Examples

Run this code

#Find the reachability matrix for a sparse random graph
g<-rgraph(10,tprob=0.15)
rg<-reachability(g)
g  #Compare the two structures
rg

#Compare to the output of geodist
all(rg==(geodist(g)$counts>0))

Run the code above in your browser using DataLab