reachability
takes one or more (possibly directed) graphs as input, producing the associated reachability matrices.
reachability(dat, geodist.precomp=NULL, return.as.edgelist=FALSE, na.omit=TRUE)
A reachability matrix, or the equivalent edgelist representation
one or more graphs (directed or otherwise).
optionally, a precomputed geodist
object.
logical; return the result as an sna edgelist?
logical; omit missing edges when computing reach?
Carter T. Butts buttsc@uci.edu
For a digraph \(G=(V,E)\) with vertices \(i\) and \(j\), let \(P_{ij}\) represent a directed \(ij\) path. Then the (di)graph
$$ R = \left(V\left(G\right),\left\{\left(i,j\right):i,j \in V\left(G\right), P_{ij} \in G\right\}\right)$$
is said to be the reachability graph of \(G\), and the adjacency matrix of \(R\) is said to be \(G\)'s reachability matrix. (Note that when \(G\) is undirected, we simply take each undirected edge to be bidirectional.) Vertices which are adjacent in the reachability graph are connected by one or more directed paths in the original graph; thus, structural equivalence classes in the reachability graph are synonymous with strongly connected components in the original structure.
Bear in mind that -- as with all matters involving connectedness -- reachability is strongly related to size and density. Since, for any given density, almost all structures of sufficiently large size are connected, reachability graphs associated with large structures will generally be complete. Measures based on the reachability graph, then, will tend to become degenerate in the large \(|V(G)|\) limit (assuming constant positive density).
By default, reachability
will try to build the reachability graph using an internal sparse graph approximation; this is no help on fully connected graphs (but not a lot worse than using an adjacency matrix), but will result in considerable savings for large graphs that are heavily fragmented. (The intended design tradeoff is thus that one pays a small cost on the usually cheap cases, in exchange for much greater efficiency on the cases that would otherwise be prohibitively expensive.) If geodist.precomp
is given, however, the \(O(N^2)\) cost of an adjacency matrix representation has already been paid, and we simply employ what we are given -- so, if you want to force the internal use of adjacency matrices, just pass a geodist
object. Because the internal representation used is otherwise list based, using return.as.edgelist=TRUE
will save resources; if you are using reachability
as part of a more complex series of calls, it is thus recommended that you both pass and return sna edgelists unless you have a good reason not to do so.
When set, na.omit
results in missing edges (i.e., edges with NA
values) being removed prior to computation. Since paths are not recomputed when geodist.precomp
is passed, this option is only active when geodist.precomp==NULL
; if this behavior is desired and precomputed distances are being used, such edges should be removed prior to the geodist
call.
Wasserman, S., and Faust, K. (1994). Social Network Analysis: Methods and Applications. Cambridge: Cambridge University Press.
geodist
#Find the reachability matrix for a sparse random graph
g<-rgraph(10,tprob=0.15)
rg<-reachability(g)
g #Compare the two structures
rg
#Compare to the output of geodist
all(rg==(geodist(g)$counts>0))
Run the code above in your browser using DataLab