nacf
Computes the sample network covariance/correlation function for a specified variable on a given input network. Moran's \(I\) and Geary's \(C\) statistics at multiple orders may also be computed.
nacf(net, y, lag.max = NULL, type = c("correlation", "covariance",
"moran", "geary"), neighborhood.type = c("in", "out", "total"),
partial.neighborhood = TRUE, mode = "digraph", diag = FALSE,
thresh = 0, demean = TRUE)
A vector containing the dependence statistics (ascending from order 0).
net: one or more graphs.
y: a numerical vector, of length equal to the order of net.
lag.max: optionally, the maximum geodesic lag at which to compute dependence (defaults to the order of net minus 1).
type: the type of dependence statistic to be computed.
neighborhood.type: the type of neighborhood to be employed when assessing dependence (as per neighborhood).
partial.neighborhood: logical; should partial (rather than cumulative) neighborhoods be employed at higher orders?
mode: "digraph" for directed graphs, or "graph" if net is undirected.
diag: logical; does the diagonal of net contain valid data?
thresh: threshold at which to dichotomize net.
demean: logical; demean y prior to analysis?
Carter T. Butts buttsc@uci.edu
nacf computes dependence statistics for the vector y on network net, for neighborhoods of various orders. Specifically, let \(\mathbf{A}_i\) be the \(i\)th order adjacency matrix of net. The sample network autocovariance of \(\mathbf{y}\) on \(\mathbf{A}_i\) is then given by
$$
\sigma_i = \frac{\mathbf{y}^T \mathbf{A}_i \mathbf{y}}{E},
$$
where \(E=\sum_{(j,k)}A_{ijk}\) and \(A_{ijk}\) denotes the \((j,k)\) cell of \(\mathbf{A}_i\). Similarly, the sample network autocorrelation in the above case is \(\rho_i=\sigma_i/\sigma_0\), where \(\sigma_0\) is the variance of \(\mathbf{y}\). Moran's \(I\) and Geary's \(C\) statistics are defined in the usual fashion as
$$
I_i = \frac{N \sum_{j=1}^N \sum_{k=1}^N (y_j-\bar{y}) (y_k-\bar{y}) A_{ijk}}{E \sum_{j=1}^N (y_j-\bar{y})^2},
$$
and
$$
C_i = \frac{(N-1) \sum_{j=1}^N \sum_{k=1}^N (y_j-y_k)^2 A_{ijk}}{2 E \sum_{j=1}^N (y_j-\bar{y})^2}
$$
respectively, where \(N\) is the order of \(\mathbf{A}_i\) and \(\bar{y}\) is the mean of \(\mathbf{y}\).
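The formulas above can be sketched in base R for a single neighborhood matrix. This is only an illustration of the definitions, not the package's internal code; the helper names (net_autocov, moran_I, geary_C) and the toy 4-cycle are assumptions introduced here.

```r
# Illustrative helpers (hypothetical names, not part of sna).
# A is an N x N 0/1 neighborhood matrix, y a numeric vector of length N.

net_autocov <- function(y, A, demean = TRUE) {
  if (demean) y <- y - mean(y)
  E <- sum(A)                       # number of (j,k) pairs in the neighborhood
  as.numeric(t(y) %*% A %*% y) / E  # y' A y / E
}

moran_I <- function(y, A) {
  d <- y - mean(y)                  # deviations from the mean
  length(y) * sum(outer(d, d) * A) / (sum(A) * sum(d^2))
}

geary_C <- function(y, A) {
  d <- y - mean(y)
  (length(y) - 1) * sum(outer(y, y, "-")^2 * A) / (2 * sum(A) * sum(d^2))
}

# Toy example: an undirected 4-cycle whose values alternate in sign
A <- matrix(0, 4, 4)
A[cbind(c(1, 2, 3, 4), c(2, 3, 4, 1))] <- 1
A <- A + t(A)
y <- c(1, -1, 1, -1)

net_autocov(y, A)  # -1: every neighbor pair has opposite values
moran_I(y, A)      # -1
geary_C(y, A)      # 1.5
```

In this toy case every adjacent pair differs, so Moran's \(I\) is negative and Geary's \(C\) exceeds 1, both indicating negative dependence at order 1.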
The adjacency matrix associated with the \(i\)th order neighborhood is defined as the identity matrix for order 0, and otherwise depends on the type of neighborhood involved. For input graph \(G=(V,E)\), let the base relation, \(R\), be given by the underlying graph of \(G\) (i.e., \(G \cup G^T\)) if total neighborhoods are sought, the transpose of \(G\) if incoming neighborhoods are sought, or \(G\) otherwise. The partial neighborhood structure of order \(i>0\) on \(R\) is then defined to be the digraph on \(V\) whose edge set consists of the ordered pairs \((j,k)\) having geodesic distance \(i\) in \(R\). The corresponding cumulative neighborhood is formed by the ordered pairs having geodesic distance less than or equal to \(i\) in \(R\). For purposes of nacf, these neighborhoods are calculated using neighborhood, with the specified parameters (including dichotomization at thresh).
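The neighborhood construction described above can be sketched in base R as follows. This is a rough illustration for small graphs, not the algorithm used by neighborhood; the helper names (base_relation, geodesic_dist, nbhd) are hypothetical, and the sketch assumes that self-pairs are excluded from cumulative neighborhoods.

```r
# Illustrative helpers (hypothetical names, not part of sna).

# Base relation R: G itself, its transpose, or the underlying graph.
base_relation <- function(G, type = c("in", "out", "total")) {
  type <- match.arg(type)
  G <- (G > 0) * 1
  switch(type,
         total = ((G + t(G)) > 0) * 1,
         "in"  = t(G),
         out   = G)
}

# Geodesic distances via repeated boolean matrix products.
geodesic_dist <- function(R) {
  n <- nrow(R)
  D <- matrix(Inf, n, n)
  diag(D) <- 0
  reach <- diag(n)                  # pairs joined by a walk of length exactly k
  for (k in seq_len(n - 1)) {
    reach <- ((reach %*% R) > 0) * 1
    newly <- reach == 1 & is.infinite(D)
    D[newly] <- k                   # first walk length reaching a pair = distance
  }
  D
}

# Order-i neighborhood matrix: partial (distance == i) or cumulative (<= i).
nbhd <- function(G, i, type = "in", partial = TRUE) {
  if (i == 0) return(diag(nrow(G)))
  D <- geodesic_dist(base_relation(G, type))
  if (partial) (D == i) * 1 else (D >= 1 & D <= i) * 1  # assumption: no self-pairs
}

# Toy example: the directed path 1 -> 2 -> 3 -> 4
G <- matrix(0, 4, 4)
G[cbind(1:3, 2:4)] <- 1

nbhd(G, 2, "out")                   # pairs (1,3) and (2,4)
nbhd(G, 2, "out", partial = FALSE)  # adds the three order-1 pairs
```

On the directed path, the order-2 outgoing partial neighborhood contains only the pairs two steps apart, while the cumulative version also keeps the original edges.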
The return value for nacf is the selected dependence statistic, calculated for each neighborhood structure from order 0 (the identity) through order lag.max (or \(N-1\), if lag.max==NULL). This vector can be used much like the conventional autocorrelation function, to identify dependencies at various lags. This may, in turn, suggest a starting point for modeling via routines such as lnam.
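One way to inspect the returned vector, in the spirit of a conventional acf plot, is a spike plot of the statistic against lag. The sketch below assumes the sna package is installed; the variable names (r, lags), the seed, and the choice of lag.max = 5 are arbitrary illustrations.

```r
library(sna)

set.seed(1)                                   # arbitrary seed, for repeatability
g <- rgraph(50, tprob = 4/49)                 # sparse random digraph
y <- qr.solve(diag(50) - 0.8 * g, rnorm(50, 0, 0.05))

r <- nacf(g, y, lag.max = 5)                  # orders 0 through 5
lags <- seq_along(r) - 1                      # lag index, starting at 0

# Spike plot of the dependence statistic by neighborhood order
plot(lags, as.numeric(r), type = "h",
     xlab = "Neighborhood order (lag)", ylab = "Network autocorrelation")
abline(h = 0, lty = 2)
```

Lags at which the spikes stand well away from zero are candidates for the dependence structure in a subsequent model.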
Geary, R.C. (1954). “The Contiguity Ratio and Statistical Mapping.” The Incorporated Statistician, 5: 115-145.
Moran, P.A.P. (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37: 17-23.
geodist, gapply, neighborhood, lnam, acf
library(sna)
#Create a random graph, and an autocorrelated variable
g<-rgraph(50,tprob=4/49)
y<-qr.solve(diag(50)-0.8*g,rnorm(50,0,0.05))
#Examine the network autocorrelation function
nacf(g,y) #Partial neighborhoods
nacf(g,y,partial.neighborhood=FALSE) #Cumulative neighborhoods
#Repeat, using Moran's I on the underlying graph (total neighborhoods)
nacf(g,y,type="moran",neighborhood.type="total")
nacf(g,y,partial.neighborhood=FALSE,type="moran",neighborhood.type="total")