cugtest: Perform Conditional Uniform Graph (CUG) Hypothesis Tests for Graph-Level Indices

Description

cugtest tests an arbitrary GLI (computed on dat by FUN) against a conditional uniform graph null hypothesis, via Monte Carlo simulation. Some variation in the nature of the conditioning is available; currently, conditioning only on size, conditioning jointly on size and estimated tie probability (via expected density), and conditioning jointly on size and (bootstrapped) edge value distributions are implemented. Note that a fair amount of flexibility is possible regarding CUG tests on functions of GLIs (Anderson et al., 1999). See below for more details.

Usage

cugtest(dat, FUN, reps=1000, gmode="digraph", cmode="density", 
    diag=FALSE, g1=1, g2=2, ...)

Value

An object of class cugtest, containing

testval: The observed GLI value.
dist: A vector containing the Monte Carlo draws.
pgreq: The proportion of draws which were greater than or equal to the observed GLI value.
pleeq: The proportion of draws which were less than or equal to the observed GLI value.

Arguments

dat: graph(s) to be analyzed.
FUN: function to compute GLIs, or functions thereof. FUN must accept dat and the specified g arguments, and should return a real number.
reps: integer indicating the number of draws to use for quantile estimation. Note that, as for all Monte Carlo procedures, convergence is slower for more extreme quantiles. By default, reps==1000.
gmode: string indicating the type of graph being evaluated. "digraph" indicates that edges should be interpreted as directed; "graph" indicates that edges are undirected. gmode is set to "digraph" by default.
cmode: string indicating the type of conditioning assumed by the null hypothesis. If cmode=="density", then the density of the graph in question is used to determine the tie probabilities of the Bernoulli graph draws (which are also conditioned on |V(G)|). If cmode=="ties", then draws are bootstrapped from the distribution of edge values within the data matrices. If cmode=="order", then draws are uniform over all graphs of the same order (size) as the graphs within the input stack. By default, cmode is set to "density".
diag: boolean indicating whether or not the diagonal should be treated as valid data. Set this true if and only if the data can contain loops. diag is FALSE by default.
g1: integer indicating the index of the first graph input to the GLI. By default, g1==1.
g2: integer indicating the index of the second graph input to the GLI. (FUN can ignore this, if one wishes to test the GLI value of a single graph, but it should recognize the argument.) By default, g2==2.
...: additional arguments to FUN.
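As a rough illustration of the three cmode options, the base-R sketch below produces one null draw for a single dichotomous adjacency matrix. It is not sna's code: the function name draw_null_graph and its directed and loops arguments (loops standing in for diag) are hypothetical, and the "density" branch assumes 0/1 data.

```r
# Hypothetical helper (not part of sna): one null draw for a single n x n
# dichotomous adjacency matrix g, mimicking the three cmode options.
draw_null_graph <- function(g, cmode = c("density", "order", "ties"),
                            directed = TRUE, loops = FALSE) {
  cmode <- match.arg(cmode)
  n <- nrow(g)
  # Cells that count as data: the whole matrix for digraphs, the upper
  # triangle for undirected graphs; the diagonal only if loops are allowed.
  cells <- if (directed) matrix(TRUE, n, n) else upper.tri(g, diag = TRUE)
  if (!loops) cells[cbind(seq_len(n), seq_len(n))] <- FALSE
  k <- sum(cells)
  obs <- g[cells]
  vals <- switch(cmode,
    order   = rbinom(k, 1, 0.5),                     # uniform over graphs of order n
    density = rbinom(k, 1, mean(obs)),               # Bernoulli, p = observed density
    ties    = obs[sample.int(k, k, replace = TRUE)]  # bootstrap observed edge values
  )
  out <- matrix(0, n, n)
  out[cells] <- vals
  # Mirror the upper triangle so undirected draws are symmetric
  if (!directed) out[lower.tri(out)] <- t(out)[lower.tri(out)]
  out
}
```

Indexing with sample.int() rather than calling sample() on the edge values directly avoids R's special case where sample(x) with a single numeric x draws from 1:x.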

Author

Carter T. Butts buttsc@uci.edu

Details

The null hypothesis of the CUG test is that the observed GLI (or function thereof) was drawn from a distribution equivalent to that of said GLI evaluated (uniformly) on the space of all graphs conditional on one or more features. The most common “features” used for conditioning purposes are order (size) and density, both of which are known to have strong and nontrivial effects on other GLIs (Anderson et al., 1999) and which are, in many cases, exogenously determined. (Note that maximum entropy distributions conditional on expected statistics are not, in general, properly referred to as “conditional uniform graphs”, although the term has been applied to independent-dyad models; cugtest's conditioning is of this kind, so the terminology is used loosely here. See cug.test for CUG tests with exact conditioning.) Since theoretical results regarding functions of arbitrary GLIs on the space of graphs are not available, the standard approach to CUG testing is to approximate the quantiles of the observed statistic under the null hypothesis using Monte Carlo methods. This is the technique used by cugtest, which takes appropriately conditioned draws from the set of graphs and computes the GLI specified in FUN on each draw, thereby accumulating an approximation to the true quantiles.
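The Monte Carlo approximation described above can be sketched in a few lines of base R. This is a simplified illustration under stated assumptions, not sna's implementation: cug_mc_sketch is a hypothetical name, it handles one dichotomous digraph without loops, and it only implements conditioning on order and expected density.

```r
# Hypothetical sketch, not sna's implementation: Monte Carlo approximation of
# the null distribution of a statistic `fun` for one digraph g (no loops),
# conditioning on order and expected density.
cug_mc_sketch <- function(g, fun, reps = 1000) {
  n <- nrow(g)
  off <- row(g) != col(g)             # off-diagonal cells are the valid data
  p <- mean(g[off])                   # observed density = tie probability
  dist <- replicate(reps, {
    h <- matrix(0, n, n)
    h[off] <- rbinom(sum(off), 1, p)  # Bernoulli digraph of the same order
    fun(h)                            # statistic evaluated on the null draw
  })
  list(testval = fun(g), dist = dist)
}

# Usage: number of mutual dyads in a random 20-node digraph
set.seed(1)
g <- matrix(rbinom(400, 1, 0.2), 20, 20)
diag(g) <- 0
mutual <- function(m) sum(m * t(m)) / 2
res <- cug_mc_sketch(g, mutual, reps = 200)
```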

The cugtest procedure returns a cugtest object containing the estimated distribution of the test GLI under the null hypothesis, the observed GLI value of the data, and the one-tailed p-values (estimated quantiles) associated with said observation. As usual, the (upper-tail) null hypothesis is rejected at significance level alpha if Pr(X>=observed), reported as pgreq, is less than alpha (or, for the lower tail, if Pr(X<=observed), reported as pleeq, is less than alpha). Standard caveats regarding the use of null hypothesis testing procedures apply here: in particular, bear in mind that a significant result does not necessarily imply that the likelihood ratio of the null model and the alternative hypothesis favors the latter.
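Given the observed value and the vector of Monte Carlo draws, the two one-tailed p-values are simple proportions. The helper below is a hypothetical illustration of the pgreq/pleeq definitions, using a toy five-draw distribution rather than real output.

```r
# Hypothetical helper: one-tailed Monte Carlo p-values, defined as in pgreq/pleeq
cug_pvalues <- function(testval, dist) {
  c(pgreq = mean(dist >= testval),   # upper tail: share of draws >= observed
    pleeq = mean(dist <= testval))   # lower tail: share of draws <= observed
}

p <- cug_pvalues(testval = 7, dist = c(1, 3, 5, 7, 9))
p                  # pgreq 0.4, pleeq 0.8
p["pgreq"] < 0.05  # upper-tail test at alpha = 0.05: FALSE, do not reject
```

Because ties with the observed value count toward both tails, pgreq and pleeq can sum to more than 1.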

Informative and aesthetically pleasing portrayals of cugtest objects are available via the print.cugtest and summary.cugtest methods. The plot.cugtest method displays the estimated distribution, with a reference line signifying the observed value.

References

Anderson, B.S.; Butts, C.T.; and Carley, K.M. (1999). “The Interaction of Size and Density with Graph-Level Indices.” Social Networks, 21(3), 239-267.

Examples

library(sna)
#Draw two random graphs, with different tie probabilities
dat<-rgraph(20,2,tprob=c(0.2,0.8))
#Is their correlation higher than would be expected, conditioning 
#only on size?
cug<-cugtest(dat,gcor,cmode="order")
summary(cug)
#Now, let's try conditioning on density as well.
cug<-cugtest(dat,gcor)
summary(cug)
