pstar: Fit a p*/ERG Model Using a Logistic Approximation

Description

Fits a p*/ERG model with the effects listed in effects to the graph in dat, using a logistic regression approximation. The result is returned as a glm object.

Usage

pstar(dat, effects=c("choice", "mutuality", "density", "reciprocity",
    "stransitivity", "wtransitivity", "stranstri",  "wtranstri", 
    "outdegree", "indegree", "betweenness", "closeness", 
    "degcentralization", "betcentralization", "clocentralization",
    "connectedness", "hierarchy", "lubness", "efficiency"), 
    attr=NULL, memb=NULL, diag=FALSE, mode="digraph")

Value

A glm object

Arguments

dat: a single graph
effects: a vector of strings indicating which effects should be fit.
attr: a matrix whose columns contain individual attributes (one row per vertex) whose differences should be used as supplemental predictors.
memb: a matrix whose columns contain group memberships whose categorical similarities (same group/not same group) should be used as supplemental predictors.
diag: a boolean indicating whether or not diagonal entries (loops) should be counted as meaningful data.
mode: "digraph" if dat is directed, else "graph"

Author

Carter T. Butts buttsc@uci.edu

WARNING

Estimation of p* models by maximum pseudo-likelihood is now known to be a dangerous practice. Use at your own risk.

Details

The exponential-family random graph model (ERGM) family, referred to as “p*” in older literature, is an exponential family specification for network data. In this specification, it is assumed that $$p(G=g) \propto \exp(\beta_0 \gamma_0(g) + \beta_1 \gamma_1(g) + \dots)$$ for all g, where the betas are real-valued coefficients and the gammas are real-valued functions of g. Unfortunately, the unknown normalizing factor in the above expression makes evaluation difficult in the general case. One solution to this problem is to operate instead on the edgewise log odds; in this case, the ERGM/p* MLE can be approximated by a logistic regression of each edge on the differences in the gamma scores induced by the presence and absence of said edge in the graph (conditional on all other edges). It is this approximation (known as autologistic regression, or maximum pseudo-likelihood estimation, MPLE) that is employed here.
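To make the approximation concrete: conditioning on all other edges cancels the normalizing factor, leaving $$\log \frac{p(g_{ij}=1 \mid \mathrm{rest})}{p(g_{ij}=0 \mid \mathrm{rest})} = \sum_k \beta_k \left[\gamma_k(g^{+}_{ij}) - \gamma_k(g^{-}_{ij})\right],$$ where g^{+}_{ij} and g^{-}_{ij} (notation introduced here, not in the original text) denote g with the (i,j) edge forced present and absent. The following is a minimal base R sketch of that idea for two statistics, edge count and mutual dyad count. It is an illustration under these assumptions, not the internal code of pstar; the helper gamma_stats and all variable names are hypothetical.

```r
# Minimal sketch of maximum pseudo-likelihood estimation (base R only).
# NOT the pstar() internals; gamma_stats() and all names are illustrative.
set.seed(1)
n <- 10
g <- matrix(rbinom(n * n, 1, 0.3), n, n)  # random directed graph
diag(g) <- 0                              # ignore loops

# gamma(g): edge count and number of mutual (reciprocated) dyads
gamma_stats <- function(m) {
  c(edges = sum(m), mutual = sum(m * t(m)) / 2)
}

# All ordered pairs (i, j) with i != j
pairs <- which(row(g) != col(g), arr.ind = TRUE)

# Change score for each pair: gamma with the edge present minus
# gamma with the edge absent, holding every other edge fixed
delta <- t(apply(pairs, 1, function(ij) {
  g1 <- g; g1[ij[1], ij[2]] <- 1
  g0 <- g; g0[ij[1], ij[2]] <- 0
  gamma_stats(g1) - gamma_stats(g0)
}))

# Observed state of each edge variable
y <- g[pairs]

# Logistic regression of edge states on change scores. No separate
# intercept: the edge-count change score is always 1 and plays that role.
fit <- glm(y ~ delta - 1, family = binomial)
coef(fit)
```

Recomputing the full statistics for every pair is wasteful but keeps the correspondence with the formula obvious; here the mutual-dyad change score reduces to the state of the reverse edge.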

Note that ERGM modeling is considerably more advanced than it was when this function was created, and estimation by MPLE is now used only in special cases. Guidelines for model specification and assessment have also evolved. The ergm package within the statnet suite reflects the current state of the art, and use of its ergm() function is highly recommended. This function is retained primarily as a legacy tool for users who are nostalgic for the 2000-vintage ERGM (“p*”) modeling experience. Caveat emptor.

Using the effects argument, a range of different potential parameters can be estimated. The network measure associated with each is, in turn, the edge-perturbed difference in:

choice: the number of edges in the graph (acts as a constant)
mutuality: the number of reciprocated dyads in the graph
density: the density of the graph
reciprocity: the edgewise reciprocity of the graph
stransitivity: the strong transitivity of the graph
wtransitivity: the weak transitivity of the graph
stranstri: the number of strongly transitive triads in the graph
wtranstri: the number of weakly transitive triads in the graph
outdegree: the outdegree of each actor (|V| parameters)
indegree: the indegree of each actor (|V| parameters)
betweenness: the betweenness of each actor (|V| parameters)
closeness: the closeness of each actor (|V| parameters)
degcentralization: the Freeman degree centralization of the graph
betcentralization: the betweenness centralization of the graph
clocentralization: the closeness centralization of the graph
connectedness: the Krackhardt connectedness of the graph
hierarchy: the Krackhardt hierarchy of the graph
efficiency: the Krackhardt efficiency of the graph
lubness: the Krackhardt LUBness of the graph

(Note that some of these differ somewhat from the common specifications employed in the older p* literature; e.g., quantities such as density and reciprocity are computed as per the gden and grecip functions rather than via the unnormalized "choice" and "mutual" quantities that were generally used.) Please do not attempt to use all effects simultaneously!

In addition to the above, the user may specify a matrix of individual attributes whose absolute dyadic differences are to be used as predictors, as well as a matrix of individual memberships whose dyadic categorical similarities (same/different) are used in the same manner.
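As an illustration of the two kinds of supplemental predictor described above, the sketch below builds an absolute-difference matrix and a same-group indicator matrix by hand in base R. The vectors age and grp are made-up example data, and the construction is only an assumption about the form of these covariates; how pstar assembles them internally is not shown in this page.

```r
# Hypothetical vertex-level data for a 4-vertex graph
age <- c(23, 35, 41, 29)   # an individual attribute (one value per vertex)
grp <- c(1, 2, 1, 2)       # a group membership code (one value per vertex)

# Absolute dyadic difference: entry (i, j) is |age_i - age_j|
abs_diff <- abs(outer(age, age, "-"))

# Categorical similarity: entry (i, j) is 1 if i and j share a group, else 0
same_grp <- outer(grp, grp, "==") * 1

abs_diff
same_grp

# With sna loaded and a 4-vertex graph g, the corresponding call would
# pass the raw columns (not run here):
# pstar(g, effects = "choice", attr = cbind(age), memb = cbind(grp))
```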

Although the ERGM framework is quite versatile in its ability to accommodate a range of structural predictors, it should be noted that the substantial collinearity of many of the terms provided here can lead to very unstable model fits. Measurement and specification errors compound this problem, as does the use of the MPLE; thus, it is somewhat risky to use pstar in an exploratory capacity (i.e., when there is little prior knowledge to constrain choice of parameters). While raw instability due to multicollinearity should decline with graph size, improper specification will still result in biased coefficient estimates so long as an omitted predictor correlates with an included predictor. Moreover, many models created using these effects are at risk of degeneracy, which is difficult to assess without simulation-based model assessment. Caution is advised - or, better, use of the ergm package.
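Because the fitted object is an ordinary glm, standard regression diagnostics can give a rough indication of the collinearity problem described above. The sketch below uses simulated predictors in place of real change scores (x1 and x2 are stand-ins, not pstar output) and inspects the predictor correlations, the condition number of the design matrix, and the coefficient standard errors. It does not address degeneracy, which requires simulation-based assessment.

```r
# Simulated stand-ins for two highly collinear change-score columns
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.05)          # nearly a copy of x1
y  <- rbinom(n, 1, plogis(0.5 * x1))

fit <- glm(y ~ x1 + x2, family = binomial)

# Design matrix without the intercept column
X <- model.matrix(fit)[, -1]

pred_cor <- cor(X)                      # pairwise predictor correlations
cond_num <- kappa(X, exact = TRUE)      # condition number (large = unstable)
se       <- sqrt(diag(vcov(fit)))       # coefficient standard errors

pred_cor
cond_num
se
```

A correlation near 1 together with a large condition number suggests that the individual coefficients are poorly determined, even if the overall fit looks reasonable.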

References

Anderson, C.; Wasserman, S.; and Crouch, B. (1999). ``A p* Primer: Logit Models for Social Networks.'' Social Networks, 21, 37-66.

Holland, P.W., and Leinhardt, S. (1981). ``An Exponential Family of Probability Distributions for Directed Graphs.'' Journal of the American Statistical Association, 81, 51-67.

Wasserman, S., and Pattison, P. (1996). ``Logit Models and Logistic Regressions for Social Networks: I. An Introduction to Markov Graphs and p*.'' Psychometrika, 60, 401-426.

Examples

if (FALSE) {
library(sna)  #Provides rgraph() and pstar()

#Create a graph with expansiveness and popularity effects
in.str<-rnorm(20,0,3)
out.str<-rnorm(20,0,3)
tie.str<-outer(out.str,in.str,"+")
tie.p<-apply(tie.str,c(1,2),function(a){1/(1+exp(-a))})
g<-rgraph(20,tprob=tie.p)

#Fit a model with expansiveness only
p1<-pstar(g,effects="outdegree")
#Fit a model with expansiveness and popularity
p2<-pstar(g,effects=c("outdegree","indegree"))
#Fit a model with expansiveness, popularity, and mutuality
p3<-pstar(g,effects=c("outdegree","indegree","mutuality"))

#Compare the model AICs -- use ONLY as heuristics!!!
extractAIC(p1)
extractAIC(p2)
extractAIC(p3)
}