ergmTerm: Terms used in Exponential Family Random Graph Models

Description

This page explains how to specify the network statistics $g(y)$ to functions in the ergm package and packages that extend it. It also provides an indexed list of the possible terms (and hence network statistics) visible to the ergm API. Terms can also be searched via search.ergmTerms, and help for an individual term can be obtained with ergmTerm?<term> or help("<term>-ergmTerm").

Arguments

Specifying models

In an exponential-family random graph model (ERGM), the probability or density of a given network, $y \in Y$, on a set of nodes is $$h(y) \exp[\eta(\theta) \cdot g(y)] / \kappa(\theta),$$ where $h(y)$ is the reference distribution (particularly for valued network models), $g(y)$ is a vector of network statistics for $y$, $\eta(\theta)$ is a natural parameter vector of the same length (with $\eta(\theta)\equiv\theta$ for most terms), $\cdot$ is the dot product, and $\kappa(\theta)$ is the normalizing constant for the distribution. A complete ERGM specification requires a list of network statistics $g(y)$ and (if applicable) their $\eta(\theta)$ mappings provided by a formula of ergmTerms; and, optionally, sample space $\mathcal{Y}$ and reference distribution $h(y)$ information provided by ergmConstraints and, for valued ERGMs, by ergmReferences.

Network statistics $g(y)$ and mappings $\eta(\theta)$ are specified by a formula object, of the form y ~ <term 1> + <term 2> ..., where y is a network object or a matrix that can be coerced to a network object, and <term 1>, <term 2>, etc, are each terms chosen from the list given below. To create a network object in , use the network function, then add nodal attributes to it using the %v% operator if necessary.

Term operators

Operator terms like B and F take formulas with other ergm terms as their arguments and transform them by modifying their inputs (e.g., the network they evaluate) and/or their outputs.

By convention, their names are capitalized and CamelCased.

Interactions

For binary ERGMs, interactions between ergm terms can be specified in a manner similar to lm and others, as using the : and * operators. However, they must be interpreted carefully, especially for dyad-dependent terms. (Interactions involving curved terms are not supported at this time.)

Generally, if term a has $p_a$ statistics and b has $p_b$, a:b will add $p_a \times p_b$ statistics to the model, corresponding to each element of $g_a(y)$ interacted with each element of $g_b(y)$.

The interaction is defined as follows. Dyad-independent terms can be expressed in the general form $g(y;x)=\sum_{i,j} $$ x_{i,j}y_{i,j}$ for some edge covariate matrix $x$, $$g_{a:b}(y)=\sum_{i,j} x_{a,i,j}x_{b,i,j}y_{i,j}.$$ In other words, rather than being a product of their sufficient statistics ($g_{a}(y)g_{b}(y)$), it is a dyadwise product of their dyad-level effects.

This means that an interaction between two dyad-independent terms can be interpreted the same way as it would be in the corresponding logistic regression for each potential edge. However, for undirected networks in particular, this may lead to somewhat counterintuitive results. For example, given two nodal covariates "a" and "b" (whose values for node $i$ are denoted $a_i$ and $b_i$, respectively), nodecov("a") adds one statistic of the form $\sum_{i,j} (a_{i}+a_{j}) y_{i,j}$ and analogously for nodecov("b"), so nodecov("a"):nodecov("b") produces $$\sum_{i,j} (a_{i}+a_{j}) (b_{i}+b_{j}) y_{i,j}.$$

Binary and valued ERGM terms

ergm functions such as ergm and simulate (for ERGMs) may operate in two modes: binary and weighted/valued, with the latter activated by passing a non-NULL value as the response argument, giving the edge attribute name to be modeled/simulated.

Generalizations of binary terms

Binary ERGM statistics cannot be used directly in valued mode and vice versa. However, a substantial number of binary ERGM statistics --- particularly the ones with dyadic independence --- have simple generalizations to valued ERGMs, and have been adapted in ergm. They have the same form as their binary ERGM counterparts, with an additional argument: form, which, at this time, has two possible values: "sum" (the default) and "nonzero". The former creates a statistic of the form $\sum_{i,j} x_{i,j} y_{i,j}$, where $y_{i,j}$ is the value of dyad $(i,j)$ and $x_{i,j}$ is the term's covariate associated with it. The latter computes the binary version, with the edge considered to be present if its value is not 0. Valued version of some binary ERGM terms have an argument threshold, which sets the value above which a dyad is conidered to have a tie. (Value less than or equal to threshold is considered a nontie.)

The B() operator term documented below can be used to pass other binary terms to valued models, and is more flexible, at the cost of being somewhat slower.

Nodal attribute levels and indices

Terms taking a categorical nodal covariate also take the levels argument. (There are analogous b1levels and b2levels arguments for some terms that apply to bipartite networks, and the levels2 argument for mixing terms.) The levels argument can be used to control the set and the ordering of attribute levels.

Terms that allow the selection of nodes do so with the nodes argument, which is interpreted in the same way as the levels argument, where the categories are the relevant nodal indices themselves.

Both levels and nodes use the new level selection UI. (See Specifying Vertex attributes and Levels (? nodal_attributes) for details.)

Legacy arguments

The legacy base and keep arguments are deprecated as of version 3.10, and replaced by the levels UI. The levels argument provides consistent and flexible mechanisms for specifying which attribute levels to exclude (previously handled by base) and include (previously handled by keep). If levels or nodes argument is given, then base and keep arguments are ignored. The legacy arguments will most likely be removed in a future version.

Note that this exact behavior is new in version 3.10, and it differs slightly from older versions: previously if both levels and base/keep were given, levels argument was applied first and then applied the base/keep argument. Since version 3.10, base/keep would be ignored, even if old term behavior is invoked (as described in the next section).

Term versioning

When a term's behavior has changed from prior version, it is often possible to invoke the old behavior by setting and/or passing a version term option, giving the verison (constructed by as.package_version) desired.

Custom `ergm` terms

Users and other packages may build custom terms, and package ergm.userterms (https://github.com/statnet/ergm.userterms) provides tools for implementing them.

The current recommendation for any package implementing additional terms is to document the term with Roxygen comments and a name in the form termName-ergmTerm. This ensures that help("ergmTerm") will list ERGM terms available from all loaded packages.

Terms included in the <code>ergm</code> package

As noted above, a cross-referenced HTML version of the term documentation is also available via vignette('ergm-term-crossRef') and terms can also be searched via search.ergmTerms.

Term index (plain)

ergm:::.formatIndexHtml(ergm:::.buildTermsDataframe("ergmTerm", keywords = ~!"operator"%in%.))

Term index (operator)

ergm:::.formatIndexHtml(ergm:::.buildTermsDataframe("ergmTerm", keywords = ~"operator"%in%.))

Frequently-used terms

ergm:::.formatMatrixHtml(ergm:::.termMatrix("ergmTerm", keywords=~"frequently-used"%in%., display.keywords = subset(ergm::ergm_keyword(), popular)$name))

Operator terms

ergm:::.formatMatrixHtml(ergm:::.termMatrix("ergmTerm", keywords=~"operator"%in%., display.keywords = subset(ergm::ergm_keyword(), popular & name!="operator")$name))

All terms

ergm:::.formatMatrixHtml(ergm:::.termMatrix("ergmTerm"))

Terms by keywords

ergm:::.formatTocHtml(ergm:::.termToc("ergmTerm"))

References

Krivitsky P. N., Hunter D. R., Morris M., Klumb C. (2021). "ergm 4.0: New features and improvements." arXiv:2106.04997. https://arxiv.org/abs/2106.04997
Bomiriya, R. P, Bansal, S., and Hunter, D. R. (2014). Modeling Homophily in ERGMs for Bipartite Networks. Submitted.
Butts, CT. (2008). "A Relational Event Framework for Social Action." Sociological Methodology, 38(1).
Davis, J.A. and Leinhardt, S. (1972). The Structure of Positive Interpersonal Relations in Small Groups. In J. Berger (Ed.), Sociological Theories in Progress, Volume 2, 218--251. Boston: Houghton Mifflin.
Holland, P. W. and S. Leinhardt (1981). An exponential family of probability distributions for directed graphs. Journal of the American Statistical Association, 76: 33--50.
Hunter, D. R. and M. S. Handcock (2006). Inference in curved exponential family models for networks. Journal of Computational and Graphical Statistics, 15: 565--583.
Hunter, D. R. (2007). Curved exponential family models for social networks. Social Networks, 29: 216--230.
Krackhardt, D. and Handcock, M. S. (2007). Heider versus Simmel: Emergent Features in Dynamic Structures. Lecture Notes in Computer Science, 4503, 14--27.
Krivitsky P. N. (2012). Exponential-Family Random Graph Models for Valued Networks. Electronic Journal of Statistics, 2012, 6, 1100-1128. tools:::Rd_expr_doi("10.1214/12-EJS696")
Robins, G; Pattison, P; and Wang, P. (2009). "Closure, Connectivity, and Degree Distributions: Exponential Random Graph (p*) Models for Directed Social Networks." Social Networks, 31:105-117.
Snijders T. A. B., G. G. van de Bunt, and C. E. G. Steglich. Introduction to Stochastic Actor-Based Models for Network Dynamics. Social Networks, 2010, 32(1), 44-60. tools:::Rd_expr_doi("10.1016/j.socnet.2009.02.004")
Morris M, Handcock MS, and Hunter DR. Specification of Exponential-Family Random Graph Models: Terms and Computational Aspects. Journal of Statistical Software, 2008, 24(4), 1-24. tools:::Rd_expr_doi("10.18637/jss.v024.i04")
Snijders, T. A. B., P. E. Pattison, G. L. Robins, and M. S. Handcock (2006). New specifications for exponential random graph models, Sociological Methodology, 36(1): 99-153.

Examples

Run this code

if (FALSE) {
ergm(flomarriage ~ kstar(1:2) + absdiff("wealth") + triangle)

ergm(molecule ~ edges + kstar(2:3) + triangle
                      + nodematch("atomic type",diff=TRUE)
                      + triangle + absdiff("atomic type"))
}

Run the code above in your browser using DataLab