This page explains how to specify the network statistics \(g(y)\) to functions in the ergm package and packages that extend it. It also provides an indexed list of the possible terms (and hence network statistics) visible to the ergm API. Terms can also be searched via search.ergmTerms, and help for an individual term can be obtained with ergmTerm?<term> or help("<term>-ergmTerm").
In an exponential-family random graph model (ERGM), the probability or density of a given network, \(y \in \mathcal{Y}\), on a set of nodes is $$h(y) \exp[\eta(\theta) \cdot g(y)] / \kappa(\theta),$$ where \(h(y)\) is the reference distribution (particularly for valued network models), \(g(y)\) is a vector of network statistics for \(y\), \(\eta(\theta)\) is a natural parameter vector of the same length (with \(\eta(\theta)\equiv\theta\) for most terms), \(\cdot\) is the dot product, and \(\kappa(\theta)\) is the normalizing constant for the distribution. A complete ERGM specification requires a list of network statistics \(g(y)\) and (if applicable) their \(\eta(\theta)\) mappings, provided by a formula of ergmTerms; and, optionally, sample space \(\mathcal{Y}\) and reference distribution \(h(y)\) information, provided by ergmConstraints and, for valued ERGMs, by ergmReferences.
Network statistics \(g(y)\) and mappings \(\eta(\theta)\) are specified by a formula object of the form y ~ <term 1> + <term 2> ..., where y is a network object or a matrix that can be coerced to a network object, and <term 1>, <term 2>, etc., are each terms chosen from the list given below. To create a network object in R, use the network function, then add nodal attributes to it using the %v% operator if necessary.
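As a minimal sketch of the workflow just described, the following builds a small network, attaches a nodal attribute with %v%, and evaluates \(g(y)\) for a formula via summary. The adjacency matrix and the attribute name "age" are made up for illustration; only network, %v%, summary, edges, and nodecov come from the network and ergm packages.

```r
library(network)
library(ergm)  # supplies the term definitions and summary() for ERGM formulas

# Hypothetical 4-node undirected path 1 - 2 - 3 - 4 (illustrative data)
adj <- matrix(0, 4, 4)
adj[1, 2] <- adj[2, 1] <- 1
adj[2, 3] <- adj[3, 2] <- 1
adj[3, 4] <- adj[4, 3] <- 1
nw <- network(adj, directed = FALSE)

# Attach a nodal attribute; the name "age" is an assumption for this example
nw %v% "age" <- c(23, 35, 41, 29)

# Evaluate the statistics g(y) named on the right-hand side, without fitting
stats <- summary(nw ~ edges + nodecov("age"))
print(stats)
```

The same formula could be passed to ergm to fit the model rather than just evaluate its statistics.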
Operator terms like B() and F() take formulas with other ergm terms as their arguments and transform them by modifying their inputs (e.g., the network they evaluate) and/or their outputs.
By convention, their names are capitalized and CamelCased.
For binary ERGMs, interactions between ergm terms can be specified in a manner similar to lm and other modeling functions, using the : and * operators. However, they must be interpreted carefully, especially for dyad-dependent terms. (Interactions involving curved terms are not supported at this time.)
Generally, if term a has \(p_a\) statistics and b has \(p_b\), a:b will add \(p_a \times p_b\) statistics to the model, corresponding to each element of \(g_a(y)\) interacted with each element of \(g_b(y)\).
The interaction is defined as follows. Dyad-independent terms can be expressed in the general form \(g(y;x)=\sum_{i,j} x_{i,j}y_{i,j}\) for some edge covariate matrix \(x\). For two such terms \(a\) and \(b\), with covariate matrices \(x_a\) and \(x_b\), the interaction statistic is $$g_{a:b}(y)=\sum_{i,j} x_{a,i,j}x_{b,i,j}y_{i,j}.$$ In other words, rather than being a product of their sufficient statistics (\(g_{a}(y)g_{b}(y)\)), it is a dyadwise product of their dyad-level effects.
This means that an interaction between two dyad-independent terms can be interpreted the same way as it would be in the corresponding logistic regression for each potential edge. However, for undirected networks in particular, this may lead to somewhat counterintuitive results. For example, given two nodal covariates "a" and "b" (whose values for node \(i\) are denoted \(a_i\) and \(b_i\), respectively), nodecov("a") adds one statistic of the form \(\sum_{i,j} (a_{i}+a_{j}) y_{i,j}\) and analogously for nodecov("b"), so nodecov("a"):nodecov("b") produces
$$\sum_{i,j} (a_{i}+a_{j}) (b_{i}+b_{j}) y_{i,j}.$$
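The difference between the dyadwise interaction and a plain product of statistics can be checked by hand. The sketch below uses base R only and does not call ergm; the toy network and covariate values are made up, and it assumes that for an undirected network each unordered pair \(\{i,j\}\) is counted once.

```r
# Hypothetical undirected path 1 - 2 - 3 as an adjacency matrix
y <- matrix(0, 3, 3)
y[1, 2] <- y[2, 1] <- 1
y[2, 3] <- y[3, 2] <- 1

# Illustrative nodal covariates
a <- c(1, 2, 3)
b <- c(0, 1, 0)

# Dyad-level effects of nodecov("a") and nodecov("b"): a_i + a_j and b_i + b_j
x_a <- outer(a, a, "+")
x_b <- outer(b, b, "+")

# Count each unordered pair once (assumption for the undirected case)
dyads <- upper.tri(y)

g_a  <- sum(x_a[dyads] * y[dyads])               # (1+2) + (2+3) = 8
g_b  <- sum(x_b[dyads] * y[dyads])               # (0+1) + (1+0) = 2
g_ab <- sum(x_a[dyads] * x_b[dyads] * y[dyads])  # 3*1 + 5*1 = 8

# The interaction statistic is not the product of the two statistics
product_of_stats <- g_a * g_b                    # 8 * 2 = 16
print(c(g_a = g_a, g_b = g_b, g_ab = g_ab, product = product_of_stats))
```

Here the interaction statistic is 8 while the product of the two sufficient statistics is 16, which illustrates why a:b has to be read at the level of individual dyads.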
ergm functions such as ergm and simulate (for ERGMs) may operate in two modes: binary and weighted/valued, with the latter activated by passing a non-NULL value as the response argument, giving the edge attribute name to be modeled/simulated.
Binary ERGM statistics cannot be used directly in valued mode and vice versa. However, a substantial number of binary ERGM statistics (particularly the ones with dyadic independence) have simple generalizations to valued ERGMs, and have been adapted in ergm. They have the same form as their binary ERGM counterparts, with an additional argument, form, which at this time has two possible values: "sum" (the default) and "nonzero". The former creates a statistic of the form \(\sum_{i,j} x_{i,j} y_{i,j}\), where \(y_{i,j}\) is the value of dyad \((i,j)\) and \(x_{i,j}\) is the term's covariate associated with it. The latter computes the binary version, with the edge considered to be present if its value is not 0. Valued versions of some binary ERGM terms have an argument threshold, which sets the value above which a dyad is considered to have a tie. (A value less than or equal to threshold is considered a nontie.)
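The three rules above can be written out directly. The following base R sketch does not call ergm; it only mirrors the definitions of the "sum" form, the "nonzero" form, and the threshold rule for a nodecov-style covariate on a small directed valued network. All values and variable names are made up for illustration.

```r
# Hypothetical directed valued network: w[i, j] is the value of dyad (i, j)
w <- matrix(0, 3, 3)
w[1, 2] <- 2
w[2, 3] <- 5
w[3, 1] <- 1

# Illustrative dyadic covariate x[i, j] = a_i + a_j, as for nodecov
a <- c(1, 2, 3)
x <- outer(a, a, "+")

# Exclude self-loops
offdiag <- row(w) != col(w)

# form = "sum": sum of x_ij * y_ij over dyads
stat_sum <- sum(x[offdiag] * w[offdiag])            # 3*2 + 5*5 + 4*1 = 35

# form = "nonzero": a dyad counts as an edge when its value is not 0
stat_nonzero <- sum(x[offdiag] * (w[offdiag] != 0)) # 3 + 5 + 4 = 12

# threshold: a dyad has a tie only if its value is strictly above the threshold
threshold <- 1
ties_above <- sum(w[offdiag] > threshold)           # values 2 and 5 qualify

print(c(sum = stat_sum, nonzero = stat_nonzero, ties = ties_above))
```

Note that the dyad with value 1 is a nontie under threshold 1, since the rule is strict.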
The B() operator term documented below can be used to pass other binary terms to valued models, and is more flexible, at the cost of being somewhat slower.
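A small sketch of valued mode and of B(), under the assumption that the sum and nonzero terms and the form argument of B() behave as described in their own help pages. The network and the edge attribute name "w" are hypothetical; the response argument is what switches summary into valued mode.

```r
library(network)
library(ergm)

# Hypothetical undirected valued triangle; edge values are stored in attribute "w"
w <- matrix(0, 3, 3)
w[1, 2] <- w[2, 1] <- 2
w[2, 3] <- w[3, 2] <- 3
w[1, 3] <- w[3, 1] <- 1
nw <- as.network(w, directed = FALSE, matrix.type = "adjacency",
                 ignore.eval = FALSE, names.eval = "w")

# Valued mode: response names the edge attribute to be modeled.
# B(~triangle, form = "nonzero") evaluates the binary triangle term on the
# network whose edges are the dyads with nonzero value.
stats <- summary(nw ~ sum + nonzero + B(~triangle, form = "nonzero"),
                 response = "w")
print(stats)
```

The triangle term has no valued counterpart of its own in this sketch, which is the situation B() is meant for.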
Terms taking a categorical nodal covariate also take the levels argument. (There are analogous b1levels and b2levels arguments for some terms that apply to bipartite networks, and the levels2 argument for mixing terms.) The levels argument can be used to control the set and the ordering of attribute levels.
Terms that allow the selection of nodes do so with the nodes argument, which is interpreted in the same way as the levels argument, where the categories are the relevant nodal indices themselves.
Both levels and nodes use the new level selection UI. (See Specifying Vertex attributes and Levels (?nodal_attributes) for details.)
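As an illustration of level selection, the sketch below evaluates nodefactor with a few levels specifications on a toy network. The data and the attribute name "grp" are hypothetical, and the selectors shown (TRUE for all levels, a negative index to drop a level, a positive index to keep one) are assumed to follow the rules in ?nodal_attributes, which remains the authoritative description.

```r
library(network)
library(ergm)

# Hypothetical undirected path 1 - 2 - 3 - 4 with a categorical attribute
adj <- matrix(0, 4, 4)
adj[1, 2] <- adj[2, 1] <- 1
adj[2, 3] <- adj[3, 2] <- 1
adj[3, 4] <- adj[4, 3] <- 1
nw <- network(adj, directed = FALSE)
nw %v% "grp" <- c("a", "b", "b", "c")

# Keep every level of "grp"
all_levels <- summary(nw ~ nodefactor("grp", levels = TRUE))

# Drop the first level in sorted order ("a")
drop_first <- summary(nw ~ nodefactor("grp", levels = -1))

# Keep only the second level in sorted order ("b")
only_second <- summary(nw ~ nodefactor("grp", levels = 2))

print(all_levels)
print(drop_first)
print(only_second)
```

Each nodefactor statistic counts how many times a node with the given level appears at the end of an edge, so the selected levels determine both which statistics appear and in what order.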
The legacy base and keep arguments are deprecated as of version 3.10, and replaced by the levels UI. The levels argument provides consistent and flexible mechanisms for specifying which attribute levels to exclude (previously handled by base) and include (previously handled by keep). If the levels or nodes argument is given, then the base and keep arguments are ignored.
The legacy arguments will most likely be removed in a future version.
Note that this exact behavior is new in version 3.10, and it differs slightly from older versions: previously, if both levels and base/keep were given, the levels argument was applied first, followed by the base/keep argument. Since version 3.10, base/keep are ignored, even if old term behavior is invoked (as described in the next section).
When a term's behavior has changed from a prior version, it is often possible to invoke the old behavior by setting and/or passing a version term option, giving the version (constructed by as.package_version) desired.
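One way this might look in practice is sketched below. It assumes that term options are read from the global option named ergm.term, as described in ?term.options; the version number 3.9.4 is only an example, and the option has an effect only once ergm is loaded and a term consults it.

```r
# Construct the version whose behavior should be emulated (illustrative value)
old_version <- as.package_version("3.9.4")

# Assumption: ergm reads term options from the global option "ergm.term".
# options() returns the previous settings so they can be restored afterwards.
saved <- options(ergm.term = list(version = old_version))

# ... calls to ergm(), simulate(), or summary() on ERGM formulas go here ...
current <- getOption("ergm.term")$version

# Restore the previous settings
options(saved)
```

Passing the option for a single call, rather than globally, is also described in ?term.options; the global form is shown here only because it needs nothing beyond base R to set up.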
Users and other packages may build custom terms, and package ergm.userterms (https://github.com/statnet/ergm.userterms) provides tools for implementing them.
The current recommendation for any package implementing additional terms is to document the term with Roxygen comments and a name in the form termName-ergmTerm. This ensures that help("ergmTerm") will list ERGM terms available from all loaded packages.
As noted above, a cross-referenced HTML version of the term documentation is also available via vignette('ergm-term-crossRef'), and terms can also be searched via search.ergmTerms.
See also: ergm package, search.ergmTerms, ergm, network, %v%, %n%
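if (FALSE) {
library(ergm)
data(florentine)  # provides flomarriage
data(molecule)
# Marriage ties: 1- and 2-stars, absolute wealth difference, and triangles
ergm(flomarriage ~ kstar(1:2) + absdiff("wealth") + triangle)
# Molecule bonds: edges, 2- and 3-stars, triangles, and atom-type effects
ergm(molecule ~ edges + kstar(2:3) + triangle
     + nodematch("atomic type", diff = TRUE)
     + absdiff("atomic type"))
}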