nodal_attributes: Specifying nodal attributes and their levels

Description

This document describes the ways to specify nodal attributes or functions of nodal attributes and which levels for categorical factors to include. For the helper functions to facilitate this, see nodal_attributes-API.

Usage

LARGEST(l, a)
SMALLEST(l, a)
COLLAPSE_SMALLEST(object, n, into)

Arguments

object, l, a, n, into: COLLAPSE_SMALLEST, LARGEST, and SMALLEST are technically functions but they are generally not called in a standard fashion but rather as a part of an vertex attribute specification or a level specification as described below. The above usage examples are needed to pass R's package checking without warnings; please disregard them, and refer to the sections and examples below instead.

Specifying nodal attributes

Term nodal attribute arguments, typically called attr, attrs, by, or on are interpreted as follows:

a character string: Extract the vertex attribute with this name.
a character vector of length > 1: Extract the vertex attributes and paste them together, separated by dots if the term expects categorical attributes and (typically) combine into a covariate matrix if it expects quantitative attributes.
a function: The function is called on the LHS network and additional arguments to ergm_get_vattr(), expected to return a vector or matrix of appropriate dimension. (Shorter vectors and matrix columns will be recycled as needed.)
a formula: The expression on the RHS of the formula is evaluated in an environment of the vertex attributes of the network, expected to return a vector or matrix of appropriate dimension. (Shorter vectors and matrix columns will be recycled as needed.) Within this expression, the network itself accessible as either . or .nw. For example, nodecov(~abs(Grade-mean(Grade))/network.size(.)) would return the absolute difference of each actor's "Grade" attribute from its network-wide mean, divided by the network size.
an AsIs object created by I(): Use as is, checking only for correct length and type.

Any of these arguments may also be wrapped in or piped through COLLAPSE_SMALLEST(attr, n, into) or, attr %>% COLLAPSE_SMALLEST(n, into), a convenience function that will transform the attribute by collapsing the smallest n categories into one, naming it into. Note that into must be of the same type (numeric, character, etc.) as the vertex attribute in question. If there are ties for nth smallest category, they will be broken in lexicographic order, and a warning will be issued.

The name the nodal attribute receives in the statistic can be overridden by setting a an attr()-style attribute "name".

Specifying categorical attribute levels and their ordering

For categorical attributes, to select which levels are of interest and their ordering, use the argument levels. Selection of nodes (from the appropriate vector of nodal indices) is likewise handled as the selection of levels, using the argument nodes. These arguments are interpreted as follows:

an expression wrapped in I(): Use the given list of levels as is.
a numeric or logical vector: Used for indexing of a list of all possible levels (typically, unique values of the attribute) in default older (typically lexicographic), i.e., sort(unique(attr))[levels]. In particular, levels=TRUE will retain all levels. Negative values exclude. Another special value is LARGEST, which will refer to the most frequent category, so, say, to set such a category as the baseline, pass levels=-LARGEST. In addition, LARGEST(n) will refer to the n largest categories. SMALLEST works analogously. If there are ties in frequencies, they will be broken in lexicographic order, and a warning will be issued. To specify numeric or logical levels literally, wrap in I().
NULL: Retain all possible levels; usually equivalent to passing TRUE.
a character vector: Use as is.
a function: The function is called on the list of unique values of the attribute, the values of the attribute themselves, and the network itself, depending on its arity. Its return value is interpreted as above.
a formula: The expression on the RHS of the formula is evaluated in an environment in which the network itself is accessible as .nw, the list of unique values of the attribute as . or as .levels, and the attribute vector itself as .attr. Its return value is interpreted as above.
a matrix: For mixing effects (i.e., level2= arguments), a matrix can be used to select elements of the mixing matrix, either by specifying a logical (TRUE and FALSE) matrix of the same dimension as the mixing matrix to select the corresponding cells or a two-column numeric matrix indicating giving the coordinates of cells to be used.

Note that levels, nodes, and others often have a default that is sensible for the term in question.

Examples

Run this code

library(magrittr) # for %>%

data(faux.mesa.high)

# Activity by grade with a baseline grade excluded:
summary(faux.mesa.high~nodefactor(~Grade))
# Name overrides:
summary(faux.mesa.high~nodefactor("Form"~Grade)) # Only for terms that don't use the LHS.
summary(faux.mesa.high~nodefactor(~structure(Grade,name="Form")))
# Retain all levels:
summary(faux.mesa.high~nodefactor(~Grade, levels=TRUE)) # or levels=NULL
# Use the largest grade as baseline (also Grade 7):
summary(faux.mesa.high~nodefactor(~Grade, levels=-LARGEST))
# Activity by grade with no baseline smallest two grades (11 and
# 12) collapsed into a new category, labelled 0:
table(faux.mesa.high %v% "Grade")
summary(faux.mesa.high~nodefactor((~Grade) %>% COLLAPSE_SMALLEST(2, 0),
                                  levels=TRUE))

# Handling of tied frequencies
faux.mesa.high %v% "Plans" <-
    sample(rep(c("College", "Trade School", "Apprenticeship", "Undecided"), c(80,80,20,25)))
summary(faux.mesa.high ~ nodefactor("Plans", levels = -LARGEST))

# Mixing between lower and upper grades:
summary(faux.mesa.high~mm(~Grade>=10))
# Mixing between grades 7 and 8 only:
summary(faux.mesa.high~mm("Grade", levels=I(c(7,8))))
# or
summary(faux.mesa.high~mm("Grade", levels=1:2))
# or using levels2 (see ? mm) to filter the combinations of levels,
summary(faux.mesa.high~mm("Grade",
        levels2=~sapply(.levels,
                        function(l)
                          l[[1]]%in%c(7,8) && l[[2]]%in%c(7,8))))

# Here are some less complex ways to specify levels2. This is the
# full list of combinations of sexes in an undirected network:
summary(faux.mesa.high~mm("Sex", levels2=TRUE))
# Select only the second combination:
summary(faux.mesa.high~mm("Sex", levels2=2))
# Equivalently,
summary(faux.mesa.high~mm("Sex", levels2=-c(1,3)))
# or
summary(faux.mesa.high~mm("Sex", levels2=c(FALSE,TRUE,FALSE)))
# Select all *but* the second one:
summary(faux.mesa.high~mm("Sex", levels2=-2))
# Select via a mixing matrix: (Network is undirected and
# attributes are the same on both sides, so we can use either M or
# its transpose.)
(M <- matrix(c(FALSE,TRUE,FALSE,FALSE),2,2))
summary(faux.mesa.high~mm("Sex", levels2=M)+mm("Sex", levels2=t(M)))
# Select via an index of a cell:
idx <- cbind(1,2)
summary(faux.mesa.high~mm("Sex", levels2=idx))
# Or, select by specific attribute value combinations, though note
# the names 'row' and 'col' and the order for undirected networks:
summary(faux.mesa.high~mm("Sex",
                          levels2 = I(list(list(row="M",col="M"),
                                           list(row="M",col="F"),
                                           list(row="F",col="M")))))

# mm() term allows two-sided attribute formulas with different attributes:
summary(faux.mesa.high~mm(Grade~Race, levels2=TRUE))
# It is possible to have collapsing functions in the formula; note
# the parentheses around "~Race": this is because a formula
# operator (~) has lower precedence than pipe (|>):
summary(faux.mesa.high~mm(Grade~(~Race) %>% COLLAPSE_SMALLEST(3,"BWO"), levels2=TRUE))

# Some terms, such as nodecov(), accept matrices of nodal
# covariates. An certain R quirk means that columns whose
# expressions are not typical variable names have their names
# dropped and need to be adjusted. Consider, for example, the
# linear and quadratic effects of grade:
Grade <- faux.mesa.high %v% "Grade"
colnames(cbind(Grade, Grade^2)) # Second column name missing.
colnames(cbind(Grade, Grade2=Grade^2)) # Can be set manually,
colnames(cbind(Grade, `Grade^2`=Grade^2)) # even to non-variable-names.
colnames(cbind(Grade, Grade^2, deparse.level=2)) # Alternatively, deparse.level=2 forces naming.
rm(Grade)
# \dontshow{
options(warn=1) # Print warnings immediately.
# }
# Therefore, the nodal attribute names are set as follows:
summary(faux.mesa.high~nodecov(~cbind(Grade, Grade^2))) # column names dropped with a warning
summary(faux.mesa.high~nodecov(~cbind(Grade, Grade2=Grade^2))) # column names set manually
summary(faux.mesa.high~nodecov(~cbind(Grade, Grade^2, deparse.level=2))) # using deparse.level=2

# Activity by grade with a random covariate. Note that setting an attribute "name" gives it a name:
randomcov <- structure(I(rbinom(network.size(faux.mesa.high),1,0.5)), name="random")
summary(faux.mesa.high~nodefactor(I(randomcov)))

Run the code above in your browser using DataLab