ergm: Fit a Latent Space Random Graph Model

Description

ergm is used to fit latent space and latent space cluster random network models, as described in Hoff, Raftery and Handcock (2002) and Handcock, Raftery and Tantrum (2005). ergm can return either a Bayesian model fit or an approximate MLE based on a Monte Carlo scheme.

Usage

ergm(formula, theta0=NULL, 
     burnin=1000, MCMCsamplesize=1000, interval=100, maxit=5,
     latent.control=list(maxit=40,penalty.sigma=c(10,0.5),MLEonly=FALSE),
     returnMCMCstats=TRUE, randseed=NULL, 
     verbose=FALSE, ...)

Arguments

formula

An R formula object, of the form g ~ <term 1> + <term 2> + ..., where g is a network object or a matrix that can be coerced to a network object, and <term 1>, <term 2>, etc., are each terms chosen

theta0

The parameter value used to generate the MCMC sample. By default the maximum pseudolikelihood estimate (MPLE) is used (startatMPLE=TRUE).

burnin

The number of proposals before any MCMC sampling is done. Currently, there is no support for any check of the Markov chain mixing, so burnin should be set to a fairly large number.

MCMCsamplesize

Size of the sample of network statistics, randomly drawn from a given distribution on the set of all networks, returned by the Metropolis-Hastings algorithm.

interval

The number of proposals between sampled statistics. The program prints a warning if too few proposals are being accepted in any interval before each sample.

maxit

The number of times the parameter for the MCMC should be updated by maximizing the MCMC likelihood. At each step the parameter is changed to the value that maximizes the MCMC likelihood based on the current sample. For each step both the MCMCsamp

latent.control

Control variables for the latent space algorithm. These are used only if a latent term is included in the model. maxit sets the maximum number of iterations to use in the Quasi-Newton-Rap

returnMCMCstats

If this is TRUE the matrix of change statistics from the MCMC run is returned as component $sample. This matrix is actually an object of class mcmc and can be used directly in the CODA package to

randseed

Random number integer seed. The default is sample(10000000, size=1).

verbose

If this is TRUE, we will print out more information as we run the program, including (currently) some goodness of fit statistics.

...

Additional arguments, to be passed to lower-level functions in the future.

Value

ergm returns an object of class ergm that is a list consisting of the following elements:

$coef

The Monte Carlo maximum likelihood estimate of $\theta$, the vector of coefficients for the model parameters.

$sample

The $n\times p$ matrix of network statistics, where $n$ is the sample size and $p$ is the number of network statistics specified in the model, that is used in the maximum likelihood estimation routine.

$iterations

The number of Newton-Raphson iterations required before convergence.

$MCMCtheta

The value of $\theta$ used to produce the Markov chain Monte Carlo sample. As long as the Markov chain mixes sufficiently well, $sample is roughly a random sample from the distribution of network statistics specified by the model with the parameter equal to $MCMCtheta. In the current version, if startatMPLE is TRUE, then $MCMCtheta equals the MPLE.

$loglikelihood

The approximate log-likelihood for the MLE. The value is only approximate because it is based on the MCMC random sample.

$gradient

The value of the gradient vector of the approximated log-likelihood function, evaluated at the maximizer. This vector should be very close to zero.

$hessian

The Hessian matrix of the approximated log-likelihood function, evaluated at the maximizer. This matrix may be inverted to give an approximate covariance matrix for the MLE.

$samplesize

The size of the MCMC sample.

$formula

The original formula passed to the ergm function.

$statsmatrix

If returnMCMCstats=TRUE, this is the matrix of change statistics from the MCMC run.

$newnetwork

The network generated at the end of the MCMC sampling.

See print.ergm for details on how an ergm object is printed. The function summary.ergm returns a concise summary of the relevant parts of the ergm object.
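
The description of $hessian says the matrix may be inverted to give an approximate covariance matrix. The sketch below illustrates that step in base R on a made-up 2 x 2 matrix. It assumes the Hessian is that of the log-likelihood itself (so it is negative definite at the maximizer and the sign must be flipped before inverting); the numbers and the names hess, covar and se are hypothetical and do not come from a real fit.

```r
# Hypothetical Hessian of an approximated log-likelihood at its maximizer.
# The values are invented for illustration only.
hess <- matrix(c(-40,   5,
                   5, -10), nrow = 2, byrow = TRUE)

# Assumption: hess is negative definite, so the approximate covariance
# matrix of the estimate is the inverse of the negated Hessian.
covar <- solve(-hess)

# Approximate standard errors are the square roots of the diagonal.
se <- sqrt(diag(covar))
print(se)
```

With a fitted object, the same two lines would be applied to its $hessian component, provided the sign convention assumed here matches the one used by the package.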

Model Terms

The latentnet package itself allows only three types of terms: latent, latentcluster and latentcov. The ergm package allows the user to explore a large number of potential models for their network data in addition to these terms. The terms currently supported by the program, and a brief description of each, are given in the documentation terms.ergm for the ergm package. In the formula for the model, the model terms are various function-like calls, some of which require arguments, separated by + signs. The current options are:

latent(k=2, ...)

Latent position model term, where k is the dimension of the latent space. For information on the other arguments, see the help for latent.

latentcluster(k=2, ngroups, ...)

Latent position cluster model term, where k is the dimension of the latent space and ngroups is the number of clusters in the latent space. For information on the other arguments, see the help for latentcluster.

latentcov(cv, attrname=NULL)

Covariates for the latent model. cv is either a matrix of covariates on each pair of vertices, or a network; if the latter, the optional argument attrname provides the name of the edge attribute to use for edge values. This option adds one statistic to the model, representing the effect of the given covariate on the appearance of edges. latentcov can be called more than once, to model the effects of multiple covariates.
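
As a small illustration of how terms are combined with + signs, the sketch below builds a model formula in base R and lists its terms without fitting anything. The names samplike and samegroup are used only as placeholders (they are not evaluated here), and f, term_labels and response are hypothetical helper names.

```r
# Build a model formula; nothing is evaluated or fitted at this point.
f <- samplike ~ latentcov(samegroup) + latent(k = 2)

# Each "+"-separated call on the right-hand side is one model term.
term_labels <- attr(terms(f), "term.labels")
print(term_labels)

# The left-hand side names the network to be modelled.
response <- deparse(f[[2]])
print(response)
```

Inspecting a formula this way can be a quick check that every intended term is present before starting a long MCMC run.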

References

Peter D. Hoff, Adrian E. Raftery and Mark S. Handcock. Latent space approaches to social network analysis. Journal of the American Statistical Association, December 2002, Vol. 97, Iss. 460, pp. 1090-1098.

Mark S. Handcock, Adrian E. Raftery and Jeremy Tantrum. Model-Based Clustering for Social Networks. Working Paper Number 46, Center for Statistics and the Social Sciences, University of Washington, April 2005.

Examples

#
# See http://www.csde.washington.edu/statnet
# for examples
#
# load the Florentine marriage data matrix
#
data(flo)
#
# print the sociomatrix for the Florentine marriage data.
# This is not yet a network object.
#
flo
#
# Create a network object out of the adjacency matrix
#
flomarriage <- network(flo,directed=FALSE)
flomarriage
#
# print out the sociomatrix for the Florentine marriage data
#
sociomatrix(flomarriage)
#
# create a vector indicating the wealth of each family (in thousands of lira) 
# and add it as a covariate to the network object
#
flomarriage <- set.vertex.attribute(flomarriage,"wealth",
  c(10,36,27,146,55,44,20,8,42,103,48,49,10,48,32,3))
flomarriage
#
# create a plot of the social network
#
plot(flomarriage)
#
# now make the vertex size proportional to their wealth
#
plot(flomarriage, vertex.cex="wealth", main="Marriage Ties")
#
# List the data sets in the latentnet package
#
data(package="latentnet")
#
# Using Sampson's Monk data, let's fit a
# simple latent position model
#
data(sampson)
#
# Get the group labels
#
group <- get.vertex.attribute(samplike,"group")
samp.labs <- substr(group,1,1)
#
samp.fit <- ergm(samplike ~ latent(k=2), burnin=10000,
                 MCMCsamplesize=2000, interval=30)
#
# See if we have convergence in the MCMC
mcmc.diagnostics(samp.fit)
#
# Plot the fit
#
plot(samp.fit,label=samp.labs, vertex.col="group")
#
# Using Sampson's Monk data, let's fit a latent clustering model
#
samp.fit <- ergm(samplike ~ latentcluster(k=2, ngroups=3), burnin=10000,
                 MCMCsamplesize=2000, interval=30)
#
# See if we have convergence in the MCMC
mcmc.diagnostics(samp.fit)
#
# Let's look at the goodness of fit:
#
plot(samp.fit,label=samp.labs, vertex.col="group")
plot(samp.fit,pie=TRUE,label=samp.labs)
plot(samp.fit,density=c(2,2))
plot(samp.fit,contours=5,contour.color="red")
plot(samp.fit,density=TRUE,drawarrows=TRUE)
#
# Add contours
#
add.contours(samp.fit,nlevels=8,lwd=2)
points(samp.fit$Z.mkl,pch=19,col=samp.fit$class)
#
# Try a covariate on the group
#
samegroup <- outer(group, group, "==")
diag(samegroup) <- 0
samp.fit <- ergm(samplike ~ latentcov(samegroup) + latent(k=2))
summary(samp.fit)
