ergmm: Fit a Latent Space Random Graph Model

Description

ergmm is used to fit latent space and latent space cluster random network models, as described in Hoff, Raftery and Handcock (2002) and Handcock, Raftery and Tantrum (2005). ergmm produces likelihood-based inference. Approximate maximum likelihood estimators are computed, and Bayesian inference is implemented via an MCMC algorithm.

Usage

ergmm(formula, theta0=NULL, 
     burnin=1000, MCMCsamplesize=1000, interval=10,
     latent.control=list(maxit=40,penalty.sigma=c(10,0.5),MLEonly=FALSE),
     returnMCMCstats=TRUE, randseed=NULL, 
     verbose=FALSE, ...)

Arguments

formula

An R formula object, of the form y ~ <term 1> + <term 2> ..., where y is a network object or a matrix that can be coerced to a network object, and <term 1>, <term 2>, etc., are each terms chosen

theta0

The initial parameter value used to find the MLE. The default is based on multidimensional scaling fit to the positions.

burnin

The number of proposals before any MCMC sampling is done.

MCMCsamplesize

The number of posterior samples to draw.

interval

The number of proposal steps between sampled statistics.
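
As a rough sketch of how burnin, MCMCsamplesize and interval interact, the total number of proposals in a run can be estimated in plain R. This assumes the sampler discards burnin proposals and then keeps one draw every interval proposals; that is an assumption about the sampling scheme, not a statement about the internals of ergmm. The helper name total_proposals is hypothetical.

```r
# Hypothetical helper, not part of latentnet: estimated number of
# proposals for a run, assuming burn-in followed by thinned sampling.
total_proposals <- function(burnin = 1000, MCMCsamplesize = 1000, interval = 10) {
  burnin + MCMCsamplesize * interval
}

total_proposals()                                                      # 11000
total_proposals(burnin = 10000, MCMCsamplesize = 2000, interval = 30)  # 70000
```

The second call uses the settings from the Sampson examples in this page, which are considerably more expensive than the defaults.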

latent.control

Control variables for the latent space algorithm. These are used only if a latent term is included in the model. maxit sets the maximum number of iterations to use in the Quasi-Newton-Rap

returnMCMCstats

If this is TRUE, the matrix of change statistics from the MCMC run is returned as the component sample. This matrix is an object of class mcmc and can be used directly in the CODA package to

randseed

Random number integer seed. The default is sample(10000000, size=1).
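
The documented default seed expression can be reproduced in plain R. The sketch below only illustrates drawing such a seed and reusing it with set.seed; whether ergmm applies randseed through set.seed internally is an assumption.

```r
# Draw a seed the same way as the documented default.
seed <- sample(10000000, size = 1)

# Reusing the same seed gives identical random draws in base R.
set.seed(seed); a <- runif(3)
set.seed(seed); b <- runif(3)
identical(a, b)  # TRUE
```

Recording the drawn value and passing it explicitly as randseed is one way to keep a record of the seed used for a fit.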

verbose

If this is TRUE, more information is printed while the program runs, including (currently) some goodness-of-fit statistics.

...

Additional arguments, to be passed to lower-level functions in the future.

Value

ergmm returns an object of class ergmm that is a list. Fits including a latentcluster term will have at least the following components and fits including a latent term will have at least the components up to and including network.

coef

The maximum likelihood estimate of the $p$-vector of coefficients for the model parameters (excluding the latent positions and cluster parameters). By default this is just the intercept, with $p=1$.

coef.names

A $p$-vector of the coefficient names.

Beta

The MCMCsamplesize $\times$ $p$ matrix of coefficients for the model parameters corresponding to each of the posterior samples. By default this is the intercept only.

Z

The MCMCsamplesize $\times$ $k$ matrix of (Procrustified) posterior positions, where MCMCsamplesize is the sample size and $k$ is the number of dimensions of the latent space.

Z.mkl

The network.size(g) $\times$ $k$ matrix of minimum Kullback-Leibler positions for each of the nodes.

Z.pmean

The network.size(g) $\times$ $k$ matrix of posterior mean positions for each of the nodes.

Z.pmode

The network.size(g) $\times$ $k$ matrix of posterior modal positions for each of the nodes.

Z.mle

The network.size(g) $\times$ $k$ matrix of MLE positions for each of the nodes.

beta.mkl

The $p$-vector of coefficients for the model parameters based on the minimum Kullback-Leibler positions for each of the nodes.

samplesize

The number of MCMC samples drawn from the posterior.

sample

The MCMCsamplesize $\times$ $(p+2+k)$ matrix of network statistics, where MCMCsamplesize is the sample size and $p$ is the number of network covariates specified in the model via the latentcov terms (usually 0). The columns are: "mcmc.loglikelihood", the log-likelihood value; "density", the constant term in the latent model; the $p$ covariates; and "Z 1", "Z 2", ..., "Z k", the $k$-dimensional positions of the first node. The values are recorded for each sample drawn. This is primarily used for MCMC diagnostics to assess convergence.

iterations

The number of Newton-Raphson iterations required before convergence.

interval

The number of proposals between sampled statistics.

null.deviance

The deviance for the null model, comparable with -2 loglikelihood. The null model will include the intercept if there is one in the model, but not the latent variables or latent clusters.

mcmc.loglikelihood

The log-likelihood values corresponding to each of the posterior samples.

loglikelihood

The log-likelihood for the MLE of positions (and based on the final fits to the other parameters).

mle.lik

The log-likelihood for the initial MLE fit of positions.

hessian

The Hessian matrix of the approximated log-likelihood function, evaluated at the maximizer. This matrix may be inverted to give an approximate covariance matrix for the MLE of the parameters.

formula

The original formula entered into the ergmm function.

latent

A flag to indicate that this is a fit of a latent variable model. This is always TRUE for ergmm fits and is included for consistency with the statnet package.

cluster

A flag to indicate that this is a fit of a latent cluster model. This is always TRUE for ergmm fits if a latentcluster term is in the model and is included for consistency with the statnet package.

network

The modeled network as a network object.

BIC

A Bayesian Information Criterion approximation for the model. This is the approximation based on the fully Bayesian estimation method in Section 3.2 of Handcock, Raftery and Tantrum (2005). The formula for the approximation is given at the end of Section 4 in that paper. See the references for details.

class

The vector of posterior modal classes for each node.

Ki

The MCMCsamplesize $\times$ network.size(g) matrix of posterior draws of the classes, where MCMCsamplesize is the sample size and network.size(g) is the number of nodes in the network.

Ki.mle

The network.size(g) vector of maximum likelihood classes for each node.

logl.lr

The log-likelihood for the latent space component of the model.

logl.mbc

The log-likelihood for the model-based clustering component of the model.

mu

The ngroups $\times$ $k$ $\times$ MCMCsamplesize array of posterior draws of the mean positions of the class, where MCMCsamplesize is the sample size and ngroups is the number of classes.

mu.mle

The ngroups $\times$ $k$ matrix of maximum likelihood mean positions for each class.

ngroups

The number of classes or clusters.

qig

The network.size(g) $\times$ ngroups matrix of posterior probabilities of class membership for each of the nodes.

Sigma

The MCMCsamplesize $\times$ ngroups array of posterior draws of the variances of the positions of the class, where MCMCsamplesize is the sample size and ngroups is the number of classes.

Sigma.mle

The maximum likelihood variances of the positions for each class.

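
To illustrate how the class-related components relate to one another, the sketch below derives membership proportions and modal classes from a toy matrix of class draws shaped like Ki. It assumes that membership probabilities are estimated as the share of draws assigned to each class and that the modal class is the most frequent one; the actual computation inside ergmm may differ (for example, in how label switching is handled). All values are made up.

```r
# Toy stand-in for fit$Ki: 4 posterior draws (rows) for 3 nodes (columns).
Ki <- matrix(c(1, 1, 2,
               1, 2, 2,
               1, 1, 2,
               2, 1, 2), nrow = 4, byrow = TRUE)
ngroups <- 2

# Share of draws in which each node falls in each class
# (nodes in rows, classes in columns), analogous to qig.
qig <- t(apply(Ki, 2, function(draws) {
  tabulate(draws, nbins = ngroups) / length(draws)
}))

# Most frequent class for each node, analogous to class.
modal_class <- apply(qig, 1, which.max)
modal_class  # 1 1 2
```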
Note that the function summary.ergmm returns a concise summary of the relevant parts of the ergmm object.
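
The hessian component can be turned into approximate standard errors. The sketch below uses a toy 2 x 2 matrix in place of a fitted object's hessian and assumes the Hessian of the log-likelihood is negative definite at the maximizer, so that the covariance is taken as the inverse of its negative. The sign convention of the returned matrix should be checked before relying on this.

```r
# Toy stand-in for fit$hessian (hypothetical values).
H <- matrix(c(-4,  1,
               1, -2), nrow = 2, byrow = TRUE)

# Approximate covariance matrix: inverse of the negative Hessian.
vcov_approx <- solve(-H)

# Approximate standard errors of the parameter estimates.
se <- sqrt(diag(vcov_approx))
se
```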

References

Peter D. Hoff, Adrian E. Raftery and Mark S. Handcock. Latent space approaches to social network analysis. Journal of the American Statistical Association, Vol. 97, No. 460, pp. 1090-1098, December 2002.

Mark S. Handcock, Adrian E. Raftery and Jeremy Tantrum. Model-Based Clustering for Social Networks. Working Paper Number 46, Center for Statistics and the Social Sciences, University of Washington, April 2005.

Examples

#
# See http://www.csde.washington.edu/statnet/latentnet
# for more examples
#
# For an explanation and examples of creating 'network' objects
# see the required 'network' package.
#
# Use 'data(package = "latentnet")' to list the data sets in a package.
#
data(package="latentnet")
#
# Using Sampson's Monk data, let's fit a
# simple latent position model
#
data(sampson)
#
# Get the group labels
#
group <- get.vertex.attribute(samplike,"group")
samp.labs <- substr(group,1,1)
#
samp.fit <- ergmm(samplike ~ latent(k=2), burnin=10000,
                 MCMCsamplesize=2000, interval=30)
#
# See if we have convergence in the MCMC
mcmc.diagnostics(samp.fit)
#
# Plot the fit
#
plot(samp.fit,label=samp.labs, vertex.col="group")
#
# Using Sampson's Monk data, let's fit a latent clustering model
#
samp.fit <- ergmm(samplike ~ latentcluster(k=2, ngroups=3), burnin=10000,
                 MCMCsamplesize=2000, interval=30)
#
# See if we have convergence in the MCMC
mcmc.diagnostics(samp.fit)
#
# Let's look at the goodness of fit:
#
plot(samp.fit,label=samp.labs, vertex.col="group")
plot(samp.fit,pie=TRUE,label=samp.labs)
plot(samp.fit,density=c(2,2))
plot(samp.fit,contours=5,contour.color="red")
plot(samp.fit,density=TRUE,drawarrows=TRUE)
#
# Add contours
#
ergmm.add.contours(samp.fit,nlevels=8,lwd=2)
points(samp.fit$Z.mkl,pch=19,col=samp.fit$class)
#
# Try a covariate on the group
#
samegroup <- outer(group, group, "==")
diag(samegroup) <- 0
samp.fit <- ergmm(samplike ~ latentcov(samegroup) + latent(k=2))
summary(samp.fit)
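
The covariate construction in the last example can be checked on a toy vector without loading any data. This is a minimal sketch in base R; the group labels below are made up and only stand in for the vertex attribute read from samplike.

```r
# Toy group labels (hypothetical values).
group <- c("Loyal", "Loyal", "Turks", "Outcasts")

# TRUE where two nodes share a group label.
samegroup <- outer(group, group, "==")

# Zero the diagonal so a node is not counted as sharing a group with itself;
# assigning 0 also coerces the logical matrix to numeric 0/1.
diag(samegroup) <- 0
samegroup
```

The result is a symmetric 0/1 matrix with one row and column per node, which is the shape of dyadic covariate passed to latentcov in the example above.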