control.ergm: Auxiliary for Controlling ERGM Fitting

Description

Auxiliary function as user interface for fine-tuning 'ergm' fitting.

Usage

control.ergm(drop=TRUE,
             init=NULL,
             init.method=NULL,
                       
             main.method=c("MCMLE","Robbins-Monro",
               "Stochastic-Approximation","Stepping"),
             force.main=FALSE,
             main.hessian=TRUE,
             MPLE.max.dyad.types=1e+6, 
             MPLE.samplesize=50000,                       
             MPLE.type=c("glm", "penalized"),
                      
             MCMC.prop.weights="default", MCMC.prop.args=list(),
             MCMC.interval=1024,
             MCMC.burnin=MCMC.interval*16,
             MCMC.samplesize=1024,
             
             MCMC.effectiveSize=NULL,
             MCMC.effectiveSize.damp=10,
             MCMC.effectiveSize.maxruns=1000,
             MCMC.effectiveSize.base=1/2,
             MCMC.effectiveSize.points=5,
             MCMC.effectiveSize.order=1,
             
             MCMC.return.stats=TRUE,
             MCMC.runtime.traceplot=FALSE,
             MCMC.init.maxedges=20000,
             MCMC.max.maxedges=Inf,
             MCMC.addto.se=TRUE,
             MCMC.compress=FALSE,
             MCMC.packagenames=c(),            
             SAN.maxit=10,
             SAN.burnin.times=10,
             SAN.control=control.san(coef=init,
                         SAN.prop.weights=MCMC.prop.weights,
                         SAN.prop.args=MCMC.prop.args,
                         SAN.init.maxedges=MCMC.init.maxedges,
                         SAN.burnin=MCMC.burnin * SAN.burnin.times,
                         SAN.interval=MCMC.interval,
                         SAN.packagenames=MCMC.packagenames,
                         MPLE.max.dyad.types=MPLE.max.dyad.types,
                         parallel=parallel,
                         parallel.type=parallel.type,
                         parallel.version.check=parallel.version.check),
             MCMLE.termination=c("Hummel", "Hotelling", "precision", "none"),
             MCMLE.maxit=20,
             MCMLE.conv.min.pval=0.5,
             MCMLE.NR.maxit=100,
             MCMLE.NR.reltol=sqrt(.Machine$double.eps),
             obs.MCMC.samplesize=MCMC.samplesize,
             obs.MCMC.interval=MCMC.interval,
             obs.MCMC.burnin=MCMC.burnin,
             obs.MCMC.burnin.min=obs.MCMC.burnin/10,
             obs.MCMC.prop.weights=MCMC.prop.weights, obs.MCMC.prop.args=MCMC.prop.args,
             MCMLE.check.degeneracy=FALSE,
             MCMLE.MCMC.precision=0.005,
             MCMLE.MCMC.max.ESS.frac=0.1,
             MCMLE.metric=c("lognormal", "logtaylor",
                            "Median.Likelihood",
                            "EF.Likelihood", "naive"),
             MCMLE.method=c("BFGS","Nelder-Mead"),
             MCMLE.trustregion=20,
             MCMLE.dampening=FALSE,
             MCMLE.dampening.min.ess=20,
             MCMLE.dampening.level=0.1,
             MCMLE.steplength.margin=0.05,
             MCMLE.steplength=if(is.null(MCMLE.steplength.margin)) 0.5 else 1,
             MCMLE.adaptive.trustregion=3,
             MCMLE.sequential=TRUE,
             MCMLE.density.guard.min=10000,
             MCMLE.density.guard=exp(3),
             MCMLE.effectiveSize=NULL,
             MCMLE.last.boost=4,
             MCMLE.Hummel.esteq=TRUE,
             MCMLE.Hummel.miss.sample=100,
             MCMLE.Hummel.maxit=25, 
             MCMLE.steplength.min=0.0001,
             SA.phase1_n=NULL, 
             SA.initial_gain=NULL, 
             SA.nsubphases=4, 
             SA.niterations=NULL, 
             SA.phase3_n=NULL,
             SA.trustregion=0.5,
             RM.phase1n_base=7,
             RM.phase2n_base=100,
             RM.phase2sub=7,
             RM.init_gain=0.5,
             RM.phase3n=500,
             Step.MCMC.samplesize=100,
             Step.maxit=50,
             Step.gridsize=100,
             CD.nsteps=8,
             CD.multiplicity=1,
             CD.nsteps.obs=128,
             CD.multiplicity.obs=1,
             CD.maxit=60,
             CD.conv.min.pval=0.5,
             CD.NR.maxit=100,
             CD.NR.reltol=sqrt(.Machine$double.eps),
             CD.metric=c("naive", "lognormal", "logtaylor",
               "Median.Likelihood",
               "EF.Likelihood"),
             CD.method=c("BFGS","Nelder-Mead"),
             CD.trustregion=20,
             CD.dampening=FALSE,
             CD.dampening.min.ess=20,
             CD.dampening.level=0.1,
             CD.steplength.margin=0.5,
             CD.steplength=1,
             CD.adaptive.trustregion=3,
             CD.adaptive.epsilon=0.01,
             CD.Hummel.esteq=TRUE,
             CD.Hummel.miss.sample=100,
             CD.Hummel.maxit=25, 
             CD.steplength.min=0.0001,
             loglik.control=control.logLik.ergm(),
             seed=NULL,
             parallel=0,
             parallel.type=NULL,
             parallel.version.check=TRUE,             
             …)

Arguments

drop

Logical: If TRUE, terms whose observed statistic values are at the extremes of their possible ranges are dropped from the fit and their corresponding parameter estimates are set to plus or minus infinity, as appropriate. This is done because maximum likelihood estimates cannot exist when the vector of observed statistic lies on the boundary of the convex hull of possible statistic values.

init

numeric or NA vector equal in length to the number of parameters in the model or NULL (the default); the initial values for the estimation and coefficient offset terms. If NULL is passed, all of the initial values are computed using the method specified by control$init.method. If a numeric vector is given, the elements of the vector are interpreted as follows:

Elements corresponding to terms enclosed in offset() are used as the fixed offset coefficients. Note that offset coefficients alone can be more conveniently specified using ergm argument offset.coef. If both offset.coef and init arguments are given, values in offset.coef will take precedence.
Elements that do not correspond to offset terms and are not NA are used as starting values in the estimation.
Initial values for the elements that are NA are fit using the method specified by control$init.method.

Passing control.ergm(init=coef(prev.fit)) can be used to ``resume'' an uncoverged ergm run, but see enformulate.curved.

init.method

A chatacter vector or NULL. The default method depends on the reference measure used. For the binary ("Bernoulli") ERGMs, it's maximum pseudo-likelihood estimation (MPLE). Other valid values include "zeros" for a 0 vector of appropriate length and "CD" for contrastive divergence.

Valid initial methods for a given reference can be queried using ergm.init.methods.

main.method

One of "MCMLE" (default),"Robbins-Monro", "Stochastic-Approximation", or "Stepping". Chooses the estimation method used to find the MLE. MCMLE attempts to maximize an approximation to the log-likelihood function. Robbins-Monro and Stochastic-Approximation are both stochastic approximation algorithms that try to solve the method of moments equation that yields the MLE in the case of an exponential family model. Another alternative is a partial stepping algorithm (Stepping) as in Hummel et al. (2012). The direct use of the likelihood function has many theoretical advantages over stochastic approximation, but the choice will depend on the model and data being fit. See Handcock (2000) and Hunter and Handcock (2006) for details.

Note that in recent versions of ERGM, the enhancements of Stepping have been folded into the default MCMLE, which is able to handle more modeling scenarios.

force.main

Logical: If TRUE, then force MCMC-based estimation method, even if the exact MLE can be computed via maximum pseudolikelihood estimation.

main.hessian

Logical: If TRUE, then an approximate Hessian matrix is used in the MCMC-based estimation method.

MPLE.max.dyad.types

Maximum number of unique values of change statistic vectors, which are the predictors in a logistic regression used to calculate the MPLE. This calculation uses a compression algorithm that allocates space based on MPLE.max.dyad.types.

MPLE.samplesize

Not currently documented; used in conditional-on-degree version of MPLE.

MPLE.type

One of "glm" or "penalized". Chooses method of calculating MPLE. "glm" is the usual formal logistic regression, whereas "penalized" uses the bias-reduced method of Firth (1993) as originally implemented by Meinhard Ploner, Daniela Dunkler, Harry Southworth, and Georg Heinze in the "logistf" package.

MCMC.prop.weights, obs.MCMC.prop.weights

Specifies the proposal distribution used in the MCMC Metropolis-Hastings algorithm. Possible choices depending on selected reference and constraints arguments of the ergm function, but often include "TNT" and "random", and the "default" is to use the one with the highest priority available.

The TNT (tie / no tie) option puts roughly equal weight on selecting a dyad with or without a tie as a candidate for toggling, whereas the random option puts equal weight on all possible dyads, though the interpretation of random may change according to the constraints in place. When no constraints are in place, the default is TNT, which appears to improve Markov chain mixing particularly for networks with a low edge density, as is typical of many realistic social networks.

obs.MCMC.prop.weights, if given separately, specifies the weights to be used for the constrained MCMC when missing dyads are present, defaulting to the same as MCMC.prop.weights.

MCMC.prop.args, obs.MCMC.prop.args

An alternative, direct way of specifying additional arguments to proposal. obs.MCMC.prop.args, if given separately, specifies the weights to be used for the constrained MCMC when missing dyads are present, defaulting to the same as MCMC.prop.args.

MCMC.interval

Number of proposals between sampled statistics. Increasing interval will reduces the autocorrelation in the sample, and may increase the precision in estimates by reducing MCMC error, at the expense of time. Set the interval higher for larger networks.

MCMC.burnin

Number of proposals before any MCMC sampling is done. It typically is set to a fairly large number.

MCMC.samplesize

Number of network statistics, randomly drawn from a given distribution on the set of all networks, returned by the Metropolis-Hastings algorithm. Increasing sample size may increase the precision in the estimates by reducing MCMC error, at the expense of time. Set it higher for larger networks, or when using parallel functionality.

MCMLE.effectiveSize, MCMC.effectiveSize, MCMC.effectiveSize.damp, MCMC.effectiveSize.maxruns, MCMC.effectiveSize.base, MCMC.effectiveSize.points, MCMC.effectiveSize.order

Set MCMLE.effectiveSize to non-NULL value to adaptively determine the burn-in and the MCMC length needed to get the specified effective size using the method of Sahlin (2011); 50 is a reasonable value. This feature is in experimental status until we verify the coverage of the standard errors.

MCMC.return.stats

Logical: If TRUE, return the matrix of MCMC-sampled network statistics. This matrix should have MCMC.samplesize rows. This matrix can be used directly by the coda package to assess MCMC convergence.

MCMC.runtime.traceplot

Logical: If TRUE, plot traceplots of the MCMC sample after every MCMC MLE iteration.

MCMC.init.maxedges, MCMC.max.maxedges

Maximum number of edges expected in network. Starting at MCMC.init.maxedges, it will be incremented by a factor of 10 if exceeded during fitting, up to MCMC.max.maxedges, at which point the process will stop with an error.

MCMC.addto.se

Whether to add the standard errors induced by the MCMC algorithm to the estimates' standard errors.

MCMC.compress

Logical: If TRUE, the matrix of sample statistics returned is compressed to the set of unique statistics with a column of frequencies post-pended.

MCMC.packagenames

Names of packages in which to look for change statistic functions in addition to those autodetected. This argument should not be needed outside of very strange setups.

SAN.maxit

When target.stats argument is passed to ergm, the maximum number of attempts to use san to obtain a network with statistics close to those specified.

SAN.burnin.times

Multiplier for SAN.burnin relative to MCMC.burnin. This lets one control the amount of SAN burn-in (arguably, the most important of SAN parameters) without overriding the other SAN.control defaults.

SAN.control

Control arguments to san. See control.san for details.

MCMLE.termination

The criterion used for terminating MCMLE estimation:

"Hummel" Terminate when the Hummel step length is 1 for two consecutive iterations. For the last iteration, the sample size is boosted by a factor of MCMLE.last.boost. See Hummel et. al. (2012).
Note that this criterion is incompatible with MCMLE.steplength $\ne$ 1 or MCMLE.steplength.margin $=$ NULL.
"Hotelling" After every MCMC sample, an autocorrelation-adjusted Hotelling's T^2 test for equality of MCMC-simulated network statistics to observed is conducted, and if its P-value exceeds MCMLE.conv.min.pval, the estimation is considered to have converged and finishes. This was the default option in ergm version 3.1.
"precision" Terminate when the estimated loss in estimating precision due to using MCMC standard errors is below the precision bound specified by MCMLE.MCMC.precision, and the Hummel step length is 1 for two consecutive iterations. See MCMLE.MCMC.precision for details. This feature is in experimental status until we verify the coverage of the standard errors.
Note that this criterion is incompatible with $\code{MCMLE.steplength}\ne 1$ or $\code{MCMLE.steplength.margin}=\code{NULL}$.
"none" Stop after MCMLE.maxit iterations.

MCMLE.maxit

Maximum number of times the parameter for the MCMC should be updated by maximizing the MCMC likelihood. At each step the parameter is changed to the values that maximizes the MCMC likelihood based on the current sample.

MCMLE.conv.min.pval

The P-value used in the Hotelling test for early termination.

MCMLE.NR.maxit, MCMLE.NR.reltol

The method, maximum number of iterations and relative tolerance to use within the optim rountine in the MLE optimization. Note that by default, ergm uses trust, and falls back to optim only when trust fails.

obs.MCMC.samplesize, obs.MCMC.burnin, obs.MCMC.interval, obs.MCMC.burnin.min

Sample size, burnin, and interval parameters for the MCMC sampling used when unobserved data are present in the estimation routine.

MCMLE.check.degeneracy

Logical: If TRUE, employ a check for model degeneracy.

MCMLE.MCMC.precision, MCMLE.MCMC.max.ESS.frac

MCMLE.MCMC.precision is a vector of upper bounds on the standard errors induced by the MCMC algorithm, expressed as a percentage of the total standard error. The MCMLE algorithm will terminate when the MCMC standard errors are below the precision bound, and the Hummel step length is 1 for two consecutive iterations. This is an experimental feature.

If effective sample size is used (see MCMC.effectiveSize), then ergm may increase the target ESS to reduce the MCMC standard error.

MCMLE.metric

Method to calculate the loglikelihood approximation. See Hummel et al (2010) for an explanation of "lognormal" and "naive".

MCMLE.method

Deprecated. By default, ergm uses trust, and falls back to optim with Nelder-Mead method when trust fails.

MCMLE.trustregion

Maximum increase the algorithm will allow for the approximated likelihood at a given iteration. See Snijders (2002) for details.

Note that not all metrics abide by it.

MCMLE.dampening

(logical) Should likelihood dampening be used?

MCMLE.dampening.min.ess

The effective sample size below which dampening is used.

MCMLE.dampening.level

The proportional distance from boundary of the convex hull move.

MCMLE.steplength.margin

The extra margin required for a Hummel step to count as being inside the convex hull of the sample. Set this to 0 if the step length gets stuck at the same value over several iteraions. Set it to NULL to use fixed step length. Note that this parameter is required to be non-NULL for MCMLE termination using Hummel or precision criteria.

MCMLE.steplength

Multiplier for step length, which may (for values less than one) make fitting more stable at the cost of computational efficiency. Can be set to "adaptive"; see MCMLE.adaptive.trustregion.

If MCMLE.steplength.margin is not NULL, the step length will be set using the algorithm of Hummel et al. (2010). In that case, it will serve as the maximum step length considered. However, setting it to anything other than 1 will preclude using Hummel or precision as termination criteria.

MCMLE.adaptive.trustregion

Maximum increase the algorithm will allow for the approximated loglikelihood at a given iteration when MCMLE.steplength="adaptive".

MCMLE.sequential

Logical: If TRUE, the next iteration of the fit uses the last network sampled as the starting network. If FALSE, always use the initially passed network. The results should be similar (stochastically), but the TRUE option may help if the target.stats in the ergm function are far from the initial network.

MCMLE.density.guard.min, MCMLE.density.guard

A simple heuristic to stop optimization if it finds itself in an overly dense region, which usually indicates ERGM degeneracy: if the sampler encounters a network configuration that has more than MCMLE.density.guard.min edges and whose number of edges is exceeds the observed network by more than MCMLE.density.guard, the optimization process will be stopped with an error.

MCMLE.last.boost

For the Hummel termination criterion, increase the MCMC sample size of the last iteration by this factor.

MCMLE.Hummel.esteq

For curved ERGMs, should the estimating function values be used to compute the Hummel step length? This allows the Hummel stepping algorithm converge when some sufficient statistics are at 0.

MCMLE.steplength.min

Stops MCMLE estimation when the step length gets stuck below this minimum value.

MCMLE.Hummel.miss.sample

In fitting the missing data MLE, the rules for step length become more complicated. In short, it is necessary for all points in the constrained sample to be in the convex hull of the unconstrained (though they may be on the border); and it is necessary for their centroid to be in its interior. This requires checking a large number of points against whether they are in the convex hull, so to speed up the procedure, a sample is taken of the points most likely to be outside it. This parameter specifies the sample size.

MCMLE.Hummel.maxit

Maximum number of iterations in searching for the best step length.

SA.phase1_n

Number of MCMC samples to draw in Phase 1 of the stochastic approximation algorithm. Defaults to 7 plus 3 times the number of terms in the model. See Snijders (2002) for details.

SA.initial_gain

Initial gain to Phase 2 of the stochastic approximation algorithm. See Snijders (2002) for details.

SA.nsubphases

Number of sub-phases in Phase 2 of the stochastic approximation algorithm. Defaults to MCMLE.maxit. See Snijders (2002) for details.

SA.niterations

Number of MCMC samples to draw in Phase 2 of the stochastic approximation algorithm. Defaults to 7 plus the number of terms in the model. See Snijders (2002) for details.

SA.phase3_n

Sample size for the MCMC sample in Phase 3 of the stochastic approximation algorithm. See Snijders (2002) for details.

SA.trustregion

The trust region parameter for the likelihood functions, used in the stochastic approximation algorithm.

RM.phase1n_base, RM.phase2n_base, RM.phase2sub, RM.init_gain, RM.phase3n

The Robbins-Monro control parameters are not yet documented.

Step.MCMC.samplesize

MCMC sample size for the preliminary steps of the "Stepping" method of optimization. This is usually chosen to be smaller than the final MCMC sample size (which equals MCMC.samplesize). See Hummel et al. (2012) for details.

Step.maxit

Maximum number of iterations (steps) allowed by the "Stepping" method.

Step.gridsize

Integer $N$ such that the "Stepping" style of optimization chooses a step length equal to the largest possible multiple of $1/N$. See Hummel et al. (2012) for details.

CD.nsteps, CD.multiplicity

Main settings for contrastive divergence to obtain initial values for the estimation: respectively, the number of Metropolis--Hastings steps to take before reverting to the starting value and the number of tentative proposals per step. Computational experiments indicate that increasing CD.multiplicity improves the estimate faster than increasing CD.nsteps --- up to a point --- but it also samples from the wrong distribution, in the sense that while as CD.nsteps$\rightarrow\infty$, the CD estimate approaches the MLE, this is not the case for CD.multiplicity.

In practice, MPLE, when available, usually outperforms CD for even a very high CD.nsteps (which is, in turn, not very stable), so CD is useful primarily when MPLE is not available. This feature is to be considered experimental and in flux.

The default values have been set experimentally, providing a reasonably stable, if not great, starting values.

CD.nsteps.obs, CD.multiplicity.obs

When there are missing dyads, CD.nsteps and CD.multiplicity must be set to a relatively high value, as the network passed is not necessarily a good start for CD. Therefore, these settings are in effect if there are missing dyads in the observed network, using a higher default number of steps.

CD.maxit, CD.conv.min.pval, CD.NR.maxit, CD.NR.reltol, CD.metric, CD.method, CD.trustregion, CD.dampening, CD.dampening.min.ess, CD.dampening.level, CD.steplength.margin, CD.steplength, CD.adaptive.trustregion, CD.adaptive.epsilon, CD.Hummel.e

Miscellaneous tuning parameters of the CD sampler and optimizer. These have the same meaning as their MCMC.* counterparts.

Note that only the Hotelling's stopping criterion is implemented for CD.

loglik.control

See control.ergm.bridge

seed

Seed value (integer) for the random number generator. See set.seed

parallel

Number of threads in which to run the sampling. Defaults to 0 (no parallelism). See the entry on parallel processing for details and troubleshooting.

parallel.type

API to use for parallel processing. Supported values are "MPI" and "PSOCK". Defaults to using the parallel package with PSOCK clusters. See ergm-parallel

parallel.version.check

Logical: If TRUE, check that the version of ergm running on the slave nodes is the same as that running on the master node.

…

Additional arguments, passed to other functions This argument is helpful because it collects any control parameters that have been deprecated; a warning message is printed in case of deprecated arguments.

Value

A list with arguments as components.

Details

This function is only used within a call to the ergm function. See the usage section in ergm for details.

References

Snijders, T.A.B. (2002), Markov Chain Monte Carlo Estimation of Exponential Random Graph Models. Journal of Social Structure. Available from http://www.cmu.edu/joss/content/articles/volume3/Snijders.pdf.
Firth (1993), Bias Reduction in Maximum Likelihood Estimates. Biometrika, 80: 27-38.
Hunter, D. R. and M. S. Handcock (2006), Inference in curved exponential family models for networks. Journal of Computational and Graphical Statistics, 15: 565-583.
Hummel, R. M., Hunter, D. R., and Handcock, M. S. (2012), Improving Simulation-Based Algorithms for Fitting ERGMs, Journal of Computational and Graphical Statistics, 21: 920-939.
Kristoffer Sahlin. Estimating convergence of Markov chain Monte Carlo simulations. Master's Thesis. Stockholm University, 2011. http://www2.math.su.se/matstat/reports/master/2011/rep2/report.pdf