redist.smc: SMC Redistricting Sampler

Description

redist.smc uses a Sequential Monte Carlo algorithm to generate nearly independent congressional or legislative redistricting plans according to contiguity, population, compactness, and administrative boundary constraints.

Usage

redist.smc(
  adjobj,
  popvec,
  nsims,
  ndists,
  counties = NULL,
  popcons = 0.01,
  compactness = 1,
  resample = TRUE,
  constraint_fn = function(m) rep(0, ncol(m)),
  adapt_k_thresh = 0.95,
  seq_alpha = 0.1 + 0.2 * compactness,
  truncate = (compactness != 1),
  trunc_fn = function(x) pmin(x, 0.01 * nsims^0.4),
  max_oversample = 20,
  verbose = TRUE,
  silent = FALSE
)

Arguments

adjobj

An adjacency matrix, list, or object of class "SpatialPolygonsDataFrame".

popvec

A vector containing the populations of each geographic unit.

nsims

The number of samples to draw.

ndists

The number of districts in each redistricting plan.

counties

A vector containing county (or other administrative or geographic unit) labels for each unit, which must be integers ranging from 1 to the number of counties. If provided, the algorithm will only generate maps which split at most ndists-1 counties. If no county-split constraint is desired, leave this parameter at its default, NULL.
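As a minimal sketch of preparing this argument, county labels stored as character names can be recoded to consecutive integers in base R. The county_names vector below is hypothetical.

```r
# Hypothetical county names, one per geographic unit
county_names <- c("Adams", "Baker", "Adams", "Clark", "Baker")

# Recode to integers 1..(number of counties), in order of first appearance
counties <- match(county_names, unique(county_names))

counties
```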

popcons

The desired population constraint. All sampled districts will have a deviation from the target district size no more than this value in percentage terms, i.e., popcons=0.01 will ensure districts have populations within 1% of the target population.

compactness

Controls the compactness of the generated districts, with higher values preferring more compact districts. Must be nonnegative. See the 'Details' section for more information and computational considerations.

resample

Whether to perform a final resampling step so that the generated plans can be used immediately. Set this to FALSE to perform direct importance sampling estimates, or to adjust the weights manually.
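When resample is FALSE, a direct importance sampling estimate is a weighted average of a per-plan statistic. The sketch below uses made-up values for the statistic and the weights; in practice the weights would come from the wgt component of the returned object.

```r
# Hypothetical per-plan statistic and importance weights (not real output)
stat <- c(0.40, 0.55, 0.50, 0.45)
wgt  <- c(1, 3, 2, 2)

wgt_norm <- wgt / sum(wgt)            # normalize the weights to sum to 1
is_estimate <- sum(wgt_norm * stat)   # self-normalized importance sampling estimate

is_estimate
```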

constraint_fn

A function which takes in a matrix where each column is a redistricting plan and outputs a vector of log-weights, which will be added to the final weights.
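A minimal sketch of a custom constraint function follows. It assumes that a larger log-weight makes a plan more likely to be kept; the function name keep_together_fn, the choice of units 1 and 2, and the penalty value are all illustrative.

```r
# Hypothetical constraint: discourage plans that place units 1 and 2
# in different districts. `penalty` is an illustrative strength.
keep_together_fn <- function(m, penalty = 2) {
  is_split <- m[1, ] != m[2, ]   # TRUE for each plan (column) separating the units
  -penalty * is_split            # log-weight: 0 if together, -penalty if split
}

# Three toy plans over four units; each column is one plan
plans <- cbind(c(1, 1, 2, 2), c(1, 2, 2, 1), c(2, 2, 1, 1))
log_wgt <- keep_together_fn(plans)

log_wgt
```

Like the default, the function returns one value per column of its input, so it could be passed as constraint_fn = keep_together_fn.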

adapt_k_thresh

The threshold value used in the heuristic to select a value k_i for each splitting iteration. Set to 0.9999 or 1 if the algorithm does not appear to be sampling from the target distribution. Must be between 0 and 1.

seq_alpha

The amount to adjust the weights by at each resampling step; higher values prefer exploitation, while lower values prefer exploration. Must be between 0 and 1.

truncate

Whether to truncate the importance sampling weights at the final step using trunc_fn. Recommended if compactness is not 1.

trunc_fn

A function which takes in a vector of weights and returns a truncated vector. Recommended to specify this manually if truncating weights.
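One possible user-specified truncation rule, sketched here as an illustration rather than a recommendation, caps the weights at an upper quantile. The name trunc_q99 and the 99th-percentile cutoff are assumptions.

```r
# Hypothetical truncation rule: cap weights at their 99th percentile
trunc_q99 <- function(x) pmin(x, quantile(x, 0.99, names = FALSE))

# Toy weights with one extreme value
w <- c(rep(1, 99), 1000)
w_trunc <- trunc_q99(w)

range(w_trunc)
```

The function takes a weight vector and returns a vector of the same length, matching the shape of the default shown in Usage.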

max_oversample

How much oversampling to allow at each stage; used to control memory and computation time. If the algorithm is not producing the desired number of samples, this should be increased.

verbose

Whether to print out intermediate information while sampling. Recommended.

silent

Whether to suppress all diagnostic information.

Value

redist.smc returns an object of class redist, which is a list containing the following components:

aList

The adjacency list used to sample.

cdvec

The matrix of sampled plans. Each row is a geographical unit, and each column is a sample.

wgt

The importance sampling weights, normalized to sum to 1.

nsims

The number of plans sampled.

pct_dist_parity

The population constraint.

compactness

The compactness constraint.

maxdev

The maximum population deviation of each sample.
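The sketch below shows how such a deviation could be recomputed from a plan matrix and a population vector. It assumes the target district size is the total population divided by the number of districts and that deviation is measured relative to that target; the inputs are toy values.

```r
# Hypothetical inputs: 6 units, 2 sampled plans with 3 districts each
popvec <- c(120, 80, 100, 95, 105, 100)
cdvec <- cbind(c(1, 1, 2, 2, 3, 3),
               c(1, 2, 2, 3, 3, 1))

target <- sum(popvec) / 3   # assumed target district population

maxdev <- apply(cdvec, 2, function(plan) {
  dist_pop <- tapply(popvec, plan, sum)   # population of each district
  max(abs(dist_pop - target) / target)    # worst relative deviation in the plan
})

maxdev
```

Under these assumptions, a plan satisfies a given popcons when its entry in maxdev is no larger than popcons.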

popvec

The provided vector of unit populations.

counties

The provided county vector.

adapt_k_thresh

The provided control parameter.

seq_alpha

The provided control parameter.

max_oversample

The provided control parameter.

algorithm

The algorithm used, here "smc".

Details

This function draws nearly independent samples from a specific target measure, controlled by the popcons, compactness, and constraint_fn parameters.

Higher values of compactness sample more compact districts; setting this parameter to 1 is computationally efficient and generates nicely compact districts. Values other than 1 may lead to highly variable importance sampling weights. By default these weights are truncated at nsims^0.4 / 100 to stabilize the resulting estimates, but if truncation is used, a specific truncation function should probably be chosen by the user.
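One common way to gauge how variable a set of importance sampling weights is, not described in this documentation and offered only as a sketch, is the Kish effective sample size. The helper name ess is hypothetical.

```r
# Kish effective sample size: equals length(w) for equal weights and
# approaches 1 when a single weight dominates
ess <- function(w) sum(w)^2 / sum(w^2)

ess_equal  <- ess(rep(1, 100))           # all plans weighted equally
ess_skewed <- ess(c(rep(1, 99), 1000))   # one plan dominates

c(equal = ess_equal, skewed = ess_skewed)
```

An effective sample size far below the number of sampled plans suggests the estimates rest on only a few plans, which is the situation truncation is meant to mitigate.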

Because of the randomness inherent in the algorithm and the way it samples, this function is not guaranteed to produce exactly nsims samples. Failure to do so is usually a result of a hard-to-meet population constraint, especially when there are many districts. Increasing max_oversample should generally alleviate this problem.

Examples

# NOT RUN {
data(algdat.pfull)
sampled_plans = redist.smc(algdat.pfull$adjlist, algdat.pfull$precinct.data$pop,
                           nsims=10000, ndists=3, popcons=0.1)
# }
