redist_smc: SMC Redistricting Sampler

Description

redist_smc uses a Sequential Monte Carlo algorithm to generate nearly independent congressional or legislative redistricting plans according to contiguity, population, compactness, and administrative boundary constraints.

Usage

redist_smc(
  map,
  nsims,
  counties = NULL,
  compactness = 1,
  constraints = list(),
  resample = TRUE,
  constraint_fn = function(m) rep(0, ncol(m)),
  adapt_k_thresh = 0.975,
  seq_alpha = 0.2 + 0.3 * compactness,
  truncate = (compactness != 1),
  trunc_fn = redist_quantile_trunc,
  pop_temper = 0,
  ref_name = NULL,
  verbose = TRUE,
  silent = FALSE
)
redist.smc(
  adj,
  total_pop,
  nsims,
  ndists,
  counties = NULL,
  pop_tol = 0.01,
  pop_bounds = NULL,
  compactness = 1,
  constraints = list(),
  resample = TRUE,
  constraint_fn = function(m) rep(0, ncol(m)),
  adapt_k_thresh = 0.975,
  seq_alpha = 0.2 + 0.2 * compactness,
  truncate = (compactness != 1),
  trunc_fn = function(x) pmin(x, 0.01 * nsims^0.4),
  pop_temper = 0,
  verbose = TRUE,
  silent = FALSE
)

Arguments

map

A redist_map object.

nsims

The number of samples to draw.

counties

A vector containing county (or other administrative or geographic unit) labels for each unit, which may be integers ranging from 1 to the number of counties, or a factor or character vector. If provided, the algorithm will only generate maps which split up to ndists-1 counties. If no county-split constraint is desired, this parameter should be left blank.
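To make the county-split bound concrete, the following base-R sketch counts how many counties a single plan splits. The toy data and the helper count_splits are hypothetical and are not part of the package.

```r
# Toy data (hypothetical): six units in three counties, one two-district plan.
counties <- c("A", "A", "B", "B", "C", "C")
plan     <- c(1, 1, 1, 2, 2, 2)

# A county counts as split when its units fall in more than one district.
count_splits <- function(plan, counties) {
  districts_per_county <- tapply(plan, counties, function(d) length(unique(d)))
  sum(districts_per_county > 1)
}

n_split <- count_splits(plan, counties)
ndists  <- length(unique(plan))

# The bound described above: at most ndists - 1 split counties.
within_bound <- n_split <= ndists - 1
```

Here only county "B" is divided between districts, so the plan stays within the ndists-1 limit.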

compactness

Controls the compactness of the generated districts, with higher values preferring more compact districts. Must be nonnegative. See the 'Details' section for more information, and computational considerations.

constraints

A list containing information on constraints to implement. See the 'Details' section for more information.

resample

Whether to perform a final resampling step so that the generated plans can be used immediately. Set this to FALSE to perform direct importance sampling estimates, or to adjust the weights manually.
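The difference between the two modes can be sketched in base R. The statistic and weights below are made-up numbers, and the resampling shown is generic multinomial resampling, which is an assumption about the general technique rather than a description of the package internals.

```r
# Hypothetical per-plan statistic and unnormalized importance weights.
stat <- c(0.40, 0.55, 0.60, 0.45)
wgt  <- c(1, 3, 1, 1)

# Direct importance sampling estimate: a weighted mean over all plans.
est_is <- sum(wgt * stat) / sum(wgt)

# Resampling alternative: draw plans with probability proportional to
# their weight, then treat the draws as an unweighted sample.
set.seed(1)
idx <- sample(seq_along(wgt), size = length(wgt), replace = TRUE, prob = wgt)
est_resampled <- mean(stat[idx])
```

With resample=FALSE the weights would have to be carried through every downstream summary, as in est_is. After resampling, ordinary unweighted summaries apply.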

constraint_fn

A function which takes in a matrix where each column is a redistricting plan and outputs a vector of log-weights, which will be added to the final weights.
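As a minimal sketch of a custom constraint function, the example below lowers the log-weight of any plan that places two chosen units in different districts. The helper keep_together, its arguments, and the strength value are hypothetical. The sign convention (a lower log-weight means a less favored plan) is inferred from the statement that the output is added to the final weights.

```r
# Hypothetical constraint: discourage plans that separate units a and b.
# m has one row per unit and one column per plan.
keep_together <- function(m, a = 1, b = 2, strength = 2) {
  -strength * as.numeric(m[a, ] != m[b, ])
}

# Three toy plans over four units, stored as columns.
plans <- cbind(c(1, 1, 2, 2),
               c(1, 2, 2, 1),
               c(2, 2, 1, 1))

log_w <- keep_together(plans)   # one log-weight per column
```

Because the sampler calls the function with the plan matrix only, any extra settings need defaults or a wrapper such as function(m) keep_together(m, a = 3, b = 6).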

adapt_k_thresh

The threshold value used in the heuristic to select a value k_i for each splitting iteration. Set to 0.9999 or 1 if the algorithm does not appear to be sampling from the target distribution. Must be between 0 and 1.

seq_alpha

The amount to adjust the weights by at each resampling step; higher values prefer exploitation, while lower values prefer exploration. Must be between 0 and 1.

truncate

Whether to truncate the importance sampling weights at the final step by trunc_fn. Recommended if compactness is not 1. Truncation is only applied if resample=TRUE.

trunc_fn

A function which takes in a vector of weights and returns a truncated vector. If the loo package is installed (strongly recommended), will default to Pareto-smoothed Importance Sampling (PSIS) rather than naive truncation.
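A user-supplied truncation function only needs to map a weight vector to a vector of the same length. The sketch below caps weights at an upper quantile. The name cap_at_quantile and the 0.95 cutoff are illustrative assumptions and do not reproduce what redist_quantile_trunc does.

```r
# Hypothetical truncation rule: cap every weight at the q-th quantile.
cap_at_quantile <- function(w, q = 0.95) {
  cap <- quantile(w, probs = q, names = FALSE)
  pmin(w, cap)
}

# Toy weights with one extreme value.
w       <- c(rep(1, 19), 50)
w_trunc <- cap_at_quantile(w)
```

A function of this shape could be supplied as trunc_fn together with truncate=TRUE. Capping trades a small bias for lower variance, so the cutoff deserves some thought.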

pop_temper

The strength of the automatic population tempering. Try values of 0.01-0.05 to start if the algorithm gets stuck on the final few splits.

ref_name

A name for the existing plan, which will be added as a reference plan, or FALSE to not include the initial plan in the output. Defaults to the column name of the existing plan.

verbose

Whether to print out intermediate information while sampling. Recommended.

silent

Whether to suppress all diagnostic information.

adj

An adjacency matrix, list, or object of class "SpatialPolygonsDataFrame".

total_pop

A vector containing the populations of each geographic unit.

ndists

The number of districts in each redistricting plan.

pop_tol

The desired population constraint. All sampled districts will have a deviation from the target district size no more than this value in percentage terms, i.e., pop_tol=0.01 will ensure districts have populations within 1% of the target population.

pop_bounds

A numeric vector with three elements c(lower, target, upper) providing more precise population bounds for the algorithm. Districts will have population between lower and upper, with a goal of target. If set, overrides pop_tol.
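The relationship between pop_tol and pop_bounds can be illustrated in base R. The sketch assumes the target is the total population divided by ndists and that pop_tol is applied symmetrically around it. The unit populations and the plan are toy values.

```r
# Toy unit populations and a two-district plan (hypothetical).
total_pop <- c(120, 200, 150, 250, 280)
plan      <- c(1, 1, 1, 2, 2)
ndists    <- 2
pop_tol   <- 0.1

# Symmetric bounds implied by pop_tol, in c(lower, target, upper) form.
target     <- sum(total_pop) / ndists
pop_bounds <- c(target * (1 - pop_tol), target, target * (1 + pop_tol))

# District populations and the plan's maximum deviation from the target.
dist_pop <- as.vector(tapply(total_pop, plan, sum))
max_dev  <- max(abs(dist_pop / target - 1))
in_range <- all(dist_pop >= pop_bounds[1] & dist_pop <= pop_bounds[3])
```

Asymmetric bounds, for example a tighter upper limit than lower limit, can be expressed by supplying a three-element vector directly instead of deriving it from pop_tol.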

Value

redist_smc returns an object of class redist_plans containing the simulated plans.

redist.smc (Deprecated) returns an object of class redist, which is a list containing the following components:

aList

The adjacency list used to sample

cdvec

The matrix of sampled plans. Each row is a geographical unit, and each column is a sample.

wgt

The importance sampling weights, normalized to sum to 1.

orig_wgt

The importance sampling weights before resampling or truncation, normalized to have mean 1.

nsims

The number of plans sampled.

pct_dist_parity

The population constraint.

compactness

The compactness constraint.

constraints

The computed constraint options list (see above).

maxdev

The maximum population deviation of each sample.

total_pop

The provided vector of unit populations.

counties

The provided county vector.

adapt_k_thresh

The provided control parameter.

seq_alpha

The provided control vector.

algorithm

The algorithm used, here "smc".

Details

This function draws nearly-independent samples from a specific target measure, controlled by the pop_tol, compactness, constraints, and constraint_fn parameters.

Key to ensuring good performance is monitoring the efficiency of the resampling process at each SMC stage. Unless silent=TRUE, this function will print out the effective sample size of each resampling step to allow the user to monitor the efficiency. If verbose=TRUE the function will also print out information on the \(k_i\) values automatically chosen and the acceptance rate (based on the population constraint) at each step.
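For readers unfamiliar with the diagnostic, the effective sample size of a set of importance weights is commonly computed as \((\sum_i w_i)^2 / \sum_i w_i^2\). The helper below implements that standard formula. Whether the package reports exactly this quantity at each step is an assumption.

```r
# Standard (Kish) effective sample size of a weight vector.
ess <- function(w) sum(w)^2 / sum(w^2)

ess_equal      <- ess(rep(1, 4))       # all plans weighted equally
ess_degenerate <- ess(c(1, 0, 0, 0))   # one plan carries all the weight
```

Equal weights give an effective sample size equal to the number of plans, while a single dominant weight drives it toward 1. A value that is small relative to nsims signals inefficient resampling.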

Higher values of compactness sample more compact districts; setting this parameter to 1 is computationally efficient and generates nicely compact districts. Values other than 1 may lead to highly variable importance sampling weights. By default these weights are truncated using redist_quantile_trunc to stabilize the resulting estimates, but if truncation is used, a specific truncation function should probably be chosen by the user.

The constraints parameter allows the user to apply several common redistricting constraints without implementing them by hand. This parameter is a list, which may contain any of the following named entries:

status_quo: a list with two entries:
- strength, a number controlling the tendency of the generated districts to respect the status quo, with higher values preferring more similar districts.
- current, a vector containing district assignments for the current map.
hinge: a list with three entries:
- strength, a number controlling the strength of the constraint, with higher values prioritizing districts with group populations at least tgts_min over other considerations.
- tgts_min, the target percentage(s) of minority voters in minority opportunity districts. Defaults to c(0.55).
- min_pop, a vector containing the minority population of each geographic unit.
incumbency: a list with two entries:
- strength, a number controlling the tendency of the generated districts to avoid pairing up incumbents.
- incumbents, a vector of precinct indices, one for each incumbent's home address.
vra: a list with five entries, which may be set up using redist.constraint.helper:
- strength, a number controlling the strength of the Voting Rights Act (VRA) constraint, with higher values prioritizing majority-minority districts over other considerations.
- tgt_vra_min, the target percentage of minority voters in minority opportunity districts. Defaults to 0.55.
- tgt_vra_other, the target percentage of minority voters in other districts. Defaults to 0.25, but should be set to reflect the total minority population in the state.
- pow_vra, which controls the allowed deviation from the target minority percentage; higher values are more tolerant. Defaults to 1.5.
- min_pop, a vector containing the minority population of each geographic unit.
multisplits: a list with one entry:
- strength, a number controlling the tendency of the generated districts to avoid splitting counties multiple times.
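The nested structure of the constraints list can be sketched as follows. The entry names are those listed above, while the strength values, the current plan, and the incumbent indices are hypothetical values for a six-unit toy map.

```r
# Hypothetical current plan for a six-unit map with three districts.
current_plan <- c(1, 1, 2, 2, 3, 3)

# One named entry per constraint, each itself a list of settings.
constraints <- list(
  status_quo  = list(strength = 10,  current = current_plan),
  incumbency  = list(strength = 100, incumbents = c(1, 4, 6)),
  multisplits = list(strength = 5)
)
```

A list built this way would be passed as the constraints argument. Entries that are left out impose no constraint.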

All constraints are fed into a Gibbs measure, with coefficients on each constraint set by the corresponding strength parameters. The strength can be any real number, with zero corresponding to no constraint. The status_quo constraint adds a term measuring the variation of information distance between the plan and the reference, rescaled to [0, 1]. The hinge constraint takes a list of target minority percentages. It matches each district to its nearest target percentage, and then applies a penalty of the form \(\sqrt{\max(0, \mathrm{tgt} - \mathrm{minpct})}\), summing across districts. This penalizes districts which are below their target percentage. The incumbency constraint adds a term counting the number of districts containing paired-up incumbents. The vra constraint (not recommended) adds a term of the form \((|\mathrm{tgt\_vra\_min} - \mathrm{minpct}| \, |\mathrm{tgt\_vra\_other} - \mathrm{minpct}|)^{\mathrm{pow\_vra}}\), which encourages districts to have minority percentages near either tgt_vra_min or tgt_vra_other. This can be visualized with redist.plot.penalty.
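The hinge and vra penalty terms can be written out in a few lines of base R. The helper names are hypothetical, the district minority percentages are toy values, and summing the vra term across districts is an assumption made by analogy with the hinge term.

```r
# Hinge term: match each district to its nearest target, then penalize
# only the shortfall below that target.
hinge_penalty <- function(min_pct, tgts_min) {
  nearest <- sapply(min_pct, function(p) tgts_min[which.min(abs(tgts_min - p))])
  sum(sqrt(pmax(0, nearest - min_pct)))
}

# vra term: small when a district is near either target percentage.
# Summation across districts is assumed here.
vra_penalty <- function(min_pct, tgt_vra_min = 0.55, tgt_vra_other = 0.25,
                        pow_vra = 1.5) {
  sum((abs(tgt_vra_min - min_pct) * abs(tgt_vra_other - min_pct))^pow_vra)
}

min_pct <- c(0.50, 0.60, 0.20)   # toy district minority shares
hinge   <- hinge_penalty(min_pct, tgts_min = c(0.55, 0.25))
vra_at_targets <- vra_penalty(c(0.55, 0.25))
```

In this toy case the 0.60 district is above its nearest target (0.55) and contributes nothing, while the 0.50 and 0.20 districts each fall 0.05 short of their nearest targets. Districts sitting exactly on a vra target incur no vra penalty.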

References

McCartan, C., & Imai, K. (2020). Sequential Monte Carlo for Sampling Balanced and Compact Redistricting Plans. Available at https://imai.fas.harvard.edu/research/files/SMCredist.pdf.

Examples


# NOT RUN {
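library(redist)

set.seed(1)   # make the sampled plans reproducible
data(fl25)    # 25-precinct Florida example data shipped with redist

# Set up a map with 3 districts and a 10% population tolerance
fl_map = redist_map(fl25, ndists=3, pop_tol=0.1)

# Sample 10,000 plans under the default settings
sampled_basic = redist_smc(fl_map, 10000)

# Sample again, discouraging plans that pair up the incumbents
# living in precincts 3, 6, and 25
sampled_constr = redist_smc(fl_map, 10000, constraints=list(
                                incumbency = list(strength=100, incumbents=c(3, 6, 25))
                            ))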
# }