redist_smc uses a Sequential Monte Carlo algorithm to generate nearly independent congressional or legislative redistricting plans according to contiguity, population, compactness, and administrative boundary constraints.
redist_smc(
map,
nsims,
counties = NULL,
compactness = 1,
constraints = list(),
resample = TRUE,
constraint_fn = function(m) rep(0, ncol(m)),
adapt_k_thresh = 0.975,
seq_alpha = 0.2 + 0.3 * compactness,
truncate = (compactness != 1),
trunc_fn = redist_quantile_trunc,
pop_temper = 0,
ref_name = NULL,
verbose = TRUE,
silent = FALSE
)
redist.smc(
adj,
total_pop,
nsims,
ndists,
counties = NULL,
pop_tol = 0.01,
pop_bounds = NULL,
compactness = 1,
constraints = list(),
resample = TRUE,
constraint_fn = function(m) rep(0, ncol(m)),
adapt_k_thresh = 0.975,
seq_alpha = 0.2 + 0.2 * compactness,
truncate = (compactness != 1),
trunc_fn = function(x) pmin(x, 0.01 * nsims^0.4),
pop_temper = 0,
verbose = TRUE,
silent = FALSE
)
A redist_map object.
The number of samples to draw.
A vector containing county (or other administrative or
geographic unit) labels for each unit, which may be integers ranging from 1
to the number of counties, or a factor or character vector. If provided,
the algorithm will only generate maps which split up to ndists-1
counties. If no county-split constraint is desired, this parameter should
be left blank.
Controls the compactness of the generated districts, with higher values preferring more compact districts. Must be nonnegative. See the 'Details' section for more information, including computational considerations.
A list containing information on constraints to implement. See the 'Details' section for more information.
Whether to perform a final resampling step so that the generated plans can be used immediately. Set this to FALSE to perform direct importance sampling estimates, or to adjust the weights manually.
A function which takes in a matrix where each column is a redistricting plan and outputs a vector of log-weights, which will be added to the final weights.
The threshold value used in the heuristic to select a value k_i for each splitting iteration. Set to 0.9999 or 1 if the algorithm does not appear to be sampling from the target distribution. Must be between 0 and 1.
The amount to adjust the weights by at each resampling step; higher values prefer exploitation, while lower values prefer exploration. Must be between 0 and 1.
Whether to truncate the importance sampling weights at the final step by trunc_fn. Recommended if compactness is not 1. Truncation is only applied if resample=TRUE.
A function which takes in a vector of weights and returns a truncated vector. If the loo package is installed (strongly recommended), will default to Pareto-smoothed Importance Sampling (PSIS) rather than naive truncation.
The strength of the automatic population tempering. Try values of 0.01-0.05 to start if the algorithm gets stuck on the final few splits.
A name for the existing plan, which will be added as a reference plan, or FALSE to not include the initial plan in the output. Defaults to the column name of the existing plan.
Whether to print out intermediate information while sampling. Recommended.
Whether to suppress all diagnostic information.
An adjacency matrix, list, or object of class "SpatialPolygonsDataFrame".
A vector containing the populations of each geographic unit.
The number of districts in each redistricting plan.
The desired population constraint. All sampled districts will have a deviation from the target district size no more than this value in percentage terms, i.e., pop_tol=0.01 will ensure districts have populations within 1% of the target population.
A numeric vector with three elements c(lower, target, upper) providing more precise population bounds for the algorithm. Districts will have population between lower and upper, with a goal of target. If set, overrides pop_tol.
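As a rough illustration of how pop_tol relates to pop_bounds, the base R sketch below derives the three-element vector from a tolerance. It assumes the target is the total population divided by ndists and that the tolerance is applied symmetrically around it; the unit populations are made up for the example.

```r
# Hypothetical unit populations (not from any real data set)
total_pop <- c(120, 95, 110, 80, 105, 90)
ndists <- 3
pop_tol <- 0.01

# Assumed target: an equal share of the total population per district
target <- sum(total_pop) / ndists

# Symmetric bounds in the c(lower, target, upper) form expected by pop_bounds
pop_bounds <- target * c(1 - pop_tol, 1, 1 + pop_tol)
names(pop_bounds) <- c("lower", "target", "upper")
print(pop_bounds)
```

Supplying an asymmetric vector instead (for example, a tighter upper bound than lower bound) is the case where pop_bounds offers more control than pop_tol.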
redist_smc returns an object of class redist_plans containing the simulated plans.
redist.smc (Deprecated) returns an object of class redist, which is a list containing the following components:
The adjacency list used to sample.
The matrix of sampled plans. Each row is a geographical unit, and each column is a sample.
The importance sampling weights, normalized to sum to 1.
The importance sampling weights before resampling or truncation, normalized to have mean 1.
The number of plans sampled.
The population constraint.
The compactness constraint.
The computed constraint options list (see above).
The maximum population deviation of each sample.
The provided vector of unit populations.
The provided county vector.
The provided control parameter.
The provided control vector.
The algorithm used, here "smc".
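When the weights are used directly (for instance with resample=FALSE), summaries are weighted averages rather than simple means. The base R sketch below uses made-up weights and a made-up per-plan statistic; it does not call the package, and all variable names are illustrative only.

```r
# Hypothetical unnormalized importance weights for five plans
raw_wgt <- c(0.5, 2.0, 1.0, 0.25, 1.25)

# Normalize the weights to sum to 1
wgt <- raw_wgt / sum(raw_wgt)

# Hypothetical per-plan summary statistic (e.g., a seat count)
seats <- c(1, 2, 2, 1, 3)

# Self-normalized importance sampling estimate of the mean
is_estimate <- sum(wgt * seats)
print(is_estimate)
```

The same value is returned by weighted.mean(seats, raw_wgt), which normalizes the weights internally.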
This function draws nearly-independent samples from a specific target measure, controlled by the pop_tol, compactness, constraints, and constraint_fn parameters.
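To make the constraint_fn interface concrete, here is a minimal base R sketch of a custom function with the documented shape: it takes a matrix with one column per plan and returns one log-weight per column. The function name keep_together, the choice of units, and the penalty of -1 are all hypothetical, and the sketch assumes that a larger log-weight makes a plan more favored.

```r
# Hypothetical constraint function: favor plans that keep units 1 and 2
# in the same district. `m` has one row per unit and one column per plan.
keep_together <- function(m) {
  # Log-weight 0 when the two units share a district, -1 otherwise
  ifelse(m[1, ] == m[2, ], 0, -1)
}

# Three toy plans on four units (columns are plans)
plans <- cbind(c(1, 1, 2, 2), c(1, 2, 1, 2), c(2, 2, 1, 1))
log_wgt <- keep_together(plans)
print(log_wgt)
```

The result has length ncol(plans), matching the default function(m) rep(0, ncol(m)) shown in the usage above.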
Key to ensuring good performance is monitoring the efficiency of the resampling process at each SMC stage. Unless silent=TRUE, this function will print out the effective sample size of each resampling step to allow the user to monitor the efficiency. If verbose=TRUE, the function will also print out information on the \(k_i\) values automatically chosen and the acceptance rate (based on the population constraint) at each step.
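For readers unfamiliar with the effective sample size, the sketch below computes the standard Kish formula from a vector of importance weights in base R. Whether the package prints exactly this quantity or a rescaled version of it is an assumption; the weights are invented for illustration.

```r
# Effective sample size of a set of importance weights (Kish's formula)
ess <- function(w) sum(w)^2 / sum(w^2)

# Equal weights: the ESS equals the number of samples
equal_wgt <- rep(1, 1000)
print(ess(equal_wgt))

# One dominant weight: the ESS collapses to only a few samples
skewed_wgt <- c(rep(1, 999), 1000)
print(ess(skewed_wgt))
```

A sharp drop in effective sample size at some stage is the kind of signal the printed diagnostics are meant to surface.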
Higher values of compactness sample more compact districts; setting this parameter to 1 is computationally efficient and generates nicely compact districts. Values other than 1 may lead to highly variable importance sampling weights. By default these weights are truncated using redist_quantile_trunc to stabilize the resulting estimates, but if truncation is used, a specific truncation function should probably be chosen by the user.
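A user-supplied truncation function only needs to map a weight vector to a truncated weight vector. The sketch below reuses the naive cap that appears as the trunc_fn default in the redist.smc signature above and applies it to invented weights; normalizing the weights to sum to 1 before truncation is an assumption of this example.

```r
nsims <- 1000

# Naive cap, copied from the trunc_fn default in the redist.smc() usage
trunc_cap <- function(x) pmin(x, 0.01 * nsims^0.4)

# Hypothetical weights with one dominant plan, normalized to sum to 1
wgt <- c(rep(1, nsims - 1), 500)
wgt <- wgt / sum(wgt)

# Only the dominant weight exceeds the cap and is pulled down to it
trunc_wgt <- trunc_cap(wgt)
print(c(before = max(wgt), after = max(trunc_wgt)))
```

Truncation trades a small bias for a large reduction in variance, which is why it matters most when compactness differs from 1.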
The constraints parameter allows the user to apply several common redistricting constraints without implementing them by hand. This parameter is a list, which may contain any of the following named entries:
status_quo: a list with two entries:
strength, a number controlling the tendency of the generated districts to respect the status quo, with higher values preferring more similar districts.
current, a vector containing district assignments for the current map.
hinge: a list with three entries:
strength, a number controlling the strength of the constraint, with higher values prioritizing districts with group populations at least tgts_min over other considerations.
tgts_min, the target percentage(s) of minority voters in minority opportunity districts. Defaults to c(0.55).
min_pop, a vector containing the minority population of each geographic unit.
incumbency: a list with two entries:
strength, a number controlling the tendency of the generated districts to avoid pairing up incumbents.
incumbents, a vector of precinct indices, one for each incumbent's home address.
vra: a list with five entries, which may be set up using redist.constraint.helper:
strength, a number controlling the strength of the Voting Rights Act (VRA) constraint, with higher values prioritizing majority-minority districts over other considerations.
tgt_vra_min, the target percentage of minority voters in minority opportunity districts. Defaults to 0.55.
tgt_vra_other, the target percentage of minority voters in other districts. Defaults to 0.25, but should be set to reflect the total minority population in the state.
pow_vra, which controls the allowed deviation from the target minority percentage; higher values are more tolerant. Defaults to 1.5.
min_pop, a vector containing the minority population of each geographic unit.
multisplits: a list with one entry:
strength, a number controlling the tendency of the generated districts to avoid splitting counties multiple times.
All constraints are fed into a Gibbs measure, with coefficients on each constraint set by the corresponding strength parameters. The strength can be any real number, with zero corresponding to no constraint.
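One way to read the Gibbs measure: each constraint contributes its strength times its penalty to an energy, and a plan's unnormalized weight is the exponential of the negative energy. The base R sketch below illustrates this reading with invented penalty values; the sign convention and the variable names are assumptions, not taken from the package internals.

```r
# Hypothetical penalty values for one plan under three constraints
penalty <- c(status_quo = 0.30, incumbency = 1, multisplits = 2)

# Hypothetical strengths; a strength of zero switches a constraint off
strength <- c(status_quo = 5, incumbency = 100, multisplits = 0)

# Energy of the plan and its unnormalized Gibbs weight
energy <- sum(strength * penalty)
gibbs_wgt <- exp(-energy)
print(energy)
```

Here the multisplits penalty has no effect because its strength is zero, while the large incumbency strength dominates the energy.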
The status_quo constraint adds a term measuring the variation of information distance between the plan and the reference, rescaled to [0, 1].
The hinge constraint takes a list of target minority percentages. It matches each district to its nearest target percentage, and then applies a penalty of the form \(\sqrt{\max(0, tgt - minpct)}\), summing across districts. This penalizes districts which are below their target percentage.
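The hinge penalty can be sketched directly from the formula in base R. The function name hinge_penalty and the district percentages are hypothetical; the sketch only mirrors the description above (nearest-target matching, then a square-root hinge summed over districts) and is not the package's implementation.

```r
hinge_penalty <- function(minpct, tgts_min) {
  # Match each district to its nearest target percentage
  nearest <- sapply(minpct, function(p) tgts_min[which.min(abs(tgts_min - p))])
  # sqrt(max(0, tgt - minpct)), summed over districts
  sum(sqrt(pmax(0, nearest - minpct)))
}

# Hypothetical minority shares in three districts, single target of 0.55
pen <- hinge_penalty(c(0.30, 0.55, 0.60), tgts_min = 0.55)
print(pen)
```

Only the first district falls short of the target, so it alone contributes to the penalty; districts at or above the target contribute zero.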
The incumbency constraint adds a term counting the number of districts containing paired-up incumbents.
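As an illustration of what such a count looks like, the base R sketch below tallies districts that contain more than one incumbent. The helper count_paired and the toy plan are hypothetical; incumbents are given as precinct indices, as in the incumbents entry described above.

```r
count_paired <- function(plan, incumbents) {
  # District of each incumbent's home precinct
  inc_dist <- plan[incumbents]
  # Number of districts holding more than one incumbent
  sum(table(inc_dist) > 1)
}

# Toy plan on six precincts; incumbents live in precincts 1, 2, and 5
plan <- c(1, 1, 2, 2, 3, 3)
n_paired <- count_paired(plan, incumbents = c(1, 2, 5))
print(n_paired)
```

In this toy plan, precincts 1 and 2 both fall in district 1, so exactly one district contains paired incumbents.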
The vra constraint (not recommended) adds a term of the form \((|tgtvramin - minpct| \, |tgtvraother - minpct|)^{powvra}\), which encourages districts to have minority percentages near either tgt_vra_min or tgt_vra_other. This can be visualized with redist.plot.penalty.
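The shape of the vra term can also be explored in base R without the package. The function vra_penalty below is a hypothetical helper that evaluates the formula above for a single district, using the documented defaults for tgt_vra_min, tgt_vra_other, and pow_vra.

```r
vra_penalty <- function(minpct, tgt_vra_min = 0.55, tgt_vra_other = 0.25,
                        pow_vra = 1.5) {
  # Product of the distances to the two targets, raised to pow_vra
  (abs(tgt_vra_min - minpct) * abs(tgt_vra_other - minpct))^pow_vra
}

# The penalty vanishes at either target and peaks between them
pen <- vra_penalty(c(0.25, 0.40, 0.55))
print(pen)
```

Because the term is zero at both targets, it pulls each district toward whichever target is closer rather than toward a single value.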
McCartan, C., & Imai, K. (2020). Sequential Monte Carlo for Sampling Balanced and Compact Redistricting Plans. Available at https://imai.fas.harvard.edu/research/files/SMCredist.pdf.
# NOT RUN {
set.seed(1)
data(fl25)
fl_map = redist_map(fl25, ndists=3, pop_tol=0.1)
sampled_basic = redist_smc(fl_map, 10000)
sampled_constr = redist_smc(fl_map, 10000, constraints=list(
incumbency = list(strength=100, incumbents=c(3, 6, 25))
))
# }