This function simulates data from the ecological inference model outlined in Greiner &
Quinn (2009). At the user's option (by setting nprecincts.ep to an
integer greater than 0), the function generates three survey samples
from the simulated dataset. The specifics of the function's operation
are as follows.
First, the function simulates the total number of individual units
(voters) in each contingency table (precinct) from a Poisson
distribution with parameter lambda * runif(1, dispersion.low.lim,
dispersion.up.lim). Next, for each table, the function simulates the
vector of fractions of units (voters) in each table (precinct) row.
The fractions are simulated from a Dirichlet distribution with
parameter vector housing.seg * alpha. The row fractions are
multiplied by the total number of units (voters), and the resulting
vector is rounded to produce contingency table row counts for each
table.
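The first step can be sketched in base R as follows. This is an illustration of the description above, not the code gendata.ep itself runs: the parameter values are made up, the helper rdirichlet1 is hypothetical (base R has no Dirichlet sampler, so it normalizes gamma draws), and the sketch assumes one dispersion multiplier is drawn per table.

```r
set.seed(1)
nprecincts <- 5
lambda <- 1000
dispersion.low.lim <- 1
dispersion.up.lim <- 1.5
alpha <- c(0.45, 0.40, 0.15)   # illustrative values only
housing.seg <- 1

# Hypothetical helper: one Dirichlet draw via normalized gamma variates.
rdirichlet1 <- function(a) {
  g <- rgamma(length(a), shape = a, rate = 1)
  g / sum(g)
}

# Total units per table: Poisson with a randomly inflated mean
# (assumption: a separate multiplier for each table).
totals <- rpois(nprecincts,
                lambda * runif(nprecincts, dispersion.low.lim, dispersion.up.lim))

# Row fractions per table: one Dirichlet(housing.seg * alpha) draw each.
row.fracs <- t(vapply(seq_len(nprecincts),
                      function(i) rdirichlet1(housing.seg * alpha),
                      numeric(length(alpha))))

# Row counts: fractions times the table total, then rounded.
# totals has one entry per matrix row, so recycling scales each row correctly.
row.counts <- round(row.fracs * totals)
```

Because of the rounding, the row counts of a table need not sum exactly to the simulated total; they can differ from it by one unit.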
Next, a vector \(\mu\) is simulated from a multivariate normal
with mean mu0 and covariance matrix K0. A covariance matrix \(\Sigma\)
is simulated from an inverse-Wishart with nu0 degrees of freedom and
scale matrix Psi0.
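A minimal sketch of these two draws using only base R and the stats package. The values of mu0, K0, nu0, and Psi0 are illustrative, and the dimension d = nrowcat * (ncolcat - 1) is an assumption based on the stacked logistic parameterization described in the next step, not something this section states.

```r
set.seed(2)
d    <- 6            # assumed: nrowcat * (ncolcat - 1) for a 3 x 3 table
mu0  <- rep(0, d)    # illustrative prior mean
K0   <- diag(d)      # illustrative prior covariance
nu0  <- 12
Psi0 <- diag(d)      # illustrative scale matrix

# Multivariate normal draw: chol(K0) is upper triangular R with R'R = K0,
# so mu0 + R'z has covariance K0 when z is standard normal.
mu <- as.vector(mu0 + t(chol(K0)) %*% rnorm(d))

# Inverse-Wishart draw: if W ~ Wishart(nu0, solve(Psi0)),
# then solve(W) ~ inverse-Wishart(nu0, Psi0).
W     <- rWishart(1, df = nu0, Sigma = solve(Psi0))[, , 1]
Sigma <- solve(W)
```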
Next, nprecincts vectors are drawn from \(N(\mu, \Sigma)\). Each of
these draws undergoes an inverse-stacked multidimensional logistic
transformation to produce a set of nrowcat probability vectors (each
of which sums to one) for nrowcat multinomial distributions, one for
each row in that contingency table. Next, the nrowcat multinomial
draws, which represent the true (and in real life, unobserved)
internal cell counts, are generated from the relevant row counts and
these probability vectors. The column totals are calculated via
summation.
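The transformation and the multinomial draws for a single table might look like the sketch below. It assumes the draw is stacked row by row, with ncolcat - 1 log-odds per table row measured against the last column as the reference category; the function name inv.stacked.mlogit and the row counts are hypothetical.

```r
set.seed(3)
nrowcat <- 3
ncolcat <- 3
theta <- rnorm(nrowcat * (ncolcat - 1))   # stands in for one draw from N(mu, Sigma)
row.counts <- c(300, 450, 250)            # illustrative row counts for one table

# Hypothetical helper: map a stacked vector of log-odds to nrowcat
# probability vectors (assumes the last column is the reference category).
inv.stacked.mlogit <- function(theta, nrowcat, ncolcat) {
  th <- matrix(theta, nrow = nrowcat, byrow = TRUE)  # one row of log-odds per table row
  e  <- cbind(exp(th), 1)                            # reference column has log-odds 0
  e / rowSums(e)                                     # each row now sums to one
}

probs <- inv.stacked.mlogit(theta, nrowcat, ncolcat)

# One multinomial draw per table row gives the internal cell counts.
cells <- t(vapply(seq_len(nrowcat), function(r) {
  as.numeric(rmultinom(1, size = row.counts[r], prob = probs[r, ]))
}, numeric(ncolcat)))

# Column totals follow by summation.
col.totals <- colSums(cells)
```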
If nprecincts.ep is greater than 0, three simulated surveys (exit
polls) are drawn. All three select contingency tables (precincts)
using weights that are a function of the composition of the row
totals. Specifically, the row fractions are raised to a power q and
then summed (when q = 2 this calculation is known in antitrust law as
a Herfindahl index). For one of the three surveys (exit polls)
gendata.ep generates, denoted EPNoInv, these quasi-Herfindahl indices
are the weights. For the other two surveys, denoted EPInv and EPReas,
the sample weights are the reciprocals of these quasi-Herfindahl
indices. The former method tends to weight contingency tables
(precincts) in which one row dominates the table more heavily than
contingency tables (precincts) in which the row fractions are nearly
equal. In voting parlance, precincts in which one racial group
dominates are more likely to be sampled than racially mixed
precincts. The latter method, in which the sample weights are
reciprocated, weights contingency tables in which row fractions are
similar more heavily; in voting parlance, mixed-race precincts are
more likely to be sampled.
For example, suppose nrowcat = 3, HerInvexp = 3.5, HerfReas = 2, and
HerfNoInv = 3.5. Consider contingency table P1 with row counts
(300, 300, 300) and contingency table P2 with row counts
(950, 25, 25). Then:
Row fractions: The corresponding row fractions are
(300/900, 300/900, 300/900) = (.33, .33, .33) and
(950/1000, 25/1000, 25/1000) = (.95, .025, .025).
EPInv weights: EPInv would assign P1 and P2 weights as follows:
\(1/sum(.33^{3.5}, .33^{3.5}, .33^{3.5}) = 16.1\) and
\(1/sum(.95^{3.5}, .025^{3.5}, .025^{3.5}) = 1.2\).
EPReas weights: EPReas would assign weights as follows:
\(1/sum(.33^{2}, .33^{2}, .33^{2}) = 3.1\) and
\(1/sum(.95^{2}, .025^{2}, .025^{2}) = 1.1\).
EPNoInv weights: EPNoInv would assign weights as follows:
\(sum(.33^{3.5}, .33^{3.5}, .33^{3.5}) = .062\) and
\(sum(.95^{3.5}, .025^{3.5}, .025^{3.5}) = .84\).
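The weights in this example can be recomputed with a few lines of base R. The helper herf is a hypothetical name for the quasi-Herfindahl calculation described above; it is not a function exported by the package.

```r
# Hypothetical helper: sum of row fractions raised to the power q.
herf <- function(counts, q) sum((counts / sum(counts))^q)

P1 <- c(300, 300, 300)
P2 <- c(950, 25, 25)

# EPInv and EPReas use reciprocals of the index; EPNoInv uses the index itself.
w.EPInv   <- 1 / c(P1 = herf(P1, 3.5), P2 = herf(P2, 3.5))
w.EPReas  <- 1 / c(P1 = herf(P1, 2),   P2 = herf(P2, 2))
w.EPNoInv <-     c(P1 = herf(P1, 3.5), P2 = herf(P2, 3.5))

# Relative chance of selecting P1 rather than P2 under each scheme.
ratio <- c(EPInv   = unname(w.EPInv["P1"]   / w.EPInv["P2"]),
           EPReas  = unname(w.EPReas["P1"]  / w.EPReas["P2"]),
           EPNoInv = unname(w.EPNoInv["P1"] / w.EPNoInv["P2"]))
```

The P2 weights match the figures in the text. For P1 the code uses the exact fraction 1/3 rather than the rounded .33, so it yields about 15.6, 3.0, and .064 instead of 16.1, 3.1, and .062. Either way the ordering is the same: EPInv and EPReas favor the mixed table P1, while EPNoInv favors the dominated table P2.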
For each of the three simulated surveys (EPInv, EPReas, and EPNoInv),
gendata.ep returns a list of length three. The first element of the
list, returnmat.ep, is a matrix of dimension nprecincts by
(nrowcat * ncolcat) suitable for passing to TuneWithExitPoll and
AnalyzeWithExitPoll. That is, the first row of returnmat.ep
corresponds to the first row of GQdata, meaning that they both
contain information from the same contingency table. The second row
of returnmat.ep contains information from the contingency table
represented in the second row of GQdata. And so on. In addition,
returnmat.ep has counts from the sample of the contingency table in
vectorized row major format, as required for TuneWithExitPoll and
AnalyzeWithExitPoll.
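To illustrate what "vectorized row major format" means, the sketch below flattens one sampled 3 x 3 table into a single vector of length nrowcat * ncolcat. The counts and the row and column labels are invented for the illustration; the names that gendata.ep actually attaches to the columns of returnmat.ep are not specified here.

```r
# A hypothetical sampled table: rows are row categories, columns are column categories.
tab <- matrix(c(12, 3, 1,
                 2, 9, 4,
                 0, 5, 7),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("bla", "whi", "his"),
                              c("Dem", "Rep", "Abs")))

# R stores matrices column by column, so transpose before flattening
# to get row major order: all of row 1, then row 2, then row 3.
row.major <- as.vector(t(tab))

# Illustrative labels in the same order (naming scheme is an assumption).
names(row.major) <- as.vector(t(outer(rownames(tab), colnames(tab),
                                      paste, sep = "_")))
```

One such vector per contingency table, stacked in the same order as the rows of GQdata, gives a matrix with the dimensions described above.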
If nrowcat = ncolcat = 3, then the user may set his.agg.bias.vec to
be nonzero. This will introduce aggregation bias into the data by
making the probability vector of the second row of each contingency
table a function of the fractional composition of the third row. In
voting parlance, if the rows are black, white, and Hispanic, the
white voting behavior will be a function of the percent Hispanic in
each precinct. For example, if his.agg.bias.vec = c(1.7, -3), and if
the fraction Hispanic in each precinct i is \(X_{hi}\), then in the
ith precinct, \(\mu_i[3]\) is set to mu0[3] + \(X_{hi} \times 1.7\),
while \(\mu_i[4]\) is set to mu0[4] + \(X_{hi} \times (-3)\). This
feature allows testing of the ecological inference model with
aggregation bias.
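The precinct-specific means in this example can be written out as a short sketch. The values of mu0 and the Hispanic fractions X.h are made up; only elements 3 and 4, which the text associates with the second (white) row, are shifted.

```r
mu0 <- c(-0.5, 0.2, 1.0, -1.0, 0.3, 0.1)   # illustrative values only
his.agg.bias.vec <- c(1.7, -3)
X.h <- c(0.10, 0.50)   # fraction Hispanic in two hypothetical precincts

# One row of means per precinct: elements 3 and 4 move with X.h,
# all other elements stay at their mu0 values.
mu.i <- t(vapply(X.h, function(x) {
  m <- mu0
  m[3:4] <- m[3:4] + x * his.agg.bias.vec
  m
}, numeric(length(mu0))))
```

With these numbers, the precinct that is 10 percent Hispanic has means 1.17 and -1.3 in positions 3 and 4, and the precinct that is 50 percent Hispanic has 1.85 and -2.5, so white voting behavior varies systematically with the Hispanic share.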