This function tunes the Markov chain Monte Carlo algorithm used to fit a
hierarchical model to data from two sources: (i) ecological data in which the underlying
contingency tables can have any number of rows or columns, and (ii) data
from a survey sample of some of the contingency tables. The user
supplies the data and may specify hyperprior values. The function's
primary output is a vector of multipliers, called rhos, used to
adjust the covariance matrix of the multivariate \(t_4\) distribution used
to propose new values of intermediate-level parameters (denoted
THETAS).
TuneWithExitPoll(fstring, exitpoll, data = NULL, num.runs = 12,
num.iters = 10000, rho.vec = rep(0.05, ntables),
kappa = 10, nu = (mu.dim + 6), psi = mu.dim,
mu.vec.0 = rep(log((0.45/(mu.dim - 1))/0.55), mu.dim),
mu.vec.cu = runif(mu.dim, -3, 0), nolocalmode = 50,
sr.probs = NULL, sr.reps = NULL, numscans = 1,
Diri = 100, dof = 4, debug = 1)
String: model formula of contingency tables' column totals versus row totals. Must be in specified format (an R character string and NOT a true R formula).
Matrix of dimensions \(I\) (the number of contingency tables, equal to
the number of rows in data) by R * C (the number of cells
in each contingency table): The results of a survey sample of
some of the contingency tables. Must be in the specified format. See Details.
Data frame.
Positive integer: The number of runs, i.e., the number of times (each of
num.iters iterations) the tuning algorithm will be
implemented.
Positive integer: The number of iterations in each run of the tuning algorithm.
Vector of dimension \(I\) = number of contingency
tables = number of rows in data: initial values of multipliers (usually in
(0,1)) to the covariance matrix of the proposal distribution for
the draws of the intermediate-level parameters. The purpose of this
Tune function is to adjust these values so as to achieve
acceptance ratios of between .2 and .5 in the MCMC draws of the
THETAs.
Scalar: The diagonal of the covariance matrix for the (normal) hyperprior distribution for the \(\mu\) parameter.
Scalar: The degrees of freedom for the (Inverse-Wishart) hyperprior distribution for the SIGMA parameter.
Scalar: The diagonal of the matrix parameter of the (Inverse-Wishart) hyperprior distribution for the SIGMA parameter.
Vector: mean of the (normal) hyperprior distribution for the \(\mu\) parameter.
Vector of dimension \(R*(C-1)\), where \(R\) (\(C\)) is the number of rows (columns) in each contingency table: Optional starting values for the \(\mu\) parameter.
Positive integer: How often an alternative drawing method for the contingency table internal cell counts will be used. Use of default value recommended.
Matrix of dimension \(I\) x \(R\): Each value
represents the probability of selecting a particular
contingency table's row as the row to be calculated deterministically
in (product multinomial) proposals for Metropolis draws of the
internal cell counts. For example, if R = 3 and row 3 of
sr.probs = c(.1, .5, .4), then in the third contingency table
(corresponding to the third row of data), the proposal
algorithm for the interior cell counts will calculate the third
contingency table's first row deterministically with probability
.1, the second row with probability .5, and the third row with
probability .4. Use of default (generated
internally) recommended.
Matrix of dimension \(I\) x \(R\): Each value represents the number of times the (product multinomial proposal) Metropolis algorithm will be attempted when, in drawing the internal cell counts, the proposal for the corresponding contingency table row is to be calculated deterministically. sr.reps has the same structure as sr.probs, i.e., position [3,1] of sr.reps corresponds to the third contingency table's first row. Use of default (generated internally) recommended.
Positive integer: How often the algorithm to draw the contingency table internal cell counts will be implemented before new values of the other parameters are drawn. Use of default value recommended.
Positive integer: How often a product Dirichlet proposal distribution will be used to draw the contingency table row probability vectors (the THETAS).
Positive integer: The degrees of freedom of the multivariate \(t\) proposal distribution used in drawing the contingency table row probability vectors (the THETAS).
Integer: Akin to verbose in some packages. If set
to 1, certain status information (including rough notification
regarding the number of iterations completed) will be
written to the screen.
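The default hyperprior values in the usage signature all depend on mu.dim. The following is a minimal sketch of how those defaults evaluate, assuming mu.dim equals \(R*(C-1)\) as implied by the description of mu.vec.cu. The table dimensions R = 2 and C = 3 are hypothetical values chosen for illustration and are not objects supplied by the package.

```r
# Hypothetical table dimensions, chosen only for illustration.
R <- 2  # rows per contingency table
C <- 3  # columns per contingency table

# Dimension of the mu parameter (assumed from the mu.vec.cu description).
mu.dim <- R * (C - 1)

# Defaults exactly as written in the usage signature.
nu       <- mu.dim + 6
psi      <- mu.dim
mu.vec.0 <- rep(log((0.45 / (mu.dim - 1)) / 0.55), mu.dim)

print(mu.dim)
print(nu)
print(mu.vec.0)
```

Evaluating the defaults by hand in this way can help when deciding whether to override them for tables with many columns.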
A list with the following elements.
A vector of length I = number of contingency tables: each
element of the rhos vector is a multiplier used in the proposal
distribution for draws from the conditional posterior of the THETAs,
as described above. Feed this vector into the Analyze function.
Matrix of dimension I x num.runs: Each column of acc.t
contains the acceptance fractions for the Metropolis-Hastings
algorithm, with a multivariate \(t_4\) proposal distribution, used to draw from the conditional posterior of the
THETAs. If Tune has worked properly, all elements of the
final column of this matrix should be between .2 and .5.
Matrix of dimension I x num.runs: Each column of this matrix
contains the acceptance fractions for the Metropolis-Hastings
algorithm, with independent Dirichlet proposals, used to draw from the conditional posterior of the
THETAs. Tune does not alter this algorithm.
A list of length num.runs: Each element of vld.NNs
is a matrix of dimension I by R, with
each element of the list corresponding to one of the
num.runs runs (of num.iters iterations each) performed by Tune.
To draw from the conditional posterior of
the internal cell counts of a contingency table, the Tune function
draws R-1 vectors of length C from multinomial distributions. It
then calculates the counts in the remaining row (denote this row as
r') deterministically. This procedure can result in negative values
in row r', in which case the overall proposal for the interior cell
counts is outside the parameter space (and thus invalid).
Each matrix of vld.NNs keeps track of the fraction of proposals drawn in
this manner that are valid. Each row of
such a matrix corresponds to a
contingency table. Each column in the matrix
corresponds to a row in a contingency table. Each entry
specifies the fraction of multinomial proposals that are valid
when the specified contingency table row serves as the r' row. For
instance, position [5,2] of a vld.NNs matrix holds the fraction of valid
proposals for the 5th contingency table when the second contingency
table row is the r' row. A value of "NaN" means that Tune
chose to use a different (slower) method of drawing the internal
cell counts because it suspected that the multinomial method would
behave badly.
A list of length num.runs: Same as vld.NNs,
except the entries represent the fraction of proposals accepted (instead of the
fraction that are in the permissible parameter space).
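The deterministic-row proposal described under vld.NNs can be illustrated with a toy sketch. This is not the package's internal code: the table margins are made up, and the multinomial probabilities (proportional to the column totals) are an assumption made only so the example runs. It draws R-1 rows from multinomial distributions, fills row r' by subtraction, and records the fraction of proposals with no negative counts.

```r
set.seed(1)  # reproducibility of this toy example

# Hypothetical margins for one 3 x 3 contingency table.
row.tot <- c(40, 35, 25)
col.tot <- c(50, 30, 20)
R <- length(row.tot)
r.prime <- 3  # the row calculated deterministically

propose_once <- function() {
  tab <- matrix(0, nrow = R, ncol = length(col.tot))
  for (r in setdiff(seq_len(R), r.prime)) {
    # Draw a length-C vector of counts that sums to this row's total.
    # Probabilities proportional to col.tot are an illustrative assumption.
    tab[r, ] <- rmultinom(1, size = row.tot[r], prob = col.tot / sum(col.tot))[, 1]
  }
  # Fill row r' so the column totals are matched exactly; entries may be negative.
  tab[r.prime, ] <- col.tot - colSums(tab)
  tab
}

proposals <- replicate(1000, propose_once(), simplify = FALSE)

# A proposal is valid only if every cell count is non-negative.
valid <- vapply(proposals, function(tab) all(tab >= 0), logical(1))
frac.valid <- mean(valid)
print(frac.valid)
```

Changing r.prime shows how the valid fraction depends on which row is filled deterministically, which is the quantity each column of a vld.NNs matrix reports.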
TuneWithExitPoll is a necessary precursor function to AnalyzeWithExitPoll, the workhorse
function in fitting the R x C
ecological inference model described in Greiner & Quinn (2009) to
ecological data augmented by data from surveys of some of the
contingency tables. Details and terminology of the basic
(i.e., without a survey sample) data structure and ecological
inference model are discussed in the
documentation accompanying the Analyze function. The purpose
of TuneWithExitPoll, as preparatory to
AnalyzeWithExitPoll, is the same as the purpose of Tune
as preparatory to Analyze. See the documentation for Tune
for a full explanation.
In the present implementation, AnalyzeWithExitPoll, and
thus TuneWithExitPoll, presumes that the survey consisted of a
simple random sample from the in-sample contingency tables.
Future implementations will allow
incorporation of more complicated survey sampling schemes.
The arguments to TuneWithExitPoll are essentially identical
to those of Tune, with the major exception of
exitpoll. exitpoll feeds the results of the survey
sample to the function, and a particular format is required.
Specifically, exitpoll must have the same number of rows as
data, meaning one row for each contingency table in the
dataset. It must have R * C columns, meaning one column for each cell
in one of the ecological data's contingency tables. The first row of
exitpoll must correspond to the first row of data,
meaning that the two rows must contain information from the same
contingency table. The second row of exitpoll must contain
information from the contingency table represented in the second row
of data, and so on. Finally, exitpoll must have counts
from the sample of the contingency table in vectorized row-major
format.
To illustrate with a voting example: Suppose the contingency tables have two rows, labeled
bla and whi, and three columns, denoted Dem, Rep, and Abs.
In other words, the fstring argument would be "Dem, Rep, Abs ~
bla, whi". Suppose there are 100 contingency tables. Then data
will be of dimension \(100 \times 5\), with each row consisting of
the row and column totals from that particular contingency table, and
exitpoll will be of dimension \(100 \times 6\). Row 11 of
exitpoll will consist of the following: in position 1, the
number of blacks voting Democrat observed in the sample of contingency
table 11; in position 2, the number of blacks voting Republican
observed in the sample of contingency table 11; in position 3, the
number of blacks abstaining from voting observed in the sample of
contingency table 11; in position 4, the number of whites voting
Democrat observed in the sample of contingency table 11; etc.
For tables in which there was no sample taken (i.e.,
out-of-sample tables), the corresponding row of exitpoll should
be a vector of 0s.
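The row-major layout described above can be sketched in a few lines of R. The survey counts below are made up, and tab11 and ep.row are hypothetical names used only for this illustration. Because R stores matrices in column-major order, the table is transposed before it is flattened.

```r
# Hypothetical survey counts for contingency table 11 (made-up numbers).
tab11 <- matrix(c(12, 1, 3,
                   5, 9, 4),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("bla", "whi"), c("Dem", "Rep", "Abs")))

# Row-major vectorization: transpose first, since as.vector() reads column by column.
ep.row <- as.vector(t(tab11))

# Toy exitpoll matrix for 100 tables; out-of-sample tables remain rows of 0s.
ntables <- 100
exitpoll <- matrix(0, nrow = ntables, ncol = length(ep.row))
exitpoll[11, ] <- ep.row

print(exitpoll[11, ])
```

Position 4 of the resulting row holds the whi/Dem count, matching the ordering given in the voting example.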
D. James Greiner & Kevin M. Quinn. 2009. "R x C Ecological Inference: Bounds, Correlations, Flexibility, and Transparency of Assumptions." J.R. Statist. Soc. A 172:67-81.
# NOT RUN {
SimData <- gendata.ep() # simulated data
FormulaString <- "Dem, Rep, Abs ~ bla, whi, his"
EPInvTune <- TuneWithExitPoll(fstring = FormulaString,
                              data = SimData$GQdata,
                              exitpoll = SimData$EPInv$returnmat.ep,
                              num.iters = 10000,
                              num.runs = 15)
# }
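After a tuning run, one might check whether the final column of acc.t falls in the target range of .2 to .5. The sketch below uses a hand-built mock list in place of real TuneWithExitPoll output so that it runs without the package. The numbers are made up, and tune.out, final.acc, and out.of.range are hypothetical names.

```r
# Mock object standing in for tuning output; all values are made up.
tune.out <- list(
  rhos  = c(0.05, 0.08, 0.03),
  acc.t = matrix(c(0.10, 0.25, 0.31,
                   0.60, 0.45, 0.38,
                   0.15, 0.22, 0.55),
                 nrow = 3, byrow = TRUE)  # I = 3 tables, num.runs = 3
)

# Acceptance fractions from the last tuning run, one per contingency table.
final.acc <- tune.out$acc.t[, ncol(tune.out$acc.t)]

# Indices of tables whose acceptance fraction is outside [.2, .5].
out.of.range <- which(final.acc < 0.2 | final.acc > 0.5)
print(out.of.range)
```

With real output, a non-empty result would suggest running the tuning function again, possibly with more runs or iterations, before passing rhos to the analysis function.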