RxCEcolInf (version 0.1-5)

TuneWithExitPoll: Tuning Function for Ecological Inference for Sets of R x C Contingency Tables When Incorporating a Survey such as an Exit Poll

Description

This function tunes the Markov chain Monte Carlo (MCMC) algorithm used to fit a hierarchical model to data from two sources: (i) ecological data in which the underlying contingency tables can have any number of rows or columns, and (ii) data from a survey sample of some of the contingency tables. The user supplies the data and may specify hyperprior values. The function's primary output is a vector of multipliers, called rhos, used to adjust the covariance matrix of the multivariate \(t_4\) distribution used to propose new values of intermediate-level parameters (denoted THETAS).

Usage

TuneWithExitPoll(fstring, exitpoll, data = NULL, num.runs = 12, 
                 num.iters = 10000, rho.vec = rep(0.05, ntables), 
                 kappa = 10, nu = (mu.dim + 6), psi = mu.dim, 
                 mu.vec.0 = rep(log((0.45/(mu.dim - 1))/0.55), mu.dim), 
                 mu.vec.cu = runif(mu.dim, -3, 0), nolocalmode = 50,  
                 sr.probs = NULL, sr.reps = NULL, numscans = 1, 
                 Diri = 100, dof = 4, debug = 1)

Arguments

fstring

String: model formula of contingency tables' column totals versus row totals. Must be in specified format (an R character string and NOT a true R formula).

exitpoll

Matrix of dimensions \(I\) = number of contingency tables = number of rows in data by (R * C) = number of cells in each contingency table: The results of a survey sample of some of the contingency tables. Must be in specified format. See Details.

data

Data frame: one row per contingency table, containing that table's row and column totals. See Details.

num.runs

Positive integer: The number of runs (each of num.iters iterations) of the tuning algorithm.

num.iters

Positive integer: The number of iterations in each run of the tuning algorithm.

rho.vec

Vector of dimension \(I\) = number of contingency tables = number of rows in data: initial values of the multipliers (usually in (0,1)) applied to the covariance matrix of the proposal distribution for the draws of the intermediate-level parameters. The purpose of this Tune function is to adjust these values so as to achieve acceptance ratios between .2 and .5 in the MCMC draws of the THETAs.

kappa

Scalar: The diagonal of the covariance matrix for the (normal) hyperprior distribution for the \(\mu\) parameter.

nu

Scalar: The degrees of freedom for the (Inverse-Wishart) hyperprior distribution for the SIGMA parameter.

psi

Scalar: The diagonal of the matrix parameter of the (Inverse-Wishart) hyperprior distribution for the SIGMA parameter.

mu.vec.0

Vector: mean of the (normal) hyperprior distribution for the \(\mu\) parameter.

mu.vec.cu

Vector of dimension \(R*(C-1)\), where \(R\) (\(C\)) is the number of rows (columns) in each contingency table: Optional starting values for the \(\mu\) parameter.

nolocalmode

Positive integer: How often an alternative drawing method for the contingency table internal cell counts will be used. Use of default value recommended.

sr.probs

Matrix of dimension \(I\) x \(R\): Each value represents the probability of selecting a particular contingency table's row as the row to be calculated deterministically in (product multinomial) proposals for Metropolis draws of the internal cell counts. For example, if R = 3 and row 3 of sr.probs is c(.1, .5, .4), then for the third contingency table (corresponding to the third row of data), the proposal algorithm for the interior cell counts will calculate that table's first row deterministically with probability .1, the second row with probability .5, and the third row with probability .4. Use of default (generated internally) recommended.

sr.reps

Matrix of dimension \(I\) x \(R\): Each value represents the number of times the (product multinomial proposal) Metropolis algorithm will be attempted when, in drawing the internal cell counts, the proposal for the corresponding contingency table row is to be calculated deterministically. sr.reps has the same structure as sr.probs, i.e., position [3,1] of sr.reps corresponds to the third contingency table's first row. Use of default (generated internally) recommended.

numscans

Positive integer: How often the algorithm to draw the contingency table internal cell counts will be implemented before new values of the other parameters are drawn. Use of default value recommended.

Diri

Positive integer: How often a product Dirichlet proposal distribution will be used to draw the contingency table row probability vectors (the THETAS).

dof

Positive integer: The degrees of freedom of the multivariate \(t\) proposal distribution used in drawing the contingency table row probability vectors (the THETAS).

debug

Integer: Akin to verbose in some packages. If set to 1, certain status information (including rough notification regarding the number of iterations completed) will be written to the screen.
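
The default expressions in Usage refer to ntables and mu.dim, which are not arguments. The sketch below evaluates those defaults by hand for a small case. It assumes that ntables equals \(I\) and that mu.dim equals \(R*(C-1)\), which is inferred from the stated dimension of mu.vec.cu rather than confirmed by the package source. The names n_rows, n_cols, and n_tables are illustrative only.

```r
# Illustrative dimensions (assumptions, not package objects).
n_rows   <- 2    # R: rows per contingency table, e.g. bla, whi
n_cols   <- 3    # C: columns per contingency table, e.g. Dem, Rep, Abs
n_tables <- 100  # I: number of contingency tables (rows of data)

# Assumed relationship, inferred from the documented length of mu.vec.cu.
mu.dim <- n_rows * (n_cols - 1)

# Defaults copied from the Usage section, evaluated for these dimensions.
rho.vec  <- rep(0.05, n_tables)
nu       <- mu.dim + 6
psi      <- mu.dim
mu.vec.0 <- rep(log((0.45 / (mu.dim - 1)) / 0.55), mu.dim)

# Inspect the resulting sizes and values.
str(list(mu.dim = mu.dim, nu = nu, psi = psi,
         rho.vec = rho.vec, mu.vec.0 = mu.vec.0))
```

Working the defaults out this way can help when supplying a custom rho.vec or mu.vec.0, since vectors of the wrong length are an easy mistake to make.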

Value

A list with the following elements.

rhos

A vector of length I = number of contingency tables: each element of the rhos vector is a multiplier used in the proposal distribution for draws from the conditional posterior of the THETAs, as described above. Feed this vector into the Analyze function.

acc.t

Matrix of dimension I x num.runs: Each column of acc.t contains the acceptance fractions for the Metropolis-Hastings algorithm, with a multivariate \(t_4\) proposal distribution, used to draw from the conditional posterior of the THETAs. If Tune has worked properly, all elements of the final column of this matrix should be between .2 and .5.

acc.Diri

Matrix of dimension I x num.runs: Each column of acc.Diri contains the acceptance fractions for the Metropolis-Hastings algorithm, with independent Dirichlet proposals, used to draw from the conditional posterior of the THETAs. Tune does not alter this algorithm.

vld.NNs

A list of length num.runs: Each element of vld.NNs is a matrix of dimension I by R, with each element of the list corresponding to one of the num.runs runs (each of num.iters iterations) performed by Tune. To draw from the conditional posterior of the internal cell counts of a contingency table, the Tune function draws R-1 vectors of length C from multinomial distributions. It then calculates the counts in the remaining row (denote this row as r') deterministically. This procedure can result in negative values in row r', in which case the overall proposal for the interior cell counts is outside the parameter space (and thus invalid). Each matrix of vld.NNs records the fraction of proposals drawn in this manner that are valid. Each row of such a matrix corresponds to a contingency table. Each column of the matrix corresponds to a row of the contingency table. Each entry gives the fraction of multinomial proposals that are valid when the specified contingency table row serves as row r'. For instance, position [5,2] of a vld.NNs matrix holds the fraction of valid proposals for the 5th contingency table when that table's second row is row r'. A value of "NaN" means that Tune chose to use a different (slower) method of drawing the internal cell counts because it suspected that the multinomial method would behave badly.

acc.NNs

A list of length num.runs: Same as vld.NNs, except the entries represent the fraction of proposals accepted (instead of the fraction that are in the permissible parameter space).
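
As a rough illustration of the check described under acc.t, the sketch below flags tables whose acceptance fraction in the final run falls outside the .2 to .5 range. The object tune_out is a hand-made stand-in with made-up numbers, not real output from TuneWithExitPoll; only the element names rhos and acc.t are taken from the Value section.

```r
# Mock object standing in for tuning output; all values are made up.
# Here I = 3 contingency tables and num.runs = 3.
tune_out <- list(
  rhos  = c(0.04, 0.07, 0.05),
  acc.t = matrix(c(0.10, 0.28, 0.31,
                   0.62, 0.45, 0.38,
                   0.15, 0.18, 0.55),
                 nrow = 3, byrow = TRUE)
)

# Acceptance fractions from the last tuning run (final column of acc.t).
final_acc <- tune_out$acc.t[, ncol(tune_out$acc.t)]

# Indices of tables whose final acceptance fraction is outside [.2, .5].
badly_tuned <- which(final_acc < 0.2 | final_acc > 0.5)
badly_tuned
```

If any tables are flagged, one plausible response is to rerun the tuning function with more runs or iterations, possibly starting from the returned rhos as rho.vec.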

Details

TuneWithExitPoll is a necessary precursor function to AnalyzeWithExitPoll, the workhorse function in fitting the R x C ecological inference model described in Greiner & Quinn (2009) to ecological data augmented by data from surveys of some of the contingency tables. Details and terminology of the basic (i.e., without a survey sample) data structure and ecological inference model are discussed in the documentation accompanying the Analyze function. The purpose of TuneWithExitPoll, as preparatory to AnalyzeWithExitPoll, is the same as the purpose of Tune as preparatory to Analyze. See the documentation for Tune for a full explanation.

In the present implementation, AnalyzeWithExitPoll, and thus TuneWithExitPoll, presumes that the survey consisted of a simple random sample from the in-sample contingency tables. Future implementations will allow incorporation of more complicated survey sampling schemes.

The arguments to TuneWithExitPoll are essentially identical to those of Tune, with the major exception of exitpoll. exitpoll feeds the results of the survey sample to the function, and a particular format is required. Specifically, exitpoll must have the same number of rows as data, meaning one row for each contingency table in the dataset. It must have R * C columns, meaning one column for each cell in one of the ecological data's contingency tables. The first row of exitpoll must correspond to the first row of data, meaning that the two rows must contain information from the same contingency table. The second row of exitpoll must contain information from the contingency table represented in the second row of data, and so on. Finally, exitpoll must have counts from the sample of the contingency table in vectorized row-major format.

To illustrate with a voting example: Suppose the contingency tables have two rows, labeled bla and whi, and three columns, denoted Dem, Rep, and Abs. In other words, the fstring argument would be "Dem, Rep, Abs ~ bla, whi". Suppose there are 100 contingency tables. Then data will be of dimension \(100 \times 5\), with each row consisting of the row and column totals from that particular contingency table. exitpoll will be of dimension \(100 \times 6\). Row 11 of exitpoll will consist of the following: in position 1, the number of blacks voting Democrat observed in the sample of contingency table 11; in position 2, the number of blacks voting Republican observed in the sample of contingency table 11; in position 3, the number of blacks abstaining from voting observed in the sample of contingency table 11; in position 4, the number of whites voting Democrat observed in the sample of contingency table 11; etc.

For tables in which there was no sample taken (i.e., out-of-sample tables), the corresponding row of exitpoll should have a vector of 0s.
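
The layout described above can be sketched in base R. The helper make_exitpoll below is hypothetical (it is not part of RxCEcolInf), and the survey counts are made up for illustration. It fills a matrix of zeros, so out-of-sample tables keep a row of 0s, and writes each sampled table in row-major order.

```r
# Hypothetical helper (not part of RxCEcolInf): flatten per-table survey
# counts into the vectorized row-major layout expected for exitpoll.
# `samples` is a named list; each name is the table index as a string.
make_exitpoll <- function(samples, n_tables, n_rows, n_cols) {
  # Start from all zeros so out-of-sample tables keep a row of 0s.
  ep <- matrix(0, nrow = n_tables, ncol = n_rows * n_cols)
  for (i in names(samples)) {
    tab <- samples[[i]]
    stopifnot(all(dim(tab) == c(n_rows, n_cols)))
    # Transposing before as.vector() reads the table row by row.
    ep[as.integer(i), ] <- as.vector(t(tab))
  }
  ep
}

# Made-up survey counts for contingency table 11 only.
tab11 <- matrix(c(30,  2,  8,
                  12, 25, 13),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("bla", "whi"), c("Dem", "Rep", "Abs")))

exitpoll <- make_exitpoll(list("11" = tab11),
                          n_tables = 100, n_rows = 2, n_cols = 3)

dim(exitpoll)     # one row per table, R * C columns
exitpoll[11, ]    # bla-Dem, bla-Rep, bla-Abs, whi-Dem, whi-Rep, whi-Abs
```

Note that calling as.vector() on the table directly would give column-major order, which would silently misalign the cells; the transpose is what produces the row-major ordering.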

References

D. James Greiner & Kevin M. Quinn. 2009. "R x C Ecological Inference: Bounds, Correlations, Flexibility, and Transparency of Assumptions." J. R. Statist. Soc. A 172:67-81.

Examples

# NOT RUN {
SimData <- gendata.ep()    #  simulated data
FormulaString <- "Dem, Rep, Abs ~ bla, whi, his"
EPInvTune <-  TuneWithExitPoll(fstring = FormulaString,
                               data = SimData$GQdata,
                               exitpoll=SimData$EPInv$returnmat.ep,
                               num.iters = 10000,
                               num.runs = 15)
# }