Tune: Tuning Function for Ecological Inference for Sets of R x C Contingency Tables

Description

This function tunes the Markov chain Monte Carlo algorithm used to fit a hierarchical model to ecological data in which the underlying contingency tables can have any number of rows or columns. The user supplies the data and may specify hyperprior values. The function's primary output is a vector of multipliers, called rhos, used to adjust the covariance matrix of the multivariate \(t_4\) distribution used to propose new values of intermediate-level parameters (denoted THETAS).

Usage

Tune(fstring, data=NULL, num.runs=12, num.iters=10000,
     rho.vec=rep(0.05, ntables),
     kappa=10, nu=(mu.dim+6), psi=mu.dim,
     mu.vec.0=rep(log((.45/(mu.dim-1))/.55), mu.dim),
     mu.vec.cu=runif(mu.dim, -3, 0),
     nolocalmode=50, sr.probs=NULL, sr.reps=NULL, 
     numscans=1, Diri=100, dof=4, debug=1)

Arguments

fstring

String: model formula of contingency tables' column totals versus row totals. Must be in specified format (an R character string and NOT a true R formula). See Details and Examples.

data

Data frame containing the variables named in fstring, with one row per contingency table.

num.runs

Positive integer: The number of runs (each of num.iters iterations) of the tuning algorithm.

num.iters

Positive integer: The number of iterations in each run of the tuning algorithm.

rho.vec

Vector of dimension \(I\) = number of contingency tables = number of rows in data: initial values of the multipliers (usually in (0,1)) applied to the covariance matrix of the proposal distribution for the draws of the intermediate-level parameters. The purpose of the Tune function is to adjust these values so as to achieve acceptance ratios between .2 and .5 in the MCMC draws of the THETAs.

kappa

Scalar: The diagonal of the covariance matrix for the (normal) hyperprior distribution for the \(\mu\) parameter.

nu

Scalar: The degrees of freedom for the (Inverse-Wishart) hyperprior distribution for the SIGMA parameter.

psi

Scalar: The diagonal of the matrix parameter of the (Inverse-Wishart) hyperprior distribution for the SIGMA parameter.

mu.vec.0

Vector: mean of the (normal) hyperprior distribution for the \(\mu\) parameter.

mu.vec.cu

Vector of dimension \(R*(C-1)\), where \(R\) (\(C\)) is the number of rows (columns) in each contingency table: Optional starting values for the \(\mu\) parameter.

nolocalmode

Positive integer: How often an alternative drawing method for the contingency table internal cell counts will be used. Use of default value recommended.

sr.probs

Matrix of dimension \(I\) x \(R\): Each value represents the probability of selecting a particular contingency table's row as the row to be calculated deterministically in (product multinomial) proposals for Metropolis draws of the internal cell counts. For example, if R = 3 and row 3 of sr.probs is c(.1, .5, .4), then in the third contingency table (corresponding to the third row of data), the proposal algorithm for the interior cell counts will calculate that table's first row deterministically with probability .1, the second row with probability .5, and the third row with probability .4. Use of default (generated internally) recommended.

sr.reps

Matrix of dimension \(I\) x \(R\): Each value represents the number of times the (product multinomial proposal) Metropolis algorithm will be attempted when, in drawing the internal cell counts, the proposal for the corresponding contingency table row is to be calculated deterministically. sr.reps has the same structure as sr.probs, i.e., position [3,1] of sr.reps corresponds to the third contingency table's first row. Use of default (generated internally) recommended.

numscans

Positive integer: How often the algorithm to draw the contingency table internal cell counts will be implemented before new values of the other parameters are drawn. Use of default value recommended.

Diri

Positive integer: How often a product Dirichlet proposal distribution will be used to draw the contingency table row probability vectors (the THETAS).

dof

Positive integer: The degrees of freedom of the multivariate \(t\) proposal distribution used in drawing the contingency table row probability vectors (the THETAS).

debug

Integer: Akin to verbose in some packages. If set to 1, certain status information (including rough notification regarding the number of iterations completed) will be written to the screen.
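
Several defaults in Usage are written in terms of mu.dim and ntables, which Tune derives from the data. As a rough illustration, the sketch below evaluates them for a 2 x 5 table like the one in Examples. It assumes that mu.dim equals R*(C-1) (as the description of mu.vec.cu suggests) and that ntables equals the number of rows in data; the value of ntables used here is made up.

```r
# Sketch only: mirrors the default expressions in Usage under the
# assumption mu.dim = R * (C - 1); not code taken from the package.
R.rows <- 2                       # e.g. bvap, ovap
C.cols <- 5                       # e.g. four candidates plus NoVote
mu.dim <- R.rows * (C.cols - 1)   # 8

nu  <- mu.dim + 6                 # Inverse-Wishart degrees of freedom
psi <- mu.dim                     # Inverse-Wishart diagonal
mu.vec.0 <- rep(log((.45 / (mu.dim - 1)) / .55), mu.dim)

ntables <- 10                     # hypothetical number of rows in data
rho.vec <- rep(0.05, ntables)     # default starting multipliers
```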

Value

A list with the following elements.

rhos

A vector of length I = number of contingency tables: each element of the rhos vector is a multiplier used in the proposal distribution for draws from the conditional posterior of the THETAs, as described above. Feed this vector into the Analyze function.

acc.t

Matrix of dimension I x num.runs: Each column of acc.t contains the acceptance fractions for the Metropolis-Hastings algorithm, with a multivariate \(t_4\) proposal distribution, used to draw from the conditional posterior of the THETAs. If Tune has worked properly, all elements of the final column of this matrix should be between .2 and .5.

acc.Diri

Matrix of dimension I x num.runs: Each column of acc.Diri contains the acceptance fractions for the Metropolis-Hastings algorithm, with independent Dirichlet proposals, used to draw from the conditional posterior of the THETAs. Tune does not alter this algorithm.

vld.NNs

A list of length num.runs: Each element of vld.NNs is a matrix of dimension I x R and corresponds to one of the num.runs sets of num.iters iterations run by Tune. To draw from the conditional posterior of the internal cell counts of a contingency table, the Tune function draws R-1 vectors of length C from multinomial distributions. It then calculates the counts in the remaining row (denote this row as r') deterministically. This procedure can result in negative values in row r', in which case the overall proposal for the interior cell counts is outside the parameter space (and thus invalid). Each matrix of vld.NNs keeps track of the fraction of proposals drawn in this manner that are valid. Each row of such a matrix corresponds to a contingency table. Each column corresponds to a row of that contingency table. Each entry gives the fraction of multinomial proposals that are valid when the specified contingency table row serves as row r'. For instance, position [5,2] of a vld.NNs matrix is the fraction of valid proposals for the fifth contingency table when its second row serves as row r'. A value of NaN means that Tune chose to use a different (slower) method of drawing the internal cell counts because it suspected that the multinomial method would behave badly.

acc.NNs

A list of length num.runs: Same as vld.NNs, except the entries represent the fraction of proposals accepted (instead of the fraction that are in the permissible parameter space).
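
The proposal scheme summarized by vld.NNs can be illustrated with a toy version for a single contingency table. This is only a sketch of the idea as described above, not the package's implementation: propose_counts() and its arguments are hypothetical names, and the row totals, column totals, and probabilities are invented. Row r' is chosen with the probabilities in one row of sr.probs, the other R-1 rows are drawn from multinomials, and row r' is filled in from the column totals.

```r
# Sketch only: a toy version of the product multinomial proposal described
# for vld.NNs. propose_counts() and its arguments are hypothetical names.
propose_counts <- function(row.tot, col.tot, theta, sr.prob) {
  n.rows <- length(row.tot)
  r.prime <- sample.int(n.rows, 1, prob = sr.prob)  # row computed deterministically
  counts <- matrix(0, n.rows, length(col.tot))
  for (r in setdiff(seq_len(n.rows), r.prime)) {
    # Each of the other R-1 rows is a multinomial draw of length C.
    counts[r, ] <- rmultinom(1, size = row.tot[r], prob = theta[r, ])
  }
  # Row r' is whatever is needed to match the column totals.
  counts[r.prime, ] <- col.tot - colSums(counts)
  list(counts = counts, r.prime = r.prime,
       valid = all(counts[r.prime, ] >= 0))
}

set.seed(1)
row.tot <- c(60, 30, 10)
col.tot <- c(50, 50)
theta   <- matrix(.5, nrow = 3, ncol = 2)   # invented row probabilities
sr.prob <- c(.1, .5, .4)                    # one row of an sr.probs matrix

draws <- replicate(2000, propose_counts(row.tot, col.tot, theta, sr.prob),
                   simplify = FALSE)
r.used <- sapply(draws, function(d) d$r.prime)
valid  <- sapply(draws, function(d) d$valid)

# Fraction of valid proposals for each choice of r', analogous to one row
# of a vld.NNs matrix.
tapply(valid, r.used, mean)
```

In this toy table the smallest row has the least room to absorb the other rows' draws, so choosing it as r' should produce invalid proposals more often, which is the kind of pattern vld.NNs is meant to reveal.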

Details

Tune is a necessary precursor function to Analyze, the workhorse function in fitting the R x C ecological inference model described in Greiner & Quinn (2009). The details of this model are discussed in the documentation accompanying Analyze.

One of the stages of the Gibbs sampler used to fit the Greiner & Quinn ecological inference model involves sampling from the conditional posterior distribution of the vector of probabilities associated with each contingency table (precinct, in voting applications). There are \(R\) separate sets of probabilities (each of which must sum to one) associated with each contingency table. Each such set, \(\theta_r\), undergoes a multidimensional logistic transformation, using the last (right-most) column as the reference category. This results in \(R\) transformed vectors of dimension \((C-1)\); the transformed vectors, denoted \(\omega_r\), are stacked to form a single \(\omega\) vector corresponding to that contingency table. The \(\omega\) vectors are assumed to be i.i.d. draws from a multivariate normal distribution.
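
The transformation just described can be written out for a single contingency table. The following is a minimal sketch, assuming the usual log-ratio form \(\omega_{r,c} = \log(\theta_{r,c} / \theta_{r,C})\) for \(c = 1, \ldots, C-1\); theta_to_omega() and omega_to_theta() are hypothetical helper names, not functions exported by the package.

```r
# Sketch only: theta_to_omega() and omega_to_theta() are hypothetical
# helpers, not functions exported by the package.
theta_to_omega <- function(theta) {
  n.cols <- ncol(theta)
  # Log-ratio of each column to the last (reference) column, row by row.
  omega.mat <- log(theta[, -n.cols, drop = FALSE] / theta[, n.cols])
  as.vector(t(omega.mat))   # stack omega_1, ..., omega_R into one vector
}

omega_to_theta <- function(omega, n.rows) {
  omega.mat <- matrix(omega, nrow = n.rows, byrow = TRUE)
  expo <- cbind(exp(omega.mat), 1)   # reference column has log-ratio 0
  expo / rowSums(expo)               # each row sums to one again
}

theta <- matrix(c(.6, .3, .1,
                  .2, .2, .6), nrow = 2, byrow = TRUE)
omega <- theta_to_omega(theta)       # length R * (C - 1) = 4
all.equal(omega_to_theta(omega, n.rows = 2), theta)
```

Working on the \(\omega\) scale removes the sum-to-one constraint, which is what makes a multivariate normal model for the stacked vector possible.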

The posterior distribution of the THETAs/OMEGAs is in non-standard form. To sample from the posterior, the algorithm uses a Metropolis-Hastings step with a multivariate \(t_4\) proposal distribution. The covariance matrix of this multivariate \(t_4\) must be expanded or shrunk to achieve acceptance ratios between .2 and .5. Tune implements num.runs sets of num.iters iterations of the Gibbs sampler. At the end of each set of iterations, Tune examines the acceptance ratios in each precinct and adjusts a shrinkage factor (a scalar multiplying the covariance matrix of the \(t_4\) proposal) upwards or downwards. When finished, Tune returns a vector of length I = the number of contingency tables in data. This vector, called rhos, should be fed into the Analyze function. See Examples here and the documentation accompanying Analyze.
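
The tuning idea can be seen in miniature with a toy chain. The sketch below is not the package's algorithm: the target density, the random-walk form of the \(t_4\) proposal, and the halve-or-double adjustment rule are all assumptions chosen for illustration, and run_chain() is a hypothetical helper. It only shows how a single multiplier rho can be nudged between runs based on the observed acceptance fraction.

```r
# Sketch only: a toy analogue of the tuning loop, not the package's code.
# The target density, the random-walk form of the proposal, and the
# halve/double rule are all assumptions made for illustration.
log_target <- function(x) -0.5 * sum(x^2)   # stand-in for a conditional posterior

run_chain <- function(rho, n.iters, dof = 4, n.dim = 2) {
  x <- rep(0, n.dim)
  accepted <- 0
  for (i in seq_len(n.iters)) {
    # Multivariate t increment with scale matrix rho * I.
    step <- sqrt(rho) * rnorm(n.dim) / sqrt(rchisq(1, dof) / dof)
    cand <- x + step
    if (log(runif(1)) < log_target(cand) - log_target(x)) {
      x <- cand
      accepted <- accepted + 1
    }
  }
  accepted / n.iters
}

set.seed(42)
rho <- 0.05                      # same starting value as the rho.vec default
num.runs <- 12
acc <- numeric(num.runs)
for (run in seq_len(num.runs)) {
  acc[run] <- run_chain(rho, n.iters = 2000)
  if (run < num.runs) {
    if (acc[run] > .5) rho <- rho * 2   # accepting too often: widen proposal
    if (acc[run] < .2) rho <- rho / 2   # accepting too rarely: narrow proposal
  }
}
round(acc, 2)   # acceptance fraction per run, like one row of acc.t
rho             # tuned multiplier, like one element of rhos
```

With a real Tune result, the analogous check is whether every entry in the final column of acc.t lies between .2 and .5.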

References

D. James Greiner & Kevin M. Quinn. 2009. "R x C Ecological Inference: Bounds, Correlations, Flexibility, and Transparency of Assumptions." J.R. Statist. Soc. A 172:67-81.

Examples

# NOT RUN {
library(RxCEcolInf)
data(stlouis)
Tune.stlouis <- Tune("Bosley, Roberts, Ribaudo, Villa, NoVote ~ bvap, ovap",
                     data = stlouis,
                     num.iters = 10000,
                     num.runs = 15)
# }
