Model and Priors
\(y_i \sim MNL(X_i, \beta_i)\) with \(i = 1, \ldots,\) length(lgtdata) and where \(\beta_i\) is \(nvar \times 1\)
\(\beta_i = Z\Delta[i,] + u_i\)
Note: \(Z\Delta\) is the matrix \(Z * \Delta\) and [i,] refers to the \(i\)th row of this product; \(\Delta\) is an \(nz \times nvar\) matrix
\(\beta_i\) \(\sim\) \(N(\mu_i, \Sigma_i)\)
\(\theta_i = (\mu_i, \Sigma_i)\) \(\sim\) \(DP(G_0(\lambda), alpha)\)
\(G_0(\lambda):\)
\(\mu_i | \Sigma_i \sim N(0, \Sigma_i \otimes a^{-1})\)
\(\Sigma_i\) \(\sim\) \(IW(nu, nu*v*I)\)
\(delta = vec(\Delta)\) \(\sim\) \(N(deltabar, A_d^{-1})\)
\(\lambda(a, nu, v):\)
\(a\) \(\sim\) uniform[alim[1], alim[2]]
\(nu\) \(\sim\) dim(data)-1 + exp(z)
\(z\) \(\sim\) uniform[dim(data)-1+nulim[1], nulim[2]]
\(v\) \(\sim\) uniform[vlim[1], vlim[2]]
\(alpha\) \(\sim\) \((1-(alpha-alphamin) / (alphamax-alphamin))^{power}\)
where alpha = alphamin implies that the expected number of components equals Istarmin
and alpha = alphamax implies that the expected number of components equals Istarmax
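Under a DP prior, the expected number of distinct components among \(n\) units with concentration parameter alpha is \(\sum_{i=1}^{n} alpha/(alpha+i-1)\); alphamin and alphamax are the values at which this expectation equals Istarmin and Istarmax. A minimal Python sketch of that mapping (the function names here are illustrative, not part of bayesm):

```python
def expected_components(alpha, n):
    """E[number of distinct DP atoms] among n units with concentration alpha."""
    return sum(alpha / (alpha + i) for i in range(n))

def solve_alpha(istar, n, lo=1e-6, hi=1e6, tol=1e-8):
    """Bisect for the alpha whose expected component count equals istar.

    expected_components is strictly increasing in alpha, so bisection applies.
    """
    while hi - lo > tol * max(1.0, lo):
        mid = (lo + hi) / 2.0
        if expected_components(mid, n) < istar:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For example, with 300 units, `solve_alpha(Istarmax, 300)` gives the alphamax implied by a chosen Istarmax.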
\(Z\) should NOT include an intercept and is centered for ease of interpretation. The mean of each of the nlgt \(\beta\)s is the mean of the normal mixture; use summary() to compute this mean from the compdraw output.
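Because \(\beta_i = Z\Delta[i,] + u_i\), centering the columns of \(Z\) makes the rows of \(Z\Delta\) average to zero across units, so the average of the unit-level \(\beta\)s recovers the mixture mean. A small Python illustration of the centering step (synthetic numbers, not bayesm code):

```python
import numpy as np

rng = np.random.default_rng(0)
nlgt, nz, nvar = 100, 3, 4

Z = rng.normal(size=(nlgt, nz)) + 5.0  # raw, uncentered unit characteristics
Zc = Z - Z.mean(axis=0)                # center each column (no intercept column)

Delta = rng.normal(size=(nz, nvar))
offsets = Zc @ Delta                   # row i shifts beta_i away from the mixture mean

# With centered Z, these shifts average to (numerically) zero across units,
# so the cross-unit mean of the betas equals the normal-mixture mean.
```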
We parameterize the prior on \(\Sigma_i\) such that \(mode(\Sigma) = \frac{nu}{nu+2} vI\). The support of nu enforces a non-degenerate IW density; \(nulim[1] > 0\).
The default choices of alim, nulim, and vlim determine the location and approximate size of the candidate "atoms" or possible normal components. The defaults are sensible given a reasonable scaling of the X variables. You want to ensure that alim covers a wide enough range of values (remember that a is a precision parameter) and that v is big enough to propose Sigma matrices wide enough to cover the data range.
A careful analyst should look at the posterior distributions of a, nu, and v to make sure that the support is set correctly in alim, nulim, and vlim. In other words, if the posterior is bunched up at one end of these support ranges, widen the range and rerun.
If you want to force the procedure to use many small atoms, set nulim to consider only large values and vlim to consider only small scaling constants, and set alphamax to a large number. This will create a very "lumpy" density estimate, somewhat like classical kernel density estimates. Of course, this is not advised if you have a prior belief that densities are relatively smooth.
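A simple way to automate the support check described above is to compute the fraction of posterior draws landing near either end of each support interval. The helper below is a hedged sketch (the function name and thresholds are my own choices, not part of bayesm):

```python
def support_bunched(draws, lim, edge_frac=0.05, warn_frac=0.25):
    """Flag a hyperparameter whose posterior draws pile up near an endpoint of lim.

    edge_frac: width of the 'edge' band as a fraction of the support length.
    warn_frac: fraction of draws inside one band that triggers a warning.
    """
    lo, hi = lim
    band = edge_frac * (hi - lo)
    n = len(draws)
    at_lo = sum(d <= lo + band for d in draws) / n
    at_hi = sum(d >= hi - band for d in draws) / n
    return at_lo > warn_frac or at_hi > warn_frac
```

If this flags, say, the draws of a against alim, widen alim and rerun, as recommended above.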
Argument Details
Data = list(lgtdata, Z, p) [Z optional]
lgtdata: | list of lists with each cross-section unit MNL data |
lgtdata[[i]]$y: | \(n_i \times 1\) vector of multinomial outcomes (1, ..., p) |
lgtdata[[i]]$X: | \(n_i \times nvar\) design matrix for \(i\)th unit |
Z: | \(nlgt \times nz\) matrix of unit characteristics (def: vector of ones) |
p: | number of choice alternatives |
Prior = list(deltabar, Ad, Prioralpha, lambda_hyper)
[optional]
deltabar: | \(nz*nvar \times 1\) vector of prior means (def: 0) |
Ad: | prior precision matrix for vec(\(\Delta\)) (def: 0.01*I) |
Prioralpha: | list(Istarmin, Istarmax, power) |
$Istarmin: | expected number of components at lower bound of support of alpha (def: 1) |
$Istarmax: | expected number of components at upper bound of support of alpha (def: min(50, 0.1*nlgt)) |
$power: | power parameter for alpha prior (def: 0.8) |
lambda_hyper: | list(alim, nulim, vlim) |
$alim: | defines support of a distribution (def: c(0.01, 2) ) |
$nulim: | defines support of nu distribution (def: c(0.001, 3) ) |
$vlim: | defines support of v distribution (def: c(0.1, 4) ) |
Mcmc = list(R, keep, nprint, s, w, maxuniq, gridsize) [only R required]
R: | number of MCMC draws |
keep: | MCMC thinning parameter -- keep every keep'th draw (def: 1) |
nprint: | print the estimated time remaining for every nprint'th draw (def: 100, set to 0 for no print) |
s: | scaling parameter for RW Metropolis (def: 2.93/sqrt(nvar) ) |
w: | fractional likelihood weighting parameter (def: 0.1) |
maxuniq: | storage constraint on the number of unique components (def: 200) |
gridsize: | number of discrete points for hyperparameter priors (def: 20) |
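gridsize controls how finely the hyperparameter priors are discretized. As an illustration of the idea (my own sketch, not the package's internal code), here is how a gridsize-point discrete prior for alpha with density proportional to \((1-(alpha-alphamin)/(alphamax-alphamin))^{power}\) could be constructed:

```python
def alpha_grid_prior(alphamin, alphamax, power, gridsize):
    """Discrete prior for alpha on a uniform grid over [alphamin, alphamax].

    Weights follow (1 - (a - alphamin)/(alphamax - alphamin))^power,
    normalized to sum to one.
    """
    step = (alphamax - alphamin) / (gridsize - 1)
    grid = [alphamin + k * step for k in range(gridsize)]
    weights = [(1.0 - (a - alphamin) / (alphamax - alphamin)) ** power
               for a in grid]
    total = sum(weights)
    probs = [w / total for w in weights]
    return grid, probs
```

Note that the prior mass decreases toward alphamax, so smaller alpha values (fewer expected components) are favored a priori.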