The program runs a Gibbs sampler for the Bayesian regression model described below.
Likelihood. The equation for the data is:
$$
\begin{array}{lr}
\boldsymbol y= \boldsymbol 1 \mu + \boldsymbol X_F \boldsymbol \beta_F + \boldsymbol X_R \boldsymbol \beta_R + \boldsymbol X_L \boldsymbol \beta_L + \boldsymbol{Zu} + \boldsymbol \varepsilon & (1)
\end{array}
$$
where \(\boldsymbol y\), the response, is an \(n \times 1\) vector (NAs allowed); \(\mu\) is
an intercept; \(\boldsymbol X_F, \boldsymbol X_R, \boldsymbol X_L\) and \(\boldsymbol Z\) are incidence matrices
used to accommodate different
types of effects (see below); and \(\boldsymbol \varepsilon\) is a vector of model residuals assumed to be
distributed as \(\boldsymbol \varepsilon \sim N(\boldsymbol 0, Diag(\sigma_{\boldsymbol \varepsilon}^2/w_i^2))\),
where \(\sigma_{\boldsymbol \varepsilon}^2\) is an (unknown)
variance parameter and the \(w_i\) are (known) weights that allow for heterogeneous residual variances.
Any of the elements on the right-hand side of the linear predictor, except \(\mu\) and \(\boldsymbol \varepsilon\), can be omitted;
by default the program runs an intercept model. A small simulation from model (1) is sketched below.
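To make the data equation concrete, the following sketch simulates a small data set from model (1). All dimensions, effect sizes, and distributional choices here are hypothetical illustrations, not program defaults; note how the known weights \(w_i\) scale the residual standard deviations.

```python
import numpy as np

# Illustrative simulation from model (1); sizes and values are hypothetical.
rng = np.random.default_rng(1)
n, pF, pR, pL, q = 100, 2, 30, 30, 20

mu = 1.5
XF = rng.normal(size=(n, pF)); bF = rng.normal(size=pF)          # flat-prior effects
XR = rng.normal(size=(n, pR)); bR = rng.normal(0, 0.1, size=pR)  # ridge-type effects
XL = rng.normal(size=(n, pL)); bL = rng.laplace(0, 0.1, size=pL) # LASSO-type effects
Z = rng.binomial(1, 0.5, size=(n, q)); u = rng.normal(0, 0.3, size=q)

sigma2_e = 1.0
w = rng.uniform(0.5, 2.0, size=n)            # known weights
eps = rng.normal(0.0, np.sqrt(sigma2_e) / w) # var(eps_i) = sigma2_e / w_i^2

y = mu + XF @ bF + XR @ bR + XL @ bL + Z @ u + eps
```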
Prior. The residual variance is assigned a scaled inverse-\(\chi^2\) prior with degrees of freedom and scale parameter
provided by the user, that is, \(\sigma_{\boldsymbol \varepsilon}^2 \sim \chi^{-2} (\sigma_{\boldsymbol \varepsilon}^2 | df_{\boldsymbol \varepsilon}, S_{\boldsymbol \varepsilon})\). The regression coefficients \(\left\{\mu, \boldsymbol \beta_F, \boldsymbol \beta_R, \boldsymbol \beta_L, \boldsymbol u \right\}\) are assigned priors
that yield different types of shrinkage. The intercept and the vector of regression coefficients \(\boldsymbol \beta_F\) are assigned flat priors
(i.e., estimates are not shrunk). The vector of regression coefficients \(\boldsymbol \beta_R\) is assigned a
Gaussian prior with variance common to all effects, that is,
\(\beta_{R,j} \mathop \sim \limits^{iid} N(0, \sigma_{\boldsymbol \beta_R}^2)\). This prior is
the Bayesian counterpart of Ridge Regression. The variance parameter \(\sigma_{\boldsymbol \beta_R}^2\)
is treated as unknown and is assigned a scaled inverse-\(\chi^2\) prior, that is,
\(\sigma_{\boldsymbol \beta_R}^2 \sim \chi^{-2} (\sigma_{\boldsymbol \beta_R}^2 | df_{\boldsymbol \beta_R}, S_{\boldsymbol \beta_R})\) with degrees of freedom
\(df_{\boldsymbol \beta_R}\) and scale \(S_{\boldsymbol \beta_R}\) provided by the user.
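As a worked step, this choice is conditionally conjugate: under the common parameterization \(\chi^{-2}(\sigma^2 | df, S) \propto (\sigma^2)^{-(df/2+1)} \exp\{-S/(2\sigma^2)\}\) (an assumption about the scale convention intended here), combining the prior with the Gaussian likelihood of the \(p_R\) coefficients gives

$$p(\sigma_{\boldsymbol \beta_R}^2 | \boldsymbol \beta_R) \propto \left\{ \prod\limits_j N\left(\beta_{R,j} | 0, \sigma_{\boldsymbol \beta_R}^2\right)\right\} \chi^{-2}\left(\sigma_{\boldsymbol \beta_R}^2 | df_{\boldsymbol \beta_R}, S_{\boldsymbol \beta_R}\right) \propto \left(\sigma_{\boldsymbol \beta_R}^2\right)^{-\left(\frac{df_{\boldsymbol \beta_R}+p_R}{2}+1\right)} \exp\left\{-\frac{S_{\boldsymbol \beta_R}+\boldsymbol \beta_R' \boldsymbol \beta_R}{2\sigma_{\boldsymbol \beta_R}^2}\right\},$$

that is, \(\sigma_{\boldsymbol \beta_R}^2 | \boldsymbol \beta_R \sim \chi^{-2}\left(df_{\boldsymbol \beta_R}+p_R, S_{\boldsymbol \beta_R}+\boldsymbol \beta_R' \boldsymbol \beta_R\right)\), which is why this variance can be updated with a single draw within the Gibbs sampler.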
The vector of regression coefficients \(\boldsymbol \beta_L\) is treated as in
the Bayesian LASSO of Park and Casella (2008). Specifically,
$$p(\boldsymbol \beta_L, \boldsymbol \tau^2, \lambda | \sigma_{\boldsymbol \varepsilon}^2) = \left\{\prod_k N(\beta_{L,k} | 0, \sigma_{\boldsymbol \varepsilon}^2 \tau_k^2) Exp\left(\tau_k^2 | \lambda^2\right) \right\} p(\lambda),$$
where \(Exp(\cdot|\cdot)\) denotes an exponential density and \(p(\lambda)\) can be: (a)
a point of mass at some value (i.e., fixed \(\lambda\)); (b) \(\lambda^2 \sim Gamma(r,\delta)\), the
prior suggested by Park and Casella (2008); or (c) \(p(\lambda | \max, \alpha_1, \alpha_2) \propto Beta\left(\frac{\lambda}{\max} | \alpha_1,\alpha_2 \right)\); see de los Campos et al. (2009) for details. It can be shown that the marginal prior of each regression coefficient \(\beta_{L,k}\), \(\int N(\beta_{L,k} | 0, \sigma_{\boldsymbol \varepsilon}^2 \tau_k^2) Exp\left(\tau_k^2 | \lambda^2\right) d\tau_k^2\), is Double-Exponential. This prior has thicker tails and a sharper peak at zero than the Gaussian prior used for \(\boldsymbol \beta_R\), inducing a different type of shrinkage.
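For reference, with the rate parameterization \(Exp(\tau_k^2 | \lambda^2) = \frac{\lambda^2}{2}\exp\left\{-\frac{\lambda^2}{2}\tau_k^2\right\}\) of Park and Casella (2008) (an assumption about the convention intended here), the integral evaluates in closed form to

$$\int_0^\infty N\left(\beta_{L,k} | 0, \sigma_{\boldsymbol \varepsilon}^2 \tau_k^2\right) \frac{\lambda^2}{2} \exp\left\{-\frac{\lambda^2}{2}\tau_k^2\right\} d\tau_k^2 = \frac{\lambda}{2\sqrt{\sigma_{\boldsymbol \varepsilon}^2}} \exp\left\{-\frac{\lambda |\beta_{L,k}|}{\sqrt{\sigma_{\boldsymbol \varepsilon}^2}}\right\},$$

a Double-Exponential (Laplace) density with rate \(\lambda/\sqrt{\sigma_{\boldsymbol \varepsilon}^2}\).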
The vector \(\boldsymbol u\) is used to model the so-called 'infinitesimal effects' and is assigned the prior \(\boldsymbol u \sim N(\boldsymbol 0, \boldsymbol A\sigma_{\boldsymbol u}^2)\),
where \(\boldsymbol A\) is a positive-definite matrix (usually a relationship matrix computed from a pedigree) and \(\sigma_{\boldsymbol u}^2\) is an unknown variance whose prior is
\(\sigma_{\boldsymbol u}^2 \sim \chi^{-2} (\sigma_{\boldsymbol u}^2 | df_{\boldsymbol u}, S_{\boldsymbol u})\).
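By the same conjugacy argument as for \(\sigma_{\boldsymbol \beta_R}^2\) (again under the scale convention assumed above), the full conditional of this variance is also scaled inverse-\(\chi^2\),

$$\sigma_{\boldsymbol u}^2 | \boldsymbol u \sim \chi^{-2}\left(df_{\boldsymbol u}+q, S_{\boldsymbol u}+\boldsymbol u' \boldsymbol A^{-1} \boldsymbol u\right),$$

where \(q\) is the length of \(\boldsymbol u\).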
Collecting the above assumptions, the posterior distribution of the model unknowns,
\(\boldsymbol \theta= \left\{\mu, \boldsymbol \beta_F, \boldsymbol \beta_R, \sigma_{\boldsymbol \beta_R}^2, \boldsymbol \beta_L, \boldsymbol \tau^2, \lambda, \boldsymbol u, \sigma_{\boldsymbol u}^2, \sigma_{\boldsymbol \varepsilon}^2 \right\}\), is
$$\begin{array}{lclr}
p(\boldsymbol \theta | \boldsymbol y) & \propto & N\left( \boldsymbol y | \boldsymbol 1 \mu + \boldsymbol X_F \boldsymbol \beta_F + \boldsymbol X_R \boldsymbol \beta_R + \boldsymbol X_L \boldsymbol \beta_L + \boldsymbol{Zu}, Diag\left\{ \frac{\sigma_{\boldsymbol \varepsilon}^2}{w_i^2}\right\}\right) & \\
& & \times \left\{ \prod\limits_j N\left(\beta_{R,j} | 0, \sigma_{\boldsymbol \beta_R}^2 \right) \right\} \chi^{-2} \left(\sigma_{\boldsymbol \beta_R}^2 | df_{\boldsymbol \beta_R}, S_{\boldsymbol \beta_R}\right) & \\
& & \times \left\{ \prod\limits_k N \left( \beta_{L,k} |0,\sigma_{\boldsymbol \varepsilon}^2 \tau_k^2 \right) Exp \left(\tau_k^2 | \lambda^2\right)\right\} p(\lambda) & (2)\\
& & \times N(\boldsymbol u | \boldsymbol 0,\boldsymbol A\sigma_{\boldsymbol u}^2) \chi^{-2} (\sigma_{\boldsymbol u}^2 | df_{\boldsymbol u}, S_{\boldsymbol u}) \chi^{-2} (\sigma_{\boldsymbol \varepsilon}^2 | df_{\boldsymbol \varepsilon}, S_{\boldsymbol \varepsilon}) &
\end{array}
$$
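To make the sampling scheme concrete, here is a minimal sketch of a Gibbs sampler for a reduced version of (2) containing only the intercept and the ridge-type component (\(\boldsymbol \beta_F\), \(\boldsymbol \beta_L\), and \(\boldsymbol u\) omitted, all \(w_i = 1\)). It assumes the scaled inverse-\(\chi^2\) convention used above; the function name and defaults are hypothetical, and this is an illustration rather than the program's implementation.

```python
import numpy as np

def gibbs_ridge(y, X, n_iter=2000, df_b=5.0, S_b=1.0, df_e=5.0, S_e=1.0, seed=0):
    """Minimal Gibbs sampler for y = 1*mu + X*beta + eps with a Gaussian
    (ridge-type) prior on beta and scaled inverse-chi^2 priors on both
    variances; all weights w_i are taken to be 1."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    mu, beta = y.mean(), np.zeros(p)
    s2_b, s2_e = S_b / df_b, S_e / df_e   # crude starting values
    XtX = X.T @ X
    keep = np.empty((n_iter, 3))
    for it in range(n_iter):
        # Intercept (flat prior): Gaussian full conditional.
        r = y - X @ beta
        mu = rng.normal(r.mean(), np.sqrt(s2_e / n))
        # Coefficients: joint Gaussian full conditional
        #   beta | . ~ N(C^{-1} X'(y - mu)/s2_e, C^{-1}),  C = X'X/s2_e + I/s2_b,
        # sampled in one block via a Cholesky factorization of C.
        C = XtX / s2_e + np.eye(p) / s2_b
        L = np.linalg.cholesky(C)
        mean = np.linalg.solve(C, X.T @ (y - mu) / s2_e)
        beta = mean + np.linalg.solve(L.T, rng.normal(size=p))
        # Variances: scaled inverse-chi^2 full conditionals, drawn as
        # (scale + sum of squares) / chi^2 with updated degrees of freedom.
        s2_b = (S_b + beta @ beta) / rng.chisquare(df_b + p)
        e = y - mu - X @ beta
        s2_e = (S_e + e @ e) / rng.chisquare(df_e + n)
        keep[it] = mu, s2_b, s2_e
    return keep
```

The coefficient vector is updated in one block through a Cholesky factorization of its conditional precision matrix; a sampler for the full model would add analogous updates for \(\boldsymbol \beta_F\) (flat prior), the \(\{\boldsymbol \beta_L, \boldsymbol \tau^2, \lambda\}\) block, and \(\{\boldsymbol u, \sigma_{\boldsymbol u}^2\}\).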