SSVSquantreg: Stochastic search variable selection for quantile regression

Description

This function uses stochastic search to select promising regression models at a fixed quantile $\tau$. Indicator variables $\gamma$ are used to represent whether a predictor is included in the model or not. The user supplies the data and the prior distribution on the model size. A list is returned containing the posterior sample of $\gamma$ and the associated regression parameters $\beta$.

Usage

SSVSquantreg(
  formula,
  data = NULL,
  tau = 0.5,
  include = NULL,
  burnin = 1000,
  mcmc = 10000,
  thin = 1,
  verbose = 0,
  seed = sample(1:1e+06, 1),
  pi0a0 = 1,
  pi0b0 = 1,
  ...
)

Value

A list containing:

gamma: The posterior sample of $\gamma$. This has associated summary and plot methods.
beta: The posterior sample of the associated regression parameters $\beta$. This can be analysed with functions from the coda package.

Arguments

formula: Model formula.
data: Data frame.
tau: The quantile of interest. Must be between 0 and 1. The default value of 0.5 corresponds to median regression model selection.
include: The predictor(s) that should definitely appear in the model. Can be specified by name, or their position in the formula (taking into account the intercept).
burnin: The number of burn-in iterations for the sampler.
mcmc: The number of MCMC iterations after burnin.
thin: The thinning interval used in the simulation. The number of MCMC iterations must be divisible by this value.
verbose: A switch which determines whether or not the progress of the sampler is printed to the screen. If verbose is greater than 0 the iteration number, the most recently sampled values of $\gamma$ and the associated values of $\beta$ are printed to the screen every verboseth iteration.
seed: The seed for the random number generator. If NA, the Mersenne Twister generator is used with default seed 12345; if an integer is passed it is used to seed the Mersenne twister. The default value for this argument is a random integer between 1 and 1,000,000. This default value ensures that if the function is used again with a different value of $\tau$, it is extremely unlikely that the seed will be identical. The user can also pass a list of length two to use the L'Ecuyer random number generator, which is suitable for parallel computation. The first element of the list is the L'Ecuyer seed, which is a vector of length six or NA (if NA a default seed of rep(12345,6) is used). The second element of list is a positive substream number. See the MCMCpack specification for more details.
pi0a0, pi0b0: Hyperparameters of the beta prior on $\pi_0$, the prior probability of including a predictor. Default values of (1,1) are equivalent to a uniform distribution.
...: Further arguments

Author

Craig Reed

Details

SSVSquantreg implements stochastic search variable selection over a set of potential predictors to obtain promising models. The models considered take the following form:

$$Q_{\tau}(y_i|x_{i\gamma}) = x_{i\gamma} ' \beta_{\gamma},$$

where $Q_{\tau}(y_i|x_{i\gamma})$ denotes the conditional $\tau$th quantile of $y_i$ given $x_{i\gamma}$, $x_{i\gamma}$ denotes $x_i$ with those predictors $x_{ij}$ for which $\gamma_j=0$ removed and $\beta_{\gamma}$ denotes the model specific regression parameters.

The likelihood is formed based on the assumption of independent asymmetric Laplace distributions on the $y_i$ with skewness parameter $\tau$ and location parameters $ x_{i\gamma} ' \beta_{\gamma}$. This assumption ensures that the likelihood function is maximised by the $\tau$th conditional quantile of the response variable.

The prior on each $\beta_j$ is

$$(1-\gamma_j)\delta_0+\gamma_j\mbox{Cauchy}(0,1),$$

where $\delta_0$ denotes a degenerate distribution with all mass at 0. A standard Cauchy distribution is chosen conditional on $\gamma_j=1$. This allows for a wider range of nonzero values of $\beta_j$ than a standard Normal distribution, improving the robustness of the method. Each of the indicator variables $\gamma_j$ is independently assigned a Bernoulli prior, with prior probability of inclusion $\pi_0$. This in turn is assigned a beta distribution, resulting in a beta-binomial prior on the model size. The user can supply the hyperparameters for the beta distribution. Starting values are randomly generated from the prior distribution.

It is recommended to standardise any non-binary predictors in order to compare these predictors on the same scale. This can be achieved using the scale function.

If it is certain that a predictor should be included, all predictors specified are brought to the first positions for computational convenience. The regression parameters associated with these predictors are given independent improper priors. Users may notice a small speed advantage if they specify the predictors that they feel certain should appear in the model, particularly for large models with a large number of observations.

References

Craig Reed, David B. Dunson and Keming Yu. 2010. "Bayesian Variable Selection for Quantile Regression" Technical Report.

Daniel Pemstein, Kevin M. Quinn, and Andrew D. Martin. 2007. Scythe Statistical Library 1.2. http://scythe.wustl.edu.s3-website-us-east-1.amazonaws.com/.

Keming Yu and Jin Zhang. 2005. "A Three Parameter Asymmetric Laplace Distribution and it's extensions." Communications in Statistics - Theory and Methods, 34, 1867-1879.

Martyn Plummer, Nicky Best, Kate Cowles, and Karen Vines. 2006. ``Output Analysis and Diagnostics for MCMC (CODA)'', R News. 6(1): 7-11. https://CRAN.R-project.org/doc/Rnews/Rnews_2006-1.pdf.

Examples

Run this code


if (FALSE) {

set.seed(1)
epsilon<-rnorm(100)
set.seed(2)
x<-matrix(rnorm(1000),100,10)
y<-x[,1]+x[,10]+epsilon
qrssvs<-SSVSquantreg(y~x)
model.50pc<-SSVSquantreg(y~x)
model.90pc<-SSVSquantreg(y~x,tau=0.9)
summary(model.50pc) ## Intercept not in median probability model
summary(model.90pc) ## Intercept appears in median probability model
}

Run the code above in your browser using DataLab