Learn R Programming

BayesVarSel (version 1.6.2)

GibbsBvs: Bayesian Variable Selection for linear regression models using Gibbs sampling.

Description

Approximate computation of summaries of the posterior distribution using a Gibbs sampling algorithm to explore the model space and frequency of "visits" to construct the estimates.

Usage

GibbsBvs(formula, fixed.cov = c("Intercept"), data, prior.betas = "Robust", prior.models = "Constant", n.iter=1000, init.model = "Full", n.burnin = 50, n.thin=1, time.test = TRUE, priorprobs=NULL, seed = runif(1,0,16091956))

Arguments

formula
Formula defining the most complex regression model in the analysis. See details.
fixed.cov
A character vector with the names of the covariates that will be considered as fixed (no variable selection over these). This argument provides an implicit definition of the simplest model considered. Default is "Intercept". Use NULL if selection should be performed over all the variables defined by formula
data
data frame containing the data.
prior.betas
Prior distribution for regression parameters within each model. Possible choices include "Robust", "Liangetal", "gZellner", "ZellnerSiow" and "FLS" (see details).
prior.models
Prior distribution over the model space. Possible choices are "Constant", "ScottBerger" and "User" (see details).
n.iter
The total number of iterations performed after the burn in process.
init.model
The model at which the simulation process starts. Options include "Null" (the model only with the covariates specified in fixed.cov), "Full" (the model defined by formula), "Random" (a randomly selected model) and a vector with p (the number of covariates to select from) zeros and ones defining a model.
n.burnin
Length of burn in, i.e. number of iterations to discard at the beginning.
n.thin
Thinning rate. Must be a positive integer. Set 'n.thin' > 1 to save memory and computation time if 'n.iter' is large. Default is 1. This parameter jointly with n.iter sets the number of simulations kept and used to construct the estimates so is important to keep in mind that a large value for 'n.thin' can reduce the precision of the results.
time.test
If TRUE and the number of variables is large (>=21) a preliminary test to estimate computational time is performed.
priorprobs
A p+1 dimensional vector defining the prior probabilities Pr(M_i) (should be used in the case where prior.models="User"; see the details in Bvs.)
seed
A seed to initialize the random number generator

Value

returns an object of class Bvs with the following elements:
time
The internal time consumed in solving the problem
lmfull
The lm class object that results when the model defined by formula is fitted by lm
lmnull
The lm class object that results when the model defined by fixed.cov is fitted by lm
variables
The name of all the potential explanatory variables
n
Number of observations
p
Number of explanatory variables to select from
k
Number of fixed variables
HPMbin
The binary expression of the most probable model found.
inclprob
A data.frame with the estimates of the inclusion probabilities of all the variables.
jointinclprob
A data.frame with the estimates of the joint inclusion probabilities of all the variables.
postprobdim
Estimates of posterior probabilities of the dimension of the true model.
modelslogBF
A matrix with both the binary representation of the visited models after the burning period and the Bayes factor (log scale) of that model to the null model.
call
The call to the function.
method
gibbs

Details

This is a heuristic approximation to the function Bvs so the details there apply also here.

The algorithm implemented is a Gibbs sampling-based searching algorithm originally proposed by George and McCulloch (1997). Garcia-Donato and Martinez-Beneito (2013) have shown that this simple sampling strategy in combination with estimates based on frequency of visits (the one here implemented) provides very reliable results.

References

Garcia-Donato, G. and Martinez-Beneito, M.A. (2013) On sampling strategies in Bayesian variable selection problems with large model spaces. Journal of the American Statistical Association, 108: 340-352. George E. and McCulloch R. (1997) Approaches for Bayesian variable selection. Statistica Sinica, 7, 339:372.

See Also

plotBvs for a plot of the object created.

Examples

Run this code
## Not run: 
# #Analysis of Ozone35 data
# 
# data(Ozone35)
# 
# #We use here the (Zellner) g-prior for
# #regression parameters and constant prior
# #over the model space
# #In this Gibbs sampling scheme, we perform 10100 iterations,
# #of which the first 100 are discharged (burnin) and of the remaining
# #only one each 10 is kept.
# #as initial model we use the Full model
# Oz35.GibbsBvs<- GibbsBvs(formula="y~.", data=Ozone35, prior.betas="gZellner",
# prior.models="Constant", n.iter=10000, init.model="Full", n.burnin=100, 
# time.test = FALSE)
# 
# #Note: this is a heuristic approach and results are estimates
# #of the exact answer.
# 
# #with the print we can see which is the most probable model
# #among the visited
# Oz35.GibbsBvs
# 
# #The estimation of inclusion probabilities and
# #the model-averaged estimation of parameters:
# summary(Oz35.GibbsBvs)
# 
# #Plots:
# plotBvs(Oz35.GibbsBvs, option="conditional")
# ## End(Not run)

Run the code above in your browser using DataLab