Bayesian model selection for linear, asymmetric linear, median and quantile regression under non-local or Zellner priors. p>>n can be handled.
modelSelection enumerates all models when feasible
and uses a Gibbs scheme otherwise.
See coef
and coefByModel
for estimates and posterior
intervals of regression coefficients, and rnlp
for posterior samples.
modelsearchBlockDiag seeks the highest posterior probability model using an iterative block search.
modelSelection(y, x, data, smoothterms, nknots=9,
groups=1:ncol(x), constraints, center=TRUE, scale=TRUE,
enumerate, includevars=rep(FALSE,ncol(x)), models,
maxvars, niter=5000, thinning=1,
burnin=round(niter/10), family='normal', priorCoef,
priorGroup, priorDelta=modelbbprior(1,1),
priorConstraints,
priorVar=igprior(.01,.01),
priorSkew=momprior(tau=0.348), phi, deltaini=rep(FALSE,ncol(x)),
initSearch='greedy', method='auto', adj.overdisp='intercept',
hess='asymp', optimMethod, optim_maxit, initpar='none', B=10^5,
XtXprecomp= ifelse(ncol(x)<10^4,true,false), verbose="TRUE)modelsearchBlockDiag(y, x, priorCoef=momprior(tau=0.348),
priorDelta=modelbbprior(1,1), priorVar=igprior(0.01,0.01),
blocksize=10, maxiter=10, maxvars=100, maxlogmargdrop=20,
maxenum=10, verbose=TRUE)
10^4,true,false),>
Object of class msfit
, which extends a list with elements
matrix
with posterior samples for the model
indicator. postSample[i,j]==1
indicates that variable j was included in the model in the MCMC
iteration i
postOther
returns posterior samples for parameters other than the model
indicator, i.e. basically hyper-parameters.
If hyper-parameters were fixed in the model specification, postOther
will be empty.
Marginal posterior probability for inclusion of each
covariate. This is computed by averaging marginal post prob for
inclusion in each Gibbs iteration, which is much more accurate than
simply taking colMeans(postSample)
.
Model with highest posterior probability amongst all those visited
Unnormalized posterior prob of posterior mode (log scale)
Unnormalized posterior prob of each visited model (log scale)
List with priors specified when calling modelSelection
Either a formula with the regression equation or a vector with
observed responses. The response can be either continuous or of class
Surv
(survival outcome). If y
is a formula then x
,
groups
and constraints
are automatically created
Design matrix with linear covariates for which we want to
assess if they have a linear effect on the response. Ignored if
y
is a formula
If y
is a formula then data
should be a data
frame containing the variables in the model
Formula for non-linear covariates (cubic splines),
modelSelection assesses if the variable has no effect, linear or
non-linear effect. smoothterms
can also be a design matrix or
data.frame containing linear terms, for each column modelSelection
creates a spline basis and tests no/linear/non-linear effects
Number of spline knots. For cubic splines the non-linear
basis adds knots-4 coefficients for each linear term, we recommend
setting nknots
to a small/moderate value
If variables in x
such be added/dropped in groups,
groups
indicates the group that each variable corresponds to
(by default each variable goes in a separate group)
Constraints on the model space. List with length equal to the number of groups; if group[[i]]=c(j,k) then group i can only be in the model if groups j and k are also in the model
If TRUE
, y
and x
are centered to have
zero mean. Dummy variables corresponding to factors are NOT centered
If TRUE
, y
and columns in x
are
scaled to have variance=1. Dummy variables corresponding to factors are NOT scaled
Default is TRUE
if there's less than 15 variable
groups. If TRUE
all models with up to maxvars
are
enumerated, else Gibbs sampling is used to explore the model space
Logical vector of length ncol(x) indicating variables that should always be included in the model, i.e. variable selection is not performed for these variables
Optional logical matrix indicating the models to be
enumerated with rows equal to the number of desired models and columns
to the number of variables in x
.
When enumerate==TRUE
only models with up to maxvars variables
enumerated (defaults to all variables). In modelsearchBlockDiag
a sequence of models
is defined from 1 up to maxvars
Number of Gibbs sampling iterations
MCMC thinning factor, i.e. only one out of each thinning
iterations are reported. Defaults to thinning=1, i.e. no thinning
Number of burn-in MCMC iterations. Defaults to
.1*niter
. Set to 0 for no burn-in
Family of parametric distribution. Use
'normal' for Normal errors, 'binomial' for logistic regression,
'poisson' for Poisson regression.
'twopiecenormal' for two-piece Normal,
'laplace' for Laplace errors and 'twopiecelaplace' for double
exponential.
For 'auto' the errors are assumed continuous and their distribution
is inferred from the data among
'normal', 'laplace', 'twopiecenormal' and 'twopiecelaplace'.
'laplace' corresponds to median regression and 'twopiecelaplace'
to quantile regression. See argument priorSkew
Prior on coefficients, created
by momprior
, imomprior
, emomprior
or
zellnerprior
.
Prior dispersion is on coefficients/sqrt(scale) for Normal and
two-piece Normal, and on coefficients/sqrt(2*scale) for Laplace
and two-piece Laplace.
Prior on grouped coefficients (e.g. categorical
predictors with >2 categories, splines). Created by
groupmomprior
, groupemomprior
,
groupimomprior
or groupzellnerprior
Prior on model space. Use modelbbprior()
for Beta-Binomial prior, modelbinomprior(p)
for Binomial
prior with prior inclusion probability p
,
modelcomplexprior
for Complexity prior,
or modelunifprior()
for Uniform prior
Prior distribution on the number of terms subject to hierarchical constrains that are included in the model
Inverse gamma prior on scale parameter. For Normal outcomes variance=scale, for Laplace outcomes variance=2*scale
Either a fixed value for tanh(alpha) where alpha is
the asymmetry parameter or a prior on tanh(alpha).
For family=='twopiecelaplace'
setting alpha=a is equivalent
to performing quantile regression for the quantile (1+a)/2.
Ignored if family
is 'normal' or 'laplace'.
The error variance in Gaussian models, typically this is unknown and is left missing
Logical vector of length ncol(x)
indicating which
coefficients should be initialized to be non-zero.
Defaults to all variables being excluded from the model
Algorithm to refine
deltaini
. initSearch=='greedy'
uses a greedy Gibbs
sampling search. initSearch=='SCAD'
sets deltaini
to the
non-zero elements in a SCAD fit with cross-validated regularization
parameter. initSearch=='none'
leaves deltaini
unmodified
Method to compute marginal likelihood.
method=='Laplace'
for Laplace approx, method=='ALA'
for approximate Laplace approximation.
method=='MC'
for Importance Sampling, method=='Hybrid'
for Hybrid Laplace-IS (only available for piMOM prior). See Details.
method=='auto'
attempts to use exact calculations when
possible, otherwise ALA if available, otherwise Laplace approx.
Only used when method=='ALA'. Over-dispersion adjustment in models with fixed dispersion parameter, as in logistic and Poisson regression. adj.overdisp='none' for no adjustment (not recommended, particularly for Poisson models). adj.overdisp='intercept' to estimate over-dispersion from the intercept-only model, and adj.overdisp='residuals' from the Pearson residuals of each model
Method to estimat the hessian in the Laplace approximation to the integrated likelihood under Laplace or asymmetric Laplace errors. When hess=='asymp' the asymptotic hessian is used, hess=='asympDiagAdj' a diagonal adjustment is applied (see Rossell and Rubio for details).
Algorithm to maximize objective function when method=='Laplace'. Leave unspecified or set optimMethod=='auto' for an automatic choice. optimMethod=='LMA' uses modified Newton-Raphson algorithm, 'CDA' coordinate descent algorithm
Maximum number of iterations when method=='Laplace'
Initial regression parameter values when finding the posterior mode to approximate the integrated likelihood. 'none', 'MLE', 'L1', or a numeric vector with initial values. 'auto': if p<n/2 MLE is used, else L1 (regularization parameter set via BIC)
Number of samples to use in Importance Sampling scheme. Ignored
if method=='Laplace'
Set to TRUE
to pre-compute the Gram matrix x'x
upfront (saves time), to FALSE
to compute and store elements
only as needed (saves memory)
Set verbose==TRUE
to print iteration progress
Maximum number of variables in a block. Careful, the
cost of the algorithm is of order 2^blocksize
Maximum number of iterations, each iteration includes a screening pass to add and subtract variables
Stop the sequence of models when the drop in log
p(y|model) is greater than maxlogmargdrop
. This option avoids
spending unnecessary time exploring overly large models
If the posterior mode found has less than maxenum
variables then do a full enumeration of all its submodels
David Rossell
Let delta be the vector indicating inclusion/exclusion of each column of x in the model. The Gibbs algorithm sequentially samples from the posterior of each element in delta conditional on all the remaining elements in delta and the data. To do this it is necessary to evaluate the marginal likelihood for any given model. These have closed-form expression for the MOM prior, but for models with >15 variables these are expensive to compute and Laplace approximations are used instead (for the residual variance a log change of variables is used, which improves the approximation). For other priors closed forms are not available, so by default Laplace approximations are used. For the iMOM prior we also implement a Hybrid Laplace-IS which uses a Laplace approximation to evaluate the integral wrt beta and integrates wrt phi (residual variance) numerically.
It should be noted that Laplace approximations tend to under-estimate the marginal densities when the MLE for some parameter is very close to 0. That is, it tends to be conservative in the sense of excluding more variables from the model than an exact calculation would.
Finally, method=='plugin' provides a BIC-type approximation that is faster than exact or Laplace methods, at the expense of some accuracy. In non-sparse situations where models with many variables have large posterior probability method=='plugin' can be substantially faster.
For more details on the methods used to compute marginal densities see Johnson & Rossell (2012).
modelsearchBlockDiag
uses the block search method described in
Papaspiliopoulos & Rossell. Briefly, spectral clustering is run on
X'X to cluster variables into blocks of blocksize
and
subsequently the Coolblock algorithm is used to define a sequence
of models of increasing size. The exact integrated likelihood
is evaluated for all models in this path, the best model chosen,
and the scheme iteratively repeated to add and drop variables
until convergence.
Johnson V.E., Rossell D. Non-Local Prior Densities for Default Bayesian Hypothesis Tests. Journal of the Royal Statistical Society B, 2010, 72, 143-170.
Johnson V.E., Rossell D. Bayesian model selection in high-dimensional settings. Journal of the American Statistical Association, 2012, 107, 649-660.
Papaspiliopoulos O., Rossell, D. Scalable Bayesian variable selection and model averaging under block orthogonal design. 2016
Rossell D., Rubio F.J. Tractable Bayesian variable selection: beyond normality. 2016
msfit-class
for details on the output.
postProb
to obtain posterior model probabilities.
coef.msfit
for Bayesian model averaging estimates and
intervals. predict.msfit
for BMA estimates and intervals
for user-supplied covariate values.
plot.msfit
for an MCMC diagnostic plot showing estimated
marginal posterior inclusion probabilities vs. iteration number.
rnlp
to obtain posterior samples for the coefficients.
nlpMarginal
to compute marginal densities for a given model.
#Simulate data
x <- matrix(rnorm(100*3),nrow=100,ncol=3)
theta <- matrix(c(1,1,0),ncol=1)
y <- x %*% theta + rnorm(100)
#Specify prior parameters
priorCoef <- momprior(tau=0.348)
priorDelta <- modelunifprior()
#Alternative model space prior: 0.5 prior prob for including any covariate
priorDelta <- modelbinomprior(p=0.5)
#Alternative: Beta-Binomial prior for model space
priorDelta <- modelbbprior(alpha.p=1,beta.p=1)
#Model selection
fit1 <- modelSelection(y=y, x=x, center=FALSE, scale=FALSE,
priorCoef=priorCoef, priorDelta=priorDelta)
postProb(fit1) #posterior model probabilities
fit1$margpp #posterior marginal inclusion prob
coef(fit1) #BMA estimates, 95% intervals, marginal post prob
Run the code above in your browser using DataLab