This function sets up a spike-and-slab model for variable selection and model
choice in generalized additive models and samples its posterior. It uses a
blockwise Metropolis-within-Gibbs sampler and the redundant multiplicative
parameter expansion described in the reference. This routine is not meant to
be called directly by the user -- spikeSlabGAM
provides a
formula-based interface for specifying models and takes care of (most of) the
housekeeping. Sampling of the chains is done in parallel using package
parallel
. A "SOCK" cluster is set up under Windows to do so (and
closed after computations are done, I try to clean up after myself), see
makeCluster
etc. Use options(mc.cores =<foo>)
to set the (maximal) number of processes forked by the parallelization. If
options()$mc.cores
is unspecified, it is set to 2.
spikeAndSlab(
y,
X,
family = c("gaussian", "binomial", "poisson"),
hyperparameters = list(),
model = list(),
mcmc = list(),
start = list()
)
response
design matrix
(character) the family of the response, defaults to normal/Gaussian response
a list of hyperparameters controlling the priors (see details)
a list with information about the grouping structure of the model (see details)
(optional) list setting arguments for the sampler (see details)
(optional) list containing the starting values for \(\beta, \gamma, \tau^2, \sigma^2, w\) and, optionally, the random seed
a list with components:
formula
see arguments
data
see arguments
family
see arguments
y
see arguments
X
see arguments
hyperparameters
see arguments
model
see arguments
mcmc
see arguments
start
see arguments
posteriorPred
a list with entries mu
and
y
containing samples of the expected values and realizations of the
response from the posterior predictive
postMeans
a list containing the posterior means of the parameters:
beta
the regression coefficients
alpha
ksi
tau
hypervariances of the penalized model terms
gamma
inclusion indicator variables of the model terms
pV1
\(P(\gamma = 1)\)
w
hyperparameter
for gamma
sigma2
error variance (for Gaussian data)
logLik
log likelihood
logPost
log of (unnormalized) posterior
samples
a list containing the posterior samples of the parameters, see above for explanation of the entries
DIC
a vector with \(DIC, pD, \bar{D},\hat{D}\). Usually doesn't make much sense for this kind of model because of the posterior's multimodality.
fitted
a matrix with the posterior mean of the linear predictor in the first column and the posterior mean of the expected response in the second.
runTime
of the sampler, in seconds
Details for model specification:
hyperparameters
w
hyperparameters for the \(Beta\)-prior for \(w\);
defaults to c(alphaW = 1, betaW = 1)
, i.e. a uniform distribution.
tau2
hyperparameters for the \(\Gamma^{-1}\)-prior of the
hypervariances \(\tau^2\); defaults to c(a1 = 5, a2 = 25)
gamma
sets \(v_0\), the ratio between the spike and slab
variances, defaults to c(v0 = 0.00025)
sigma2
hyperparameters for \(\Gamma^{-1}\)-prior for error
variance; defaults to c(b1 = 1e-4, b2 = 1e-4)
. Only relevant for Gaussian
response.
varKsi
variance for prior of \(\xi\), defaults to 1
ksiDF
defaults to 0 for a gaussian prior for \(\xi\), else induces a t-prior for \(\xi\)
model
groupIndicators
a factor that maps the columns of X to the different model terms
H
a matrix containing the hierarchy of the penalized model terms
n
number of observations
q
length of \(\beta\)
scale
scale/weights of
the response, defaults to rep(1, n)
, use this to specify number of
trials for binomial data
offset
defaults to rep(0,
n)
mcmc
nChains
how many parallel chains to run: defaults to 3
chainLength
how many samples should be generated per chain, defaults to 500
burnin
how many initial iterations should be discarded, defaults to 100
thin
save only every thin
-th
iteration, defaults to 5
verbose
verbose output and report progress? defaults to TRUE
returnSamples
defaults to TRUE
sampleY
generate samples of y and its conditional expectation from posterior predictive? defaults to FALSE
useRandomStart
use random draw or ridge estimate for beta as starting value? defaults to TRUE, i.e. random starting values.
blocksize
approx. blocksizes of the updates for \(\alpha, \xi\). Defaults to 50 for gaussian responses and 5/15 for non-gaussian responses.
scalemode
how to do term-wise rescaling of subvectors of \(\xi\) in each iteration: 0 means no rescaling, 1 means rescaling s.t. each mean\((|\xi_g|) = 1\), 2 means rescaling s.t. each max\((|\xi_g|) = 1\)
modeSwitching
probability to do P-IWLS with the mode of the proposal set to the current value, which is useful if the chain gets stuck. Defaults to \(0.05\). Increase this if acceptance rates are too low.
reduceRet
don't return data and samples for \(\alpha, \xi, \tau^2\)? defaults to FALSE
start
gamma
starting values for \(\gamma\). Defaults to a vector of
1's if mcmc$useRandomStart
is FALSE
, otherwise drawn from the
prior.
tau2
starting values for \(\tau^2\). Defaults to the
mode of the prior if mcmc$useRandomStart
is FALSE
, otherwise
drawn from the prior.
sigma2
starting values for
\(\sigma^2\). Only relevant for Gaussian response. Defaults to the variance
of the response divided by the number of covariates if
mcmc$useRandomStart
is FALSE
, otherwise drawn from the prior.
w
starting value for \(w\). Defaults to the mean of the prior
if mcmc$useRandomStart
is FALSE
, otherwise drawn from the
prior.
seed
Sets RNG seed for reproducible results. Parallel chains are seeded with this seed incremented by the number of the chain.
Scheipl, F. (2010) Normal-Mixture-of-Inverse-Gamma Priors for Bayesian Regularization and Model Selection in Structured Additive Regression Models. LMU Munich, Department of Statistics: Technical Reports, No.84 (https://epub.ub.uni-muenchen.de/11785/)