- formula
generalized linear model formula for the full model with all
predictors, Y ~ X. All code assumes that an intercept will be included in
each model.
- family
a description of the error distribution and link function for
exponential family; currently only `binomial()` with the logistic link and
`poisson()` and `Gamma()` with the log link are available.
- data
data frame
- weights
optional vector of weights to be used in the fitting process.
May be missing in which case weights are 1.
- subset
subset of data used in fitting
- contrasts
an optional list. See the contrasts.arg of `model.matrix.default()`.
- offset
a priori known component to be included in the linear
predictor; by default 0.
- na.action
a function which indicates what should happen when the data
contain NAs. The default is "na.omit".
- n.models
number of unique models to keep. If NULL, BAS will attempt
to enumerate unless p > 35 or method="MCMC". For any of the methods using MCMC
algorithms that sample with replacement, sampling will stop when the number
of iterations exceeds the smaller of 'n.models' and 'MCMC.iterations'; on exit,
'n.models' is updated to reflect the number of unique models that have been
sampled.
- betaprior
Prior distribution on the model coefficients (except the
intercept). Options include `g.prior`, `CCH`, `robust`, `intrinsic`,
`beta.prime`, `EB.local`, `AIC`, and `BIC`.
- modelprior
Family of prior distribution on the models. Choices
include `uniform`, `Bernoulli`, `beta.binomial`, truncated Beta-Binomial
`tr.beta.binomial`, and truncated power family `tr.power.prior`.
- initprobs
vector of length p with the initial inclusion probabilities
used for sampling without replacement (the intercept will be included with
probability one and does not need to be added here), or a character string
giving the method used to construct the sampling probabilities. If "Uniform",
each predictor variable is equally likely to be sampled (equivalent to
random sampling without replacement). If "eplogp", use the `eplogprob`
function to approximate the Bayes factor using
p-values to find initial marginal inclusion probabilities and sample
without replacement using these inclusion probabilities, which may be
updated using estimates of the marginal inclusion probabilities. "eplogp"
assumes that MLEs from the full model exist; for problems where that is not
the case or 'p' is large, initial sampling probabilities may be obtained
using `eplogprob.marg`, which fits a model to each predictor
separately. To run a Markov chain to provide initial
estimates of marginal inclusion probabilities, use method="MCMC+BAS" below.
While the initprobs are not used in sampling for method="MCMC", they
determine the order of the variables in the lookup table and affect memory
allocation in large problems where enumeration is not feasible. For
variables that should always be included, set the corresponding initprobs to
1 to override the `modelprior`, or use `include.always` to force these variables
to always be included in the model.
- include.always
A formula with terms that should always be included
in the model with probability one. By default this is `~ 1` meaning that the
intercept is always included.
This will also override any of the values in `initprobs`
above by setting them to 1.
- method
A character variable indicating which sampling method to use:
method="BAS" uses Bayesian Adaptive Sampling (without replacement) using the
sampling probabilities given in initprobs and updates using the marginal
inclusion probabilities to direct the search/sample; method="MCMC" combines
a random-walk Metropolis-Hastings (as in MC3 of Raftery et al. 1997) with a
random swap of a currently included variable with a currently excluded
variable (see Clyde, Ghosh, and Littman (2010) for details);
method="MCMC+BAS" runs an initial MCMC as above to calculate marginal
inclusion probabilities and then samples without replacement as in BAS;
method="deterministic" runs a deterministic sampling using the initial
probabilities (no updating); this is recommended for fast enumeration or if a
model of independence is a good approximation to the joint posterior
distribution of the model indicators. For BAS, the sampling probabilities
can be updated as more models are sampled (see 'update' below). We
recommend "MCMC+BAS" or "MCMC" for high-dimensional problems.
- update
number of iterations between potential updates of the sampling
probabilities in the "BAS" method. If NULL, do not update; otherwise the
algorithm will update using the marginal inclusion probabilities as they
change while sampling takes place. For large model spaces, updating is
recommended. If the model space will be enumerated, leave at the default.
- bestmodel
optional binary vector representing a model to initialize
the sampling. If NULL, sampling starts with the null model.
- prob.rw
For any of the MCMC methods, probability of using the
random-walk proposal; otherwise use a random "flip" move to propose a new
model.
- MCMC.iterations
Number of models to sample when using any of the MCMC
options; should be greater than 'n.models'. By default 10*n.models.
- thin
For "MCMC", thin the MCMC chain every "thin" iterations; default
is no thinning. For large p, thinning can be used to significantly reduce memory
requirements, as models and associated summaries are saved only every thin
iterations. For thin = p, the model and associated output are recorded
every p iterations, similar to the Gibbs sampler in SSVS.
- control
a list of parameters that control convergence in the fitting
process. See the documentation for `glm.control()`.
- laplace
logical variable for whether to use a Laplace approximation for
integration with respect to g to obtain the marginal likelihood. If FALSE,
the Cephes library is used, which may be inaccurate for large n or large
values of the Wald chi-squared statistic.
- renormalize
logical variable for whether posterior probabilities
should be based on renormalizing marginal likelihoods times prior
probabilities or use Monte Carlo frequencies. Applies only to MCMC sampling.
- force.heredity
Logical variable to force all levels of a factor to be
included together and to include higher order interactions only if lower
order terms are included. Currently only supported with `method='MCMC'`
and `method='BAS'` (experimental) on non-Solaris platforms.
Default is FALSE.
- bigmem
Logical variable to indicate that there is access to
large amounts of memory (physical or virtual) for enumeration
with large model spaces, e.g. > 2^25.
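
As a rough illustration of how several of the arguments above fit together, the following sketch fits a logistic regression by enumeration. It assumes the BAS and MASS packages are installed; the `Pima.tr` data set, the `CCH()` hyperparameter values, and the object name `fit` are illustrative choices, not recommendations.

```r
# Minimal sketch; assumes the BAS and MASS packages are installed.
library(BAS)
library(MASS)  # supplies the Pima.tr data used here for illustration

data(Pima.tr)

# Pima.tr has 7 candidate predictors, so the model space has 2^7 = 128
# models and is small enough to enumerate with method = "BAS".
fit <- bas.glm(
  type ~ .,                      # full model; the intercept is always included
  family     = binomial(),       # logistic link
  data       = Pima.tr,
  n.models   = 2^7,              # keep every model in the space
  betaprior  = CCH(alpha = 0.5,  # illustrative hyperparameter values
                   beta  = as.numeric(nrow(Pima.tr)),
                   s     = 0),
  modelprior = beta.binomial(1, 1),
  initprobs  = "Uniform",
  method     = "BAS",
  laplace    = FALSE
)

# Marginal posterior inclusion probabilities (intercept first).
fit$probne0
```

For a model space too large to enumerate, the same call could instead use `method = "MCMC"` together with `MCMC.iterations` and `thin`, as described in the corresponding arguments.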