Various parameters that control fitting of regression models using bayesx.
bayesx.control(model.name = "bayesx.estim",
family = "gaussian", method = "MCMC", verbose = FALSE,
dir.rm = TRUE, outfile = NULL, replace = FALSE, iterations = 12000L,
burnin = 2000L, maxint = NULL, step = 10L, predict = TRUE,
seed = NULL, hyp.prior = NULL, distopt = NULL, reference = NULL,
zipdistopt = NULL, begin = NULL, level = NULL, eps = 1e-05,
lowerlim = 0.001, maxit = 400L, maxchange = 1e+06, leftint = NULL,
lefttrunc = NULL, state = NULL, algorithm = NULL, criterion = NULL,
proportion = NULL, startmodel = NULL, trace = NULL,
steps = NULL, CI = NULL, bootstrapsamples = NULL, ...)
A list with the arguments specified is returned.
character, specify the base name used for the model output files written to outfile.
character, specify the distribution used for the model. Options for all methods ("MCMC", "REML"
and "STEP") are: "binomial", "binomialprobit", "gamma", "gaussian", "multinomial", "poisson".
For "MCMC" and "REML" only: "cox", "cumprobit" and "multistate". For "REML" only:
"binomialcomploglog", "cumlogit", "multinomialcatsp", "multinomialprobit", "seqlogit",
"seqprobit".
character, which method should be used for estimation, options are "MCMC", "HMCMC"
(hierarchical MCMC), "REML" and "STEP".
logical, should output be printed to the R console during runtime of bayesx?
logical, should the output files and directory be removed after estimation?
character, specify a directory where bayesx should store all output files. All output files
will be named with model.name as the base name.
logical, if set to TRUE, the files in the output directory specified in argument outfile will
be replaced.
integer, sets the number of iterations for the sampler.
integer, sets the burn-in period of the sampler.
integer, if first or second order random walk priors are specified, in some cases the data will
be slightly grouped: the range between the minimal and maximal observed covariate values will be
divided into (small) intervals, and for each interval one parameter will be estimated. The
grouping has almost no effect on estimation results as long as the number of intervals is large
enough. With the maxint option the amount of grouping can be determined by the user; its value is
the maximum number of intervals allowed. For equidistant data, the default maxint = 150, for
example, means that no grouping will be done as long as the number of different observations is
equal to or below 150. For non-equidistant data some grouping may be done even if the number of
different observations is below 150.
integer, defines the thinning parameter for MCMC simulation. E.g., step = 50 means that only
every 50th sampled parameter will be stored and used to compute characteristics of the posterior
distribution such as means, standard deviations or quantiles. The aim of thinning is to
considerably reduce both disk storage and the autocorrelation between sampled parameters.
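As a rough illustration of what thinning does, the base R sketch below counts the draws that remain under the default settings and compares the lag-1 autocorrelation of a simulated chain before and after thinning. The AR(1) chain, its coefficient of 0.9 and the helper `thin_chain` are assumptions made for illustration; this is not the BayesX sampler, and BayesX may index the stored draws slightly differently.

```r
## Number of stored draws, assuming draws are kept after the burn-in
## at every `step`-th iteration (defaults taken from the usage above).
iterations <- 12000L
burnin <- 2000L
step <- 10L
n_stored <- (iterations - burnin) %/% step

## Hypothetical helper: drop the burn-in, then keep every `step`-th draw.
thin_chain <- function(x, burnin, step) {
  kept <- x[seq_along(x) > burnin]
  kept[seq(step, length(kept), by = step)]
}

## Simulated, strongly autocorrelated AR(1) chain (illustration only).
set.seed(1)
chain <- as.numeric(arima.sim(list(ar = 0.9), n = iterations))
thinned <- thin_chain(chain, burnin, step)

## Lag-1 autocorrelation before and after thinning.
acf_full <- acf(chain, plot = FALSE)$acf[2]
acf_thin <- acf(thinned, plot = FALSE)$acf[2]
print(c(stored = n_stored, acf_full = acf_full, acf_thin = acf_thin))
```

A larger step lowers the autocorrelation of the stored draws further, at the price of needing more iterations for the same number of stored samples.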
logical, option predict may be specified to compute samples of the deviance D, the effective
number of parameters pD and the deviance information criterion DIC of the model. In addition, if
predict = FALSE, only output files of estimated effects will be returned; otherwise an expanded
data set using all observations is written to the output directory, also containing the data
used for estimation. Hence, setting predict = FALSE is useful when dealing with large data sets
that might cause memory problems if predict is set to TRUE.
integer, set the seed of the random number generator in BayesX, usually set using function
set.seed.
numeric, defines the values of the hyper-parameters a and b for the inverse gamma prior of the
overall variance parameter \(\sigma^2\), if the response distribution is Gaussian. Both values
must be positive real numbers. The default is hyp.prior = c(1, 0.005).
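To get a feel for the default prior, the following base R sketch draws from an inverse gamma distribution with a = 1 and b = 0.005. It assumes the parameterization with density proportional to \((\sigma^2)^{-a-1}\exp(-b/\sigma^2)\); the BayesX manuals should be consulted for the parameterization actually used.

```r
## Hyper-parameters as in the stated default hyp.prior = c(1, 0.005).
a <- 1
b <- 0.005

## If X ~ Gamma(shape = a, rate = b), then 1 / X ~ IG(a, b)
## under the parameterization assumed above.
set.seed(42)
sigma2 <- 1 / rgamma(10000, shape = a, rate = b)

## The mean does not exist for a <= 1, so summarize with quantiles.
prior_quantiles <- quantile(sigma2, probs = c(0.1, 0.5, 0.9))

## Theoretical median for comparison with the simulated one.
median_theory <- 1 / qgamma(0.5, shape = a, rate = b)
print(prior_quantiles)
print(median_theory)
```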
character, defines the implemented formulation for the negative binomial model if the response
distribution is negative binomial. The two possibilities are to work with a negative binomial
likelihood (distopt = "nb") or to work with the Poisson likelihood and the multiplicative random
effects (distopt = "poga").
character, option reference is meaningful only if either family = "multinomial" or
family = "multinomialprobit" is specified as the response distribution. In this case reference
defines the reference category to be chosen. Suppose, for instance, that the response is
categorical with the three categories 1, 2 and 3. Then reference = 2 defines the value 2 to be
the reference category.
character, defines the zero inflated distribution for the regression analysis. The two
possibilities are to work with a zero inflated Poisson distribution (zipdistopt = "zip") or to
work with the zero inflated negative binomial likelihood (zipdistopt = "zinb").
character, option begin is meaningful only if family = "cox" is specified as the response
distribution. In this case begin specifies the variable that records when the observation became
at risk. This option can be used to handle left truncation and time-varying covariates. If begin
is not specified, all observations are assumed to have become at risk at time 0.
integer, besides the posterior means and medians, BayesX provides point-wise posterior credible
intervals for every effect in the model. In a Bayesian approach based on MCMC simulation
techniques, credible intervals are estimated by computing the respective quantiles of the sampled
effects. By default, BayesX computes (point-wise) credible intervals for nominal levels of
80\(\%\) and 95\(\%\). The option level[1] allows redefining the first of these nominal levels
(95\(\%\)). Adding, for instance, level[1] = 99 to the options list computes credible intervals
for a nominal level of 99\(\%\) rather than 95\(\%\). Similarly, the option level[2] allows
redefining the second nominal level (80\(\%\)). Adding, for instance, level[2] = 70 to the
options list computes credible intervals for a nominal level of 70\(\%\) rather than 80\(\%\).
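The quantile-based construction described above can be sketched in base R. The simulated draws stand in for the stored MCMC samples of a single effect, and the helper `credible_interval` with its equal-tailed intervals is an assumption made for illustration, not BayesX code.

```r
## Stand-in for the stored MCMC draws of one effect (illustration only).
set.seed(123)
draws <- rnorm(1000, mean = 0.5, sd = 0.2)

## Hypothetical helper: equal-tailed interval for a nominal level in percent.
credible_interval <- function(draws, level) {
  alpha <- 1 - level / 100
  quantile(draws, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

ci95 <- credible_interval(draws, 95)  # default first level
ci80 <- credible_interval(draws, 80)  # default second level
ci99 <- credible_interval(draws, 99)  # corresponds to level[1] = 99
print(rbind(ci80, ci95, ci99))
```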
numeric, defines the termination criterion of the estimation process. If both the relative
changes in the regression coefficients and the variance parameters are less than eps, the
estimation process is assumed to have converged.
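A termination rule of this kind might look as follows. The helpers `relative_change` and `has_converged` are hypothetical, and measuring the relative change with Euclidean norms is an assumption; the exact formula used by BayesX may differ.

```r
## Hypothetical measure of relative change between two iterations,
## using Euclidean norms (an assumption, not taken from BayesX).
relative_change <- function(old, new) {
  sqrt(sum((new - old)^2)) / sqrt(sum(old^2))
}

## Converged only if coefficients AND variance parameters are both stable.
has_converged <- function(beta_old, beta_new, var_old, var_new, eps = 1e-05) {
  relative_change(beta_old, beta_new) < eps &&
    relative_change(var_old, var_new) < eps
}

## Coefficients barely move, but one variance still changes noticeably.
not_yet <- has_converged(c(1, 2), c(1, 2 + 1e-08), c(0.5, 0.1), c(0.5, 0.11))

## Coefficients and variances are both stable.
done <- has_converged(c(1, 2), c(1, 2 + 1e-08), c(0.5, 0.1), c(0.5, 0.1 + 1e-09))
print(c(not_yet = not_yet, done = done))
```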
numeric, since small variances are close to the boundary of their parameter space, the usual
Fisher scoring algorithm for their determination has to be modified. If the fraction of the
penalized part of an effect relative to the total effect is less than lowerlim, the estimation of
the corresponding variance is stopped and the estimator is defined to be the current value of the
variance (see section 6.2 of the BayesX methodology manual for details).
integer, defines the maximum number of iterations to be used in estimation. Since the estimation
process will not necessarily converge, it may be useful to define an upper bound for the number
of iterations. Note that BayesX returns results based on the current values of all parameters
even if no convergence could be achieved within maxit iterations, but a warning message will be
printed in the output window.
numeric, defines the maximum value that is allowed for relative changes in parameters in one iteration to prevent the program from crashing because of numerical problems. Note that BayesX produces results based on the current values of all parameters even if the estimation procedure is stopped due to numerical problems, but an error message will be printed in the output window.
character, gives the name of the variable that contains the lower (left) boundary \(T_{lo}\) of the interval \([T_{lo}, T_{up}]\) for an interval censored observation. For right censored or uncensored observations we have to specify \(T_{lo} = T_{up}\). If leftint is missing, all observations are assumed to be right censored or uncensored, depending on the corresponding value of the censoring indicator.
character, option lefttrunc specifies the name of the variable containing the left truncation
time \(T_{tr}\). For observations that are not truncated, we have to specify \(T_{tr} = 0\). If
lefttrunc is missing, all observations are assumed to be not truncated. For multi-state models,
variable lefttrunc specifies the left endpoint of the corresponding time interval.
character, for multi-state models, state specifies the current state variable of the process.
character, specifies the selection algorithm. Possible values are "cdescent1" (adaptive
algorithms in the methodology manual, see subsection 6.3), "cdescent2" (adaptive algorithms 1 and
2 with backfitting, see remarks 1 and 2 of section 3 in Belitz and Lang (2008)), "cdescent3"
(search according to cdescent1 followed by cdescent2 using the selected model in the first step
as the start model) and "stepwise" (stepwise algorithm implemented in the gam routine of S-plus,
see Chambers and Hastie, 1992). This option will rarely be specified by the user.
character, specifies the goodness of fit criterion. If criterion = "MSEP" is specified, the data
are randomly divided into a training and a validation data set. The training data set is used to
estimate the models and the validation data set is used to estimate the mean squared prediction
error (MSEP), which serves as the goodness of fit criterion to compare different models. The
proportion of data used for the training and validation samples can be specified using option
proportion, see below. The default is to use 75% of the data for the training sample.
numeric, this option may be used in combination with option criterion = "MSEP", see above. In
this case the data are randomly divided into a training and a validation sample. proportion
defines the fraction (between 0 and 1) of the original data used as training sample.
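The idea behind the MSEP criterion can be sketched in base R. The simulated data mirror the example at the end of this page; the two `lm()` fits are stand-ins for candidate BayesX models and the helper `msep` is hypothetical, so this only illustrates the split and the criterion, not the BayesX selection procedure.

```r
## Simulated data (same design as the example at the end of this page).
set.seed(111)
n <- 500
dat <- data.frame(x = runif(n, -3, 3))
dat$y <- with(dat, 10 + sin(x) + rnorm(n, sd = 0.6))

## Random split: a fraction `proportion` of the rows forms the training sample.
proportion <- 0.75
train_id <- sample(n, size = round(proportion * n))
train <- dat[train_id, ]
valid <- dat[-train_id, ]

## Two candidate models, estimated on the training sample only
## (plain lm fits used as stand-ins for BayesX models).
m_linear <- lm(y ~ x, data = train)
m_cubic <- lm(y ~ poly(x, 3), data = train)

## Hypothetical helper: mean squared prediction error on new data.
msep <- function(model, newdata) {
  mean((newdata$y - predict(model, newdata = newdata))^2)
}
msep_linear <- msep(m_linear, valid)
msep_cubic <- msep(m_cubic, valid)
print(c(linear = msep_linear, cubic = msep_cubic))
```

The candidate with the smaller validation MSEP would be preferred under this criterion.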
character, defines the start model for variable selection. Options are "linear", "empty", "full"
and "userdefined".
character, specifies how detailed the output in the output window will be. Options are
"trace_on", "trace_half" and "trace_off".
integer, defines the maximum number of iterations. If the selection process has not converged
after steps iterations, the algorithm terminates and a warning is raised. Setting steps = 0
allows the user to estimate a certain model without any model choice. This option will rarely be
specified by the user.
character, compute confidence intervals for linear and nonlinear terms. Options are CI = "none",
confidence intervals conditional on the selected model (CI = "MCMCselect") and unconditional
confidence intervals where model uncertainty is taken into account (CI = "MCMCbootstrap"). Both
alternatives are computer intensive. Conditional confidence intervals take much less computing
time than unconditional intervals. The advantage of unconditional confidence intervals is that
sampling distributions for the degrees of freedom or smoothing parameters are obtained.
integer, defines the number of bootstrap samples used for CI = "MCMCbootstrap".
not used
Nikolaus Umlauf, Thomas Kneib, Stefan Lang, Achim Zeileis.
For methodological and reference details see the BayesX manuals available at: https://www.uni-goettingen.de/de/bayesx/550513.html.
Belitz C, Lang S (2008). Simultaneous selection of variables and smoothing parameters in structured additive regression models. Computational Statistics & Data Analysis, 53, 61–81.
Chambers JM, Hastie TJ (eds.) (1992). Statistical Models in S. Chapman & Hall, London.
Umlauf N, Adler D, Kneib T, Lang S, Zeileis A (2015). Structured Additive Regression Models: An R Interface to BayesX. Journal of Statistical Software, 63(21), 1–46. https://www.jstatsoft.org/v63/i21/
bayesx.
bayesx.control()
if (FALSE) {
set.seed(111)
n <- 500
## regressors
dat <- data.frame(x = runif(n, -3, 3))
## response
dat$y <- with(dat, 10 + sin(x) + rnorm(n, sd = 0.6))
## estimate models with
## bayesx MCMC and REML
b1 <- bayesx(y ~ sx(x), method = "MCMC", data = dat)
b2 <- bayesx(y ~ sx(x), method = "REML", data = dat)
## compare reported output
summary(b1)
summary(b2)
}
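The sketch below shows one way non-default sampler settings could be collected with bayesx.control() and handed on to bayesx(). It assumes that the R2BayesX package is installed and that bayesx() accepts the resulting list through a `control` argument; the particular values are arbitrary and chosen only to keep the run short.

```r
if (requireNamespace("R2BayesX", quietly = TRUE)) {
  ## Collect non-default sampler settings in one list
  ## (values are arbitrary, for illustration only).
  ctrl <- R2BayesX::bayesx.control(
    method = "MCMC",
    iterations = 3000L,
    burnin = 1000L,
    step = 2L,
    seed = 123L
  )
  str(ctrl)

  ## Not run: needs the BayesX binary, library("R2BayesX") attached
  ## and the data `dat` from the example above.
  ## b3 <- bayesx(y ~ sx(x), data = dat, control = ctrl)
} else {
  ## Package not available: nothing to construct.
  ctrl <- NULL
}
```

Keeping the settings in a separate object makes it easier to reuse the same configuration across several model fits.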