bayesx.control: Control Parameters for BayesX

Description

Various parameters that control fitting of regression models using bayesx.

Usage

bayesx.control(model.name = "bayesx.estim", 
  family = "gaussian", method = "MCMC", verbose = FALSE, 
  dir.rm = TRUE, outfile = NULL, replace = FALSE, iterations = 12000L,
  burnin = 2000L, maxint = NULL, step = 10L, predict = TRUE,
  seed = NULL, hyp.prior = NULL, distopt = NULL, reference = NULL,
  zipdistopt = NULL, begin = NULL, level = NULL, eps = 1e-05,
  lowerlim = 0.001, maxit = 400L, maxchange = 1e+06, leftint = NULL,
  lefttrunc = NULL, state = NULL, algorithm = NULL, criterion = NULL, 
  proportion = NULL, startmodel = NULL, trace = NULL, 
  steps = NULL, CI = NULL, bootstrapsamples = NULL, ...)

Value

A list with the arguments specified is returned.
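
Because the return value is an ordinary list, it can be inspected before a model is fitted. The following is a minimal sketch, assuming the R2BayesX package is installed (the call is skipped otherwise); the particular values for iterations, burnin and step are illustrative only.

```r
# Sketch only: requires the R2BayesX package, skipped when it is missing.
if (requireNamespace("R2BayesX", quietly = TRUE)) {
  ## build a control list with a shorter MCMC run than the defaults
  ctrl <- R2BayesX::bayesx.control(method = "MCMC",
    iterations = 6000L, burnin = 1000L, step = 5L)

  ## the result is a plain list, so the usual list tools apply
  print(is.list(ctrl))
  print(names(ctrl))
}
```

The example in the Examples section passes method directly to bayesx instead of building the list by hand.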

Arguments

model.name: character, base name used for the model output files written to outfile.
family: character, specifies the response distribution used for the model. Options available for all methods ("MCMC", "REML" and "STEP") are: "binomial", "binomialprobit", "gamma", "gaussian", "multinomial", "poisson". For "MCMC" and "REML" only: "cox", "cumprobit" and "multistate". For "REML" only: "binomialcomploglog", "cumlogit", "multinomialcatsp", "multinomialprobit", "seqlogit", "seqprobit".
method: character, which method should be used for estimation, options are "MCMC", "HMCMC" (hierarchical MCMC), "REML" and "STEP".
verbose: logical, should output be printed to the R console while bayesx is running?
dir.rm: logical, should the output files and directory be removed after estimation?
outfile: character, specifies a directory where bayesx should store all output files. All output files are named with model.name as the base name.
replace: logical, if set to TRUE, the files in the output directory specified in argument outfile will be replaced.
iterations: integer, sets the number of iterations for the sampler.
burnin: integer, sets the burn-in period of the sampler.
maxint: integer, if first or second order random walk priors are specified, in some cases the data will be slightly grouped: the range between the minimal and maximal observed covariate values is divided into (small) intervals, and one parameter is estimated for each interval. The grouping has almost no effect on estimation results as long as the number of intervals is large enough. The maxint option lets the user control the amount of grouping by setting the maximum number of intervals allowed. For equidistant data, the default maxint = 150 means, for example, that no grouping is done as long as the number of different observations is at most 150. For non-equidistant data, some grouping may be done even if the number of different observations is below 150.
step: integer, defines the thinning parameter for MCMC simulation. For example, step = 50 means that only every 50th sampled parameter is stored and used to compute characteristics of the posterior distribution such as means, standard deviations or quantiles. The aim of thinning is to considerably reduce disk storage and the autocorrelation between sampled parameters.
predict: logical, may be specified to compute samples of the deviance D, the effective number of parameters pD and the deviance information criterion DIC of the model. In addition, if predict = FALSE, only output files of estimated effects are returned; otherwise an expanded data set using all observations, which also contains the data used for estimation, is written to the output directory. Hence, setting predict = FALSE is useful when dealing with large data sets that might cause memory problems if predict is set to TRUE.
seed: integer, set the seed of the random number generator in BayesX, usually set using function set.seed.
hyp.prior: numeric, defines the values of the hyper-parameters a and b for the inverse gamma prior of the overall variance parameter \(\sigma^2\), if the response distribution is Gaussian. Both values must be positive real numbers. The default is hyp.prior = c(1, 0.005).
distopt: character, defines the implemented formulation for the negative binomial model if the response distribution is negative binomial. The two possibilities are to work with a negative binomial likelihood (distopt = "nb") or to work with the Poisson likelihood and the multiplicative random effects (distopt = "poga").
reference: character, option reference is meaningful only if either family = "multinomial" or family = "multinomialprobit" is specified as the response distribution. In this case reference defines the reference category to be chosen. Suppose, for instance, that the response is categorical with the three categories 1, 2 and 3. Then reference = 2 defines the value 2 to be the reference category.
zipdistopt: character, defines the zero inflated distribution for the regression analysis. The two possibilities are to work with a zero inflated Poisson distribution (zipdistopt = "zip") or to work with the zero inflated negative binomial likelihood (zipdistopt = "zinb").
begin: character, option begin is meaningful only if family = "cox" is specified as the response distribution. In this case begin specifies the variable that records when the observation became at risk. This option can be used to handle left truncation and time-varying covariates. If begin is not specified, all observations are assumed to have become at risk at time 0.
level: integer, besides the posterior means and medians, BayesX provides point-wise posterior credible intervals for every effect in the model. In a Bayesian approach based on MCMC simulation techniques, credible intervals are estimated by computing the respective quantiles of the sampled effects. By default, BayesX computes (point-wise) credible intervals for nominal levels of 80\(\%\) and 95\(\%\). The option level[1] redefines the first of these nominal levels (95\(\%\)). Adding, for instance, level[1] = 99 to the options list computes credible intervals for a nominal level of 99\(\%\) rather than 95\(\%\). Similarly, the option level[2] redefines the second nominal level (80\(\%\)). Adding, for instance, level[2] = 70 to the options list computes credible intervals for a nominal level of 70\(\%\) rather than 80\(\%\).
eps: numeric, defines the termination criterion of the estimation process. If both the relative changes in the regression coefficients and the variance parameters are less than eps, the estimation process is assumed to have converged.
lowerlim: numeric, since small variances are close to the boundary of their parameter space, the usual Fisher scoring algorithm for their determination has to be modified. If the fraction of the penalized part of an effect relative to the total effect is less than lowerlim, the estimation of the corresponding variance is stopped and the estimator is defined to be the current value of the variance (see section 6.2 of the BayesX methodology manual for details).
maxit: integer, defines the maximum number of iterations to be used in estimation. Since the estimation process will not necessarily converge, it may be useful to define an upper bound for the number of iterations. Note that BayesX returns results based on the current values of all parameters even if no convergence could be achieved within maxit iterations, but a warning message will be printed in the output window.
maxchange: numeric, defines the maximum value that is allowed for relative changes in parameters in one iteration to prevent the program from crashing because of numerical problems. Note that BayesX produces results based on the current values of all parameters even if the estimation procedure is stopped due to numerical problems, but an error message will be printed in the output window.
leftint: character, gives the name of the variable that contains the lower (left) boundary \(T_{lo}\) of the interval \([T_{lo}, T_{up}]\) for an interval censored observation. For right censored or uncensored observations we have to specify \(T_{lo} = T_{up}\). If leftint is missing, all observations are assumed to be right censored or uncensored, depending on the corresponding value of the censoring indicator.
lefttrunc: character, option lefttrunc specifies the name of the variable containing the left truncation time \(T_{tr}\). For observations that are not truncated, we have to specify \(T_{tr} = 0\). If lefttrunc is missing, all observations are assumed to be not truncated. For multi-state models, variable lefttrunc specifies the left endpoint of the corresponding time interval.
state: character, for multi-state models, state specifies the current state variable of the process.
algorithm: character, specifies the selection algorithm. Possible values are "cdescent1" (adaptive algorithms in the methodology manual, see subsection 6.3), "cdescent2" (adaptive algorithms 1 and 2 with backfitting, see remarks 1 and 2 of section 3 in Belitz and Lang (2008)), "cdescent3" (search according to cdescent1 followed by cdescent2 using the selected model in the first step as the start model) and "stepwise" (stepwise algorithm implemented in the gam routine of S-plus, see Chambers and Hastie, 1992). This option will rarely be specified by the user.
criterion: character, specifies the goodness of fit criterion. If criterion = "MSEP" is specified, the data are randomly divided into a training and a validation data set. The training data set is used to estimate the models and the validation data set is used to estimate the mean squared prediction error (MSEP), which serves as the goodness of fit criterion to compare different models. The proportion of data used for the training and validation sample can be specified using option proportion, see below. The default is to use 75% of the data for the training sample.
proportion: numeric, this option may be used in combination with option criterion = "MSEP", see above. In this case the data are randomly divided into a training and a validation sample. proportion defines the fraction (between 0 and 1) of the original data used as training sample.
startmodel: character, defines the start model for variable selection. Options are "linear", "empty", "full" and "userdefined".
trace: character, specifies how detailed the output in the output window will be. Options are "trace_on", "trace_half" and "trace_off".
steps: integer, defines the maximum number of iterations. If the selection process has not converged after steps iterations the algorithm terminates and a warning is raised. Setting steps = 0 allows the user to estimate a certain model without any model choice. This option will rarely be specified by the user.
CI: character, compute confidence intervals for linear and nonlinear terms. Options are CI = "none", confidence intervals conditional on the selected model (CI = "MCMCselect") and unconditional confidence intervals where model uncertainty is taken into account (CI = "MCMCbootstrap"). Both alternatives are computer intensive. Conditional confidence intervals take much less computing time than unconditional intervals. The advantage of unconditional confidence intervals is that sampling distributions for the degrees of freedom or smoothing parameters are obtained.
bootstrapsamples: integer, defines the number of bootstrap samples used for CI = "MCMCbootstrap".
...: not used
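
The interplay of iterations, burnin, step and level can be illustrated in base R without calling BayesX. The sketch below is an illustration only: which draws BayesX keeps exactly, and the use of equal-tailed quantiles, are assumptions, and the draws are simulated stand-ins rather than BayesX output.

```r
## default sampler settings from the Usage section
iterations <- 12000L
burnin <- 2000L
step <- 10L

## iterations kept after burn-in and thinning
## (assumption: the first kept draw is the step-th one after burn-in)
kept <- seq(from = burnin + step, to = iterations, by = step)
length(kept)  # equals (iterations - burnin) / step

## simulated stand-in for the stored draws of a single effect
set.seed(1)
draws <- rnorm(length(kept), mean = 0.5, sd = 0.2)

## nominal levels in percent, mirroring level[1] = 95 and level[2] = 80
level <- c(95, 80)

## equal-tailed intervals: cut (100 - level) / 2 percent from each tail
ci <- t(sapply(level, function(l) {
  alpha <- (100 - l) / 200
  quantile(draws, probs = c(alpha, 1 - alpha), names = FALSE)
}))
dimnames(ci) <- list(paste0(level, "%"), c("lower", "upper"))
ci
```

With the defaults, 1000 draws remain after burn-in and thinning, which is the sample the quantiles are computed from.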

Author

Nikolaus Umlauf, Thomas Kneib, Stefan Lang, Achim Zeileis.

References

For methodological and reference details see the BayesX manuals available at: https://www.uni-goettingen.de/de/bayesx/550513.html.

Belitz C, Lang S (2008). Simultaneous selection of variables and smoothing parameters in structured additive regression models. Computational Statistics & Data Analysis, 53, 61--81.

Chambers JM, Hastie TJ (eds.) (1992). Statistical Models in S. Chapman & Hall, London.

Umlauf N, Adler D, Kneib T, Lang S, Zeileis A (2015). Structured Additive Regression Models: An R Interface to BayesX. Journal of Statistical Software, 63(21), 1--46. https://www.jstatsoft.org/v63/i21/

Examples

bayesx.control()

if (FALSE) {
set.seed(111)
n <- 500
## regressors
dat <- data.frame(x = runif(n, -3, 3))
## response
dat$y <- with(dat, 10 + sin(x) + rnorm(n, sd = 0.6))

## estimate models with
## bayesx MCMC and REML
b1 <- bayesx(y ~ sx(x), method = "MCMC", data = dat)
b2 <- bayesx(y ~ sx(x), method = "REML", data = dat)

## compare reported output
summary(b1)
summary(b2)
}
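
The random split behind criterion = "MSEP" and proportion can be imitated in base R. This is a rough sketch of the idea only, not of what BayesX does internally: the candidate models are fitted with lm rather than bayesx, and the helper msep is a hypothetical name introduced for this illustration.

```r
set.seed(111)
n <- 500
dat <- data.frame(x = runif(n, -3, 3))
dat$y <- with(dat, 10 + sin(x) + rnorm(n, sd = 0.6))

## fraction of the data used as training sample
proportion <- 0.75

## random split into training and validation rows
train_id <- sample(n, size = round(proportion * n))
train <- dat[train_id, ]
valid <- dat[-train_id, ]

## two candidate models, estimated on the training sample only
m1 <- lm(y ~ x, data = train)
m2 <- lm(y ~ sin(x), data = train)

## mean squared prediction error on the validation sample
## (msep is a hypothetical helper, not part of R2BayesX)
msep <- function(model) {
  mean((valid$y - predict(model, newdata = valid))^2)
}
c(linear = msep(m1), sine = msep(m2))
```

The model with the smaller value on the validation sample would be preferred under this criterion.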
