glmBayesMfp: Bayesian model inference for fractional polynomial GLMs and Cox models

Description

Bayesian model inference for fractional polynomial models from the generalized linear model family or the Cox model is conducted by means of either exhaustive model space evaluation or posterior model sampling. The approach is based on analytical marginal likelihood approximations, using integrated Laplace approximation. Alternatively, test-based Bayes factors (TBFs) can be used.

Usage

glmBayesMfp(
  formula = formula(data),
  censInd = NULL,
  data = parent.frame(),
  weights,
  offset,
  family,
  phi = 1,
  tbf = FALSE,
  empiricalBayes = FALSE,
  fixedg = NULL,
  priorSpecs = list(gPrior = HypergPrior(), modelPrior = "sparse"),
  method = c("ask", "exhaustive", "sampling"),
  subset,
  na.action = na.omit,
  verbose = TRUE,
  debug = FALSE,
  nModels,
  nCache = 1e+09,
  chainlength = 10000,
  nGaussHermite = 20,
  useBfgs = FALSE,
  largeVariance = 100,
  useOpenMP = TRUE,
  higherOrderCorrection = FALSE,
  fixedcfactor = FALSE,
  empiricalgPrior = FALSE,
  centerX = TRUE
)

Arguments

formula

model formula

censInd

censoring indicator. Default is NULL, but if a non-NULL vector is supplied, this is assumed to be logical (TRUE = observed, FALSE = censored) and Cox regression is performed.

data

optional data.frame for model variables (defaults to the parent frame)

weights

optionally a vector of positive weights (if not provided, a vector of ones)

offset

this can be used to specify an _a priori_ known component to be included in the linear predictor during fitting. This must be a numeric vector of length equal to the number of cases (if not provided, a vector of zeroes)

family

distribution and link (as in the glm function). Needs to be explicitly specified for all models except the Cox model.

phi

value of the dispersion parameter (defaults to 1)

tbf

Use TBF methodology to compute the marginal likelihood? (not default) Must be TRUE if Cox regression is done.

empiricalBayes

rank the models in terms of conditional marginal likelihood, using an empirical Bayes estimate of g? (not default) Because of how the code is structured, the prior on g must still be given in priorSpecs, although it has no effect when empiricalBayes == TRUE.

fixedg

If this is a number, then it is taken as a fixed value of g, and as with the empiricalBayes option, the models are ranked in terms of conditional marginal likelihood. By default, this option is NULL, which means that g is estimated in a fully or empirical Bayesian way.

priorSpecs

prior specifications, see details

method

which method should be used to explore the posterior model space? (default: ask the user)

subset

optional subset expression

na.action

default is to skip rows with missing data; no other option is supported at the moment

verbose

should information on computation progress be given? (default)

debug

print debugging information? (not default)

nModels

how many best models should be saved? (default: 1% of the total number of (cached) models). Must not be larger than nCache if method == "sampling".

nCache

maximum number of best models to be cached at the same time during the model sampling (only has an effect if method = "sampling")

chainlength

length of the model sampling chain (only has an effect if sampling has been chosen as method)

nGaussHermite

number of quantiles used in Gauss-Hermite quadrature for marginal likelihood approximation (and later in the MCMC sampler for the approximation of the marginal covariance factor density). If empiricalBayes or a fixed g is used, this option has no effect.

useBfgs

Shall the BFGS algorithm be used in the internal maximization? (not default) Otherwise, the default Brent optimize routine is used, which seems to be more robust. If empiricalBayes or a fixed g is used, this option has no effect and the Brent optimize routine is always used.

largeVariance

When should the BFGS variance estimate be considered “large”, so that a reestimation of it is computed? (Only has an effect if useBfgs == TRUE, default: 100)

useOpenMP

shall OpenMP be used to accelerate the computations? (default)

higherOrderCorrection

should a higher-order correction of the Laplace approximation be used, which works only for canonical GLMs? (not default)

fixedcfactor

If TRUE, the c factor is set assuming alpha = 0. Otherwise, alpha = mean(y) is used.

empiricalgPrior

If TRUE, uses the observed information matrix instead of X'X in the g prior. (Experimental)

centerX

Center the data before fitting? (default: TRUE, as in the usage above)
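
The interplay of censInd, tbf and family for Cox regression can be illustrated with a minimal sketch. The simulated data, the variable names (dat, cens_ind) and the choice empiricalBayes = TRUE are assumptions made for illustration only; the call itself uses only arguments listed above and is skipped when the package is not installed.

```r
# Simulated survival data (illustrative only, not from the package).
set.seed(1)
n <- 100
x1 <- runif(n, 0.5, 3)   # kept positive, since FP transforms involve logs and roots
x2 <- rbinom(n, 1, 0.5)
event_time <- rexp(n, rate = exp(0.5 * log(x1) + 0.3 * x2))
cens_time <- rexp(n, rate = 0.3)

dat <- data.frame(
  time = pmin(event_time, cens_time),
  status = as.integer(event_time <= cens_time),  # 1 = event observed, 0 = censored
  x1 = x1,
  x2 = x2
)

# censInd is expected to be logical: TRUE = observed, FALSE = censored.
cens_ind <- dat$status == 1

if (requireNamespace("glmBayesMfp", quietly = TRUE)) {
  library(glmBayesMfp)
  fit_cox <- glmBayesMfp(
    time ~ bfp(x1, max = 2) + uc(x2),
    censInd = cens_ind,       # non-NULL, so Cox regression is performed
    data = dat,
    tbf = TRUE,               # required for Cox regression
    empiricalBayes = TRUE,    # assumption: rank models by conditional marginal likelihood
    method = "exhaustive",    # only one FP term, so no sampling
    verbose = FALSE
  )
}
```

No family is passed here because, as stated above, it only needs to be specified for models other than the Cox model.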

Value

An object of S3 class GlmBayesMfp.

Details

The formula is of the form y ~ bfp(x1, max = 4) + uc(x2 + x3), that is, the auxiliary functions bfp and uc must be used to define the fractional polynomial terms and the uncertain fixed-form covariate terms, respectively. There must be an intercept, and no other fixed covariates are allowed. All max arguments of the bfp terms must be identical. y is the response vector for GLMs or the vector of survival times for Cox regression. Note that Cox regression is only implemented with TBFs.
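
As a rough sketch of how such a formula might be used for a GLM, the following simulates a small binary-response data set and fits it. The data, the object names (dat, model_formula, fit) and the choice max = 2 (to keep the exhaustive search short) are illustrative assumptions; the fit is only attempted if the package is installed.

```r
# Simulated binary-response data (illustrative only).
set.seed(42)
n <- 200
dat <- data.frame(
  x1 = runif(n, 0.5, 3),   # positive covariate for the FP term
  x2 = rnorm(n),
  x3 = rnorm(n)
)
eta <- -1 + 1.5 * log(dat$x1) + 0.5 * dat$x2
dat$y <- rbinom(n, size = 1, prob = plogis(eta))

# One FP term via bfp() and one uncertain fixed-form term via uc();
# the intercept is implicit.
model_formula <- y ~ bfp(x1, max = 2) + uc(x2 + x3)

if (requireNamespace("glmBayesMfp", quietly = TRUE)) {
  library(glmBayesMfp)
  fit <- glmBayesMfp(
    model_formula,
    data = dat,
    family = binomial(),     # must be given explicitly for GLMs
    method = "exhaustive",   # avoids the interactive "ask" prompt
    verbose = FALSE
  )
  print(inherits(fit, "GlmBayesMfp"))
}
```

Setting method explicitly is convenient in scripts, since the default "ask" waits for user input.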

The prior specifications are a list:

gPrior: A g-prior class object. Defaults to a hyper-g prior. See GPrior for more information.
modelPrior: choose if a flat model prior ("flat"), a model prior favoring sparse models explicitly (default, "sparse"), or a dependent model prior ("dependent") should be used.
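
A priorSpecs list with a non-default model prior might be put together as in the sketch below. The names model_priors and prior_specs are hypothetical; HypergPrior() is the constructor shown in the usage above, and it is only called if the package is installed.

```r
# The three model priors named in the list above.
model_priors <- c("flat", "sparse", "dependent")

# Pick a non-default model prior and check it against the allowed values.
chosen_prior <- match.arg("flat", model_priors)

if (requireNamespace("glmBayesMfp", quietly = TRUE)) {
  prior_specs <- list(
    gPrior = glmBayesMfp::HypergPrior(),  # the documented default g-prior
    modelPrior = chosen_prior             # instead of the default "sparse"
  )
  str(prior_specs, max.level = 1)
}
```

The resulting list would then be passed as the priorSpecs argument.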

If method = "ask", the user is prompted with the maximum cardinality of the model space and can then decide whether to use posterior sampling or the exhaustive model space evaluation.

Note that if you specify only one FP term, the exhaustive model search must be used, due to the structure of the model sampling algorithm. In practice this is not a problem, because the model space will typically be very small in that case.
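
To give a feeling for how small the model space is with a single FP term, the following sketch counts the possible configurations. It rests on an assumption not stated on this page: that each FP term selects between 0 and max powers, with repetition allowed, from the conventional power set of 8 values (-2, -1, -0.5, 0, 0.5, 1, 2, 3). The helper fp_configs is hypothetical and not part of the package.

```r
# Number of FP configurations for one covariate: multisets of size 0..max_degree
# drawn from a power set with n_powers elements (size 0 = covariate excluded).
# Assumption: the conventional 8-element FP power set.
fp_configs <- function(max_degree, n_powers = 8L) {
  degrees <- 0:max_degree
  sum(choose(n_powers + degrees - 1L, degrees))
}

fp_configs(2)  # 1 + 8 + 36 = 45
fp_configs(4)  # 1 + 8 + 36 + 120 + 330 = 495
```

Under the same assumption, each additional FP term multiplies the count by this number, which is why the model space grows quickly and sampling becomes attractive once several FP terms are present.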