bic.surv: Bayesian Model Averaging for Survival models.

Description

Bayesian Model Averaging for Cox proportional hazards models for censored survival data. This accounts for the model uncertainty inherent in the variable selection problem by averaging over the best models in the model class according to approximate posterior model probability.

Usage

bic.surv(x, ...)
# S3 method for matrix
bic.surv(x, surv.t, cens, strict = FALSE, 
      OR = 20, maxCol = 30, prior.param = c(rep(0.5, ncol(x))), 
      OR.fix = 2, nbest = 150, factor.type = TRUE, 
      factor.prior.adjust = FALSE, call = NULL, ...)
# S3 method for data.frame
bic.surv(x, surv.t, cens, 
      strict = FALSE, OR = 20, maxCol = 30, 
      prior.param = c(rep(0.5, ncol(x))), OR.fix = 2, 
      nbest = 150, factor.type = TRUE, 
      factor.prior.adjust = FALSE, call = NULL, ...)
# S3 method for formula
bic.surv(f, data, strict = FALSE, 
     OR = 20, maxCol = 30, prior.param = c(rep(0.5, ncol(x))), 
     OR.fix = 2, nbest = 150, factor.type = TRUE, 
     factor.prior.adjust = FALSE, call = NULL, ...)

Value

bic.surv returns an object of class bic.surv.

The function summary prints a summary of the results. The function plot plots the posterior distributions of the coefficients. The function imageplot.bma generates an image of the models that were averaged over.

An object of class bic.surv is a list containing at least the following components:

postprob: the posterior probabilities of the models selected
label: labels identifying the models selected
bic: values of BIC for the models
size: the number of independent variables in each of the models
which: a logical matrix with one row per model and one column per variable indicating whether that variable is in the model
probne0: the posterior probability that each variable is non-zero (in percent)
postmean: the posterior mean of each coefficient (from model averaging)
postsd: the posterior standard deviation of each coefficient (from model averaging)
condpostmean: the posterior mean of each coefficient conditional on the variable being included in the model
condpostsd: the posterior standard deviation of each coefficient conditional on the variable being included in the model
mle: matrix with one row per model and one column per variable giving the maximum likelihood estimate of each coefficient for each model
se: matrix with one row per model and one column per variable giving the standard error of each coefficient for each model
reduced: a logical indicating whether any variables were dropped before model averaging
dropped: a vector containing the names of those variables dropped before model averaging
call: the matched call that created the bic.surv object
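
The averaged components in this list can be read as weighted sums over the selected models. The base R sketch below uses made-up toy inputs (the `postprob`, `which`, and `mle` objects here are hypothetical, not output of bic.surv) to illustrate how probne0, postmean, and condpostmean relate to one another. It is an illustration of the definitions above, not the package's internal code.

```r
# Toy inputs: three hypothetical models over two variables (not real bic.surv output).
postprob <- c(0.5, 0.3, 0.2)   # posterior model probabilities, summing to 1
which <- matrix(c(TRUE,  TRUE,
                  TRUE,  FALSE,
                  FALSE, TRUE),
                nrow = 3, byrow = TRUE,
                dimnames = list(NULL, c("age", "bili")))
# Per-model coefficient estimates, with 0 where a variable is excluded.
mle <- matrix(c(0.04, 0.9,
                0.05, 0.0,
                0.00, 1.0),
              nrow = 3, byrow = TRUE,
              dimnames = list(NULL, c("age", "bili")))

# Posterior probability (in percent) that each coefficient is non-zero:
# the total weight of the models that include the variable.
probne0 <- 100 * colSums(postprob * which)

# Model-averaged posterior mean: weighted sum of the per-model estimates.
postmean <- colSums(postprob * mle)

# Posterior mean conditional on the variable being in the model.
condpostmean <- postmean / (probne0 / 100)

print(rbind(probne0, postmean, condpostmean))
```

In this toy setup a variable that appears in few models has a posterior mean shrunk toward zero, while its conditional posterior mean stays close to the per-model estimates.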

Arguments

x: a matrix or data frame of independent variables.
surv.t: a vector of survival (follow-up) times, the dependent variable.
cens: a vector of censoring indicators (0 = censored, 1 = uncensored).
f: a survival model formula
data: a data frame containing the variables in the model.
strict: logical indicating whether models with more likely submodels are eliminated. FALSE returns all models whose posterior model probability is within a factor of 1/OR of that of the best model.
OR: a number specifying the maximum ratio for excluding models in Occam's window
maxCol: a number specifying the maximum number of columns in design matrix (including intercept) to be kept.
prior.param: a vector of prior probabilities that parameters are non-zero. Default puts a prior of .5 on all parameters. Setting to 1 forces the variable into the model.
OR.fix: width of the window which keeps models after the leaps approximation is done. Because the leaps and bounds algorithm gives only an approximation to BIC, the window needs to be widened at this first "cut" to ensure that no good models are deleted. The level of this cut is at 1/(OR^OR.fix); the default value for OR.fix is 2.
nbest: a value specifying the number of models of each size returned to bic.surv by the modified leaps algorithm.
factor.type: a logical value specifying how variables of class "factor" are handled. A factor variable with d levels is turned into (d-1) dummy variables using a treatment contrast. If factor.type = TRUE, models will contain either all or none of these dummy variables. If factor.type = FALSE, models are free to select the dummy variables independently. In this case, factor.prior.adjust determines the prior on these variables.
factor.prior.adjust: a logical value specifying whether the prior distribution on dummy variables for factors should be adjusted when factor.type = FALSE. When factor.prior.adjust = FALSE, all dummy variables for variable i have prior equal to prior.param[i]. Note that this makes the prior probability of the union of these variables much higher than prior.param[i]. Setting factor.prior.adjust = TRUE corrects for this so that the prior probability of the union of the dummies equals prior.param[i] (and hence the deletion of the factor has a prior of 1-prior.param[i]). This adjustment changes the individual priors on each dummy variable to 1-(1-pp[i])^(1/(k+1)), where pp[i] denotes prior.param[i].
call: used internally
...: unused
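
The effect of factor.prior.adjust can be checked with a little arithmetic. The base R sketch below uses hypothetical values and two assumptions that the text above does not state explicitly: that the dummy variables enter the model independently, and that k + 1 in the adjustment formula corresponds to the number of dummy variables for the factor (called `n_dummies` here, a name chosen for illustration).

```r
# Hypothetical values for one factor variable.
p <- 0.5          # plays the role of prior.param[i]
n_dummies <- 3    # assumed to correspond to k + 1 in the adjustment formula

# Without adjustment each dummy gets prior p. Under independence, the prior
# probability that at least one dummy enters the model is well above p.
union_unadjusted <- 1 - (1 - p)^n_dummies

# With adjustment each dummy gets the smaller prior q, chosen so that the
# prior probability of the union comes back to p.
q <- 1 - (1 - p)^(1 / n_dummies)
union_adjusted <- 1 - (1 - q)^n_dummies

print(c(union_unadjusted = union_unadjusted,
        per_dummy_prior  = q,
        union_adjusted   = union_adjusted))
```

Under these assumptions the unadjusted union probability is 0.875 for p = 0.5 and three dummies, whereas the adjusted priors bring it back to 0.5.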

Author

Chris Volinsky volinsky@research.att.com; Adrian Raftery raftery@uw.edu; Ian Painter ian.painter@gmail.com

Details

Bayesian Model Averaging accounts for the model uncertainty inherent in the variable selection problem by averaging over the best models in the model class according to approximate posterior model probability. bic.surv averages over Cox regression models.
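
As a rough illustration of how BIC values turn into approximate posterior model probabilities and how the OR argument trims the model set, the base R sketch below uses hypothetical BIC values. It assumes equal prior model probabilities and the standard approximation that the posterior probability of a model is proportional to exp(-BIC/2); it is a sketch of the idea, not the code used inside bic.surv.

```r
# Hypothetical BIC values for four candidate models (smaller is better).
bic <- c(m1 = -40.0, m2 = -38.5, m3 = -33.0, m4 = -30.0)
OR <- 20   # same meaning as the OR argument: maximum ratio allowed in Occam's window

# Unnormalized weights proportional to exp(-BIC/2), shifted by the best BIC
# so that the best model has weight 1 (this also avoids overflow).
w <- exp(-(bic - min(bic)) / 2)

# Occam's window: keep models whose weight is within a factor 1/OR of the best.
keep <- w >= max(w) / OR

# Renormalize over the retained models to get posterior model probabilities.
postprob <- w[keep] / sum(w[keep])
print(round(postprob, 3))
```

With these made-up values only m1 and m2 fall inside the window; m3 and m4 are more than 20 times less probable than the best model and are dropped before averaging.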

References

Volinsky, C.T., Madigan, D., Raftery, A.E. and Kronmal, R.A. (1997). "Bayesian Model Averaging in Proportional Hazard Models: Assessing the Risk of a Stroke." Applied Statistics 46: 433-448.

Examples

if (FALSE) {
## veteran data
library(BMA)       # provides bic.surv
library(survival)  # provides Surv() and the veteran and pbc data sets
data(cancer)

test.bic.surv<- bic.surv(Surv(time,status) ~ ., data = veteran, 
                         factor.type = TRUE)
summary(test.bic.surv, conditional=FALSE, digits=2)
plot(test.bic.surv)

imageplot.bma(test.bic.surv)
}

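## pbc data (from the survival package)

x<- pbc[1:312,]   # the first 312 rows are the randomized trial participants
surv.t<- x$time
cens<- as.numeric((x$status == 2))   # 1 = died (event), 0 = censored or transplanted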

x<- x[,c("age", "albumin", "alk.phos", "ascites", "bili", "edema", 
         "hepato", "platelet", "protime", "sex", "ast", "spiders", 
         "stage", "trt", "copper")]

if (FALSE) {
x$bili<- log(x$bili)
x$albumin<- log(x$albumin)
x$protime<- log(x$protime)
x$copper<- log(x$copper)
x$ast<- log(x$ast)

test.bic.surv<- bic.surv(x, surv.t, cens, 
                         factor.type=FALSE, strict=FALSE)
summary(test.bic.surv)
}
