Full Bayesian cost-effectiveness models to handle missing data in longitudinal outcomes under different missing data
mechanism assumptions, using alternative parametric distributions for the effect and cost variables and
using a selection model approach to identify the model. The analysis is performed using the BUGS
language,
which is implemented in the software JAGS
using the function jags
The output is stored in an object of class 'missingHE'.
selection_long(
data,
model.eff,
model.cost,
model.mu = mu ~ 1,
model.mc = mc ~ 1,
dist_u,
dist_c,
type,
prob = c(0.025, 0.975),
time_dep = "AR1",
n.chains = 2,
n.iter = 20000,
n.burnin = floor(n.iter/2),
inits = NULL,
n.thin = 1,
ppc = FALSE,
save_model = FALSE,
prior = "default",
...
)
An object of the class 'missingHE' containing the following elements
A list containing the original longitudinal data set provided in data
(see Arguments), the number of observed and missing individuals
, the total number of individuals by treatment arm and the indicator vectors for the missing values for each time point
A list containing the output of a JAGS
model generated from the functions jags
, and
the posterior samples for the main parameters of the model and the imputed values
A list containing the output of the economic evaluation performed using the function bcea
A character variable that indicate which type of missingness mechanism has been used to run the model,
either MAR
or MNAR
(see details)
A character variable that indicate which type of analysis was conducted, either using a wide
or longitudinal
dataset
A character variable that indicate which type of time dependence assumption was made, either none
or AR1
A data frame in which to find the longitudinal variables supplied in model.eff
, model.cost
(model formulas for effects and costs)
and model.mu
, model.mc
(model formulas for the missing effect and cost models). Among these,
effectiveness, cost, time and treatment indicator (only two arms) variables must always be provided and named 'u', 'c' , 'time' and 't', respectively.
A formula expression in conventional R
linear modelling syntax. The response must be a health economic
effectiveness outcome ('u') whose name must correspond to that used in data
. Any covariates in the model must be provided on the right-hand side of the formula.
If there are no covariates, 1
should be specified on the right hand side of the formula. By default, covariates are placed on the "location" parameter of the distribution through a linear model.
Random effects can also be specified for each model parameter. See details for how these can be specified.
A formula expression in conventional R
linear modelling syntax. The response must be a health economic
cost outcome ('c') whose name must correspond to that used in data
. Any covariates in the model must be provided on the right-hand side of the formula.
If there are no covariates, 1
should be specified on the right hand side of the formula. By default, covariates are placed on the "location" parameter of the distribution through a linear model.
A joint bivariate distribution for effects and costs can be specified by including 'e' on the right-hand side of the formula for the costs model.
Random effects can also be specified for each model parameter. See details for how these can be specified.
A formula expression in conventional R
linear modelling syntax. The response must be indicated with the
term 'mu' (missing effects) and any covariates must be provided on the right-hand side of the formula. If there are no covariates, 1
should be specified on the right hand side of the formula.
By default, covariates are placed on the "probability" parameter for the missing effects through a logistic-linear model.
Random effects can also be specified for each model parameter. See details for how these can be specified.
A formula expression in conventional R
linear modelling syntax. The response must be indicated with the term 'mc' (missing costs) and any covariates must be provided on the right-hand side of the formula.
If there are no covariates, 1
should be specified on the right hand side of the formula. By default, covariates are placed on the "probability" parameter for the missing costs through a logistic-linear model.
Random effects can also be specified for each model parameter. See details for how these can be specified.
Distribution assumed for the effects. Current available chocies are: Normal ('norm'), Beta ('beta'), Gamma ('gamma'), Exponential ('exp'), Weibull ('weibull'), Logistic ('logis'), Poisson ('pois'), Negative Binomial ('nbinom') or Bernoulli ('bern').
Distribution assumed for the costs. Current available chocies are: Normal ('norm'), Gamma ('gamma') or LogNormal ('lnorm').
Type of missingness mechanism assumed. Choices are Missing At Random (MAR) and Missing Not At Random (MNAR).
A numeric vector of probabilities within the range (0,1), representing the upper and lower CI sample quantiles to be calculated and returned for the imputed values.
Type of dependence structure assumed between effectiveness and cost outcomes. Current choices include: autoregressive structure of order one ('AR1') - default - and independence ('none').
Number of chains.
Number of iterations.
Number of warmup iterations.
A list with elements equal to the number of chains selected; each element of the list is itself a list of starting values for the
JAGS
model, or a function creating (possibly random) initial values. If inits
is NULL
, JAGS
will generate initial values for all the model parameters.
Thinning interval.
Logical. If ppc
is TRUE
, the estimates of the parameters that can be used to generate replications from the model are saved.
Logical. If save_model
is TRUE
, a txt
file containing the model code is printed
in the current working directory.
A list containing the hyperprior values provided by the user. Each element of this list must be a vector of length two
containing the user-provided hyperprior values and must be named with the name of the corresponding parameter. For example, the hyperprior
values for the standard deviation effect parameters can be provided using the list prior = list('sigma.prior.u' = c(0, 100))
.
For more information about how to provide prior hypervalues for different types of parameters and models see details.
If prior
is set to 'default', the default values will be used.
Additional arguments that can be provided by the user. Examples are center = TRUE
to center all the covariates in the model
or the additional arguments that can be provided to the function bcea
to summarise the health economic evaluation results.
Users may also provide, using the argument qaly_calc
and tcost_calc
, the weights to be used for computing the posterior mean QALYs (Area Under the Curve approach)
and Total cost (sum over follow-up costs) quantities which are then used to generate any cost-effectiveness decision output (e.g. Cost-Effectiveness Plane). If these are not provided,
default values of '0.5' and '1' are used, respectively.
Andrea Gabrio
Depending on the distributions specified for the outcome variables in the arguments dist_u
and
dist_c
and the type of missingness mechanism specified in the argument type
, different selection models
are built and run in the background by the function selection
. These models consist in multinomial logistic regressions that are used to estimate
the probability of a missingness dropout pattern k
(completers, intermittent, dropout) in one or both the longitudinal outcomes. A simple example can be used
to show how these selection models are specified. Consider a longitudinal data set comprising a response variable \(y\) measures at S occasions and a set of centered covariate \(X_j\).
For each subject in the trial \(i = 1, ..., n\) and time \(s = 1, ..., S\) we define an indicator variable \(m_i\) taking value k = 1
if the \(i\)-th individual is associated with
no missing value (completer), a value k = 2
for intermittent missingness over the study period, and a value k = 3
for dropout missingness.
This is modelled as:
$$m_i ~ Multinomial(\pi^k_i)$$
$$\pi^k_i = \phi^k_i/\sum\phi_i$$
$$log(\phi^k_i) = \gamma^k_0 + \sum\gamma^k_j X_j + \delta^k (y)$$
where
\(\pi_i\) is the individual probability of a missing value in \(y\) for pattern \(k\) at a given time point.
\(\gamma^k_0\) represents the marginal probability of a missingness dropout pattern in \(y\) for pattern \(k\) on the log scale at a given time point.
\(\gamma^k_j\) represents the impact on the probability of a specific missingness dropout pattern in \(y\) of the centered covariates \(X_j\) for pattern \(k\) at a given time point.
\(\delta^k\) represents the impact on the probability of a specific missingness dropout pattern \(k\) in \(y\) of the missing pattern itself at a given time point.
When \(\delta = 0\) the model assumes a 'MAR' mechanism, while when \(\delta != 0\) the mechanism is 'MNAR'. For the parameters indexing the missingness model, the default prior distributions assumed are the following:
\(\gamma^k_0 ~ Logisitc(0, 1)\)
\(\gamma^k_j ~ Normal(0, 0.01)\)
\(\delta^k ~ Normal(0, 1)\)
When user-defined hyperprior values are supplied via the argument prior
in the function selection_long
, the elements of this list (see Arguments)
must be vectors of length two containing the user-provided hyperprior values and must take specific names according to the parameters they are associated with.
Specifically, the names for the parameters indexing the model which are accepted by missingHE are the following:
location parameters \(\alpha_0\) and \(\beta_0\): "mean.prior.u"(effects) and/or "mean.prior.c"(costs)
auxiliary parameters \(\sigma\): "sigma.prior.u"(effects) and/or "sigma.prior.c"(costs)
covariate parameters \(\alpha_j\) and \(\beta_j\): "alpha.prior"(effects) and/or "beta.prior"(costs)
marginal probability of missing values for pattern \(k\) \(\gamma^k_0\): "p.prior.u"(effects) and/or "p.prior.c"(costs)
covariate parameters in the missingness model for pattern \(k\) \(\gamma^k_j\) (if covariate data provided): "gamma.prior.u"(effects) and/or "gamma.prior.c"(costs)
mnar parameter for pattern \(k\) \(\delta^k\): "delta.prior.u"(effects) and/or "delta.prior.c"(costs)
For simplicity, here we have assumed that the set of covariates \(X_j\) used in the models for the effects/costs and in the
model of the missing effect/cost values is the same. However, it is possible to specify different sets of covariates for each model
using the arguments in the function selection_long
(see Arguments).
For each model, random effects can also be specified for each parameter by adding the term + (x | z) to each model formula, where x is the fixed regression coefficient for which also the random effects are desired and z is the clustering variable across which the random effects are specified (must be the name of a factor variable in the dataset). Multiple random effects can be specified using the notation + (x1 + x2 | site) for each covariate that was included in the fixed effects formula. Random intercepts are included by default in the models if a random effects are specified but they can be removed by adding the term 0 within the random effects formula, e.g. + (0 + x | z).
Mason, AJ. Gomes, M. Carpenter, J. Grieve, R. (2021). Flexible Bayesian longitudinal models for cost‐effectiveness analyses with informative missing data. Health economics, 30(12), 3138-3158.
Daniels, MJ. Hogan, JW. Missing Data in Longitudinal Studies: strategies for Bayesian modelling and sensitivity analysis, CRC/Chapman Hall.
Baio, G.(2012). Bayesian Methods in Health Economics. CRC/Chapman Hall, London.
Gelman, A. Carlin, JB., Stern, HS. Rubin, DB.(2003). Bayesian Data Analysis, 2nd edition, CRC Press.
Plummer, M. JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. (2003).
jags
, bcea
# Quck example to run using subset of PBS dataset
# Load longitudinal dataset
PBS.long <- PBS
# \donttest{
# Run the model using the selection function assuming a MAR mechanism
# Use only 100 iterations to run a quick check
model.selection.long <- selection_long(data = PBS.long, model.eff = u ~ 1,model.cost = c ~ 1,
model.mu = mu ~ 1, model.mc = mc ~ 1, dist_u = "norm", dist_c = "norm",
type = "MAR", n.chains = 2, n.iter = 100, ppc = TRUE, time_dep = "none")
# Print the results of the JAGS model
print(model.selection.long)
#
# Extract regression coefficient estimates
coef(model.selection.long)
#
# Summarise the CEA information from the model
summary(model.selection.long)
# Further examples which take longer to run
model.selection.long <- selection_long(data = PBS.long, model.eff = u ~ 1,model.cost = c ~ u,
model.se = mu ~ 1, model.mc = mc ~ 1, dist_u = "norm", dist_c = "norm",
type = "MAR", n.chains = 2, n.iter = 500, ppc = FALSE, time_dep = "none")
#
# Print results for all imputed values
print(model.selection.long, value.mis = TRUE)
# Use looic to assess model fit
pic.looic <- pic(model.selection.long, criterion = "looic", module = "total")
pic.looic
# Show density plots for all parameters
diag.hist <- diagnostic(model.selection.long, type = "denplot", param = "all")
# Plots of imputations for all data
p1 <- plot(model.selection.long, class = "scatter", outcome = "all")
# Summarise the CEA results
summary(model.selection.long)
# }
#
#
Run the code above in your browser using DataLab