
This function allows us to generate an estimated longitudinal trajectory (either subject-specific, or by marginalising over the distribution of the group-specific parameters) based on draws from the posterior predictive distribution.
posterior_traj(
object,
m = 1,
newdata = NULL,
newdataLong = NULL,
newdataEvent = NULL,
interpolate = TRUE,
extrapolate = FALSE,
control = list(),
last_time = NULL,
prob = 0.95,
ids,
dynamic = TRUE,
scale = 1.5,
draws = NULL,
seed = NULL,
return_matrix = FALSE,
...
)
A fitted model object returned by the
stan_jm
modelling function. See
stanreg-objects
.
Integer specifying the number or name of the submodel
Deprecated: please use newdataLong
instead.
Optionally, a data frame in which to look for variables with
which to predict. If omitted, the model matrix is used. If newdata
is provided and any variables were transformed (e.g. rescaled) in the data
used to fit the model, then these variables must also be transformed in
newdata
. This only applies if variables were transformed before
passing the data to one of the modeling functions and not if
transformations were specified inside the model formula.
Optionally, a data frame (or in the case of
newdataLong
this can be a list of data frames) in which to look
for variables with which to predict. If omitted, the model matrices are used.
If new data is provided, then two options are available. Either one can
provide observed covariate and outcome data, collected up to some time
t, and use this data to draw new individual-specific coefficients
(i.e. individual-level random effects). This is the default behaviour when
new data is provided, determined by the argument dynamic = TRUE
, and
requiring both newdataLong
and newdataEvent
to be specified.
Alternatively, one can specify dynamic = FALSE
, and then predict
using just covariate data, by marginalising over the distribution
of the group-specific coefficients; in this case, only newdataLong
needs to be specified and it only needs to be a single data frame with
the covariate data for the predictions for the one longitudinal submodel.
A logical specifying whether to interpolate the estimated
longitudinal trajectory in between the observation times. This can be used
to achieve a smooth estimate of the longitudinal trajectory across the
entire follow up time. If TRUE
then the interpolation can be further
controlled using the control
argument.
A logical specifying whether to extrapolate the estimated
longitudinal trajectory beyond the time of the last known observation time.
If TRUE
then the extrapolation can be further controlled using
the control
argument.
A named list with parameters controlling the interpolation or
extrapolation of the estimated longitudinal trajectory when either
interpolate = TRUE
or extrapolate = TRUE
. The
list can contain one or more of the following named elements:
ipoints
a positive integer specifying the number of discrete
time points at which to calculate the estimated longitudinal response for
interpolate = TRUE
. These time points are evenly spaced starting at
0 and ending at the last known observation time for each individual. The
last observation time for each individual is taken to be either: the
event or censoring time if no new data is provided; the time specified
in the "last_time" column if provided in the new data (see Details
section below); or the time of the last longitudinal measurement if new
data is provided but no "last_time" column is included. The default is 15.
epoints
a positive integer specifying the number of discrete
time points at which to calculate the estimated longitudinal response for
extrapolate = TRUE
. These time points are evenly spaced between the
last known observation time for each individual and the extrapolation
distance specifed using either edist
or eprop
.
The default is 15.
eprop
a positive scalar between 0 and 1 specifying the
amount of time across which to extrapolate the longitudinal trajectory,
represented as a proportion of the total observed follow up time for each
individual. For example specifying eprop = 0.2
means that for an
individual for whom the latest of their measurement, event or censoring times
was 10 years, their estimated longitudinal trajectory will be extrapolated
out to 12 years (i.e. 10 + (0.2 * 10)). The default value is 0.2.
edist
a positive scalar specifying the amount of time
across which to extrapolate the longitudinal trajectory for each individual,
represented in units of the time variable time_var
(from fitting the
model). This cannot be specified if eprop
is specified.
A scalar, character string, or NULL
. This argument
specifies the last known survival time for each individual when
conditional predictions are being obtained. If
newdataEvent
is provided and conditional survival predictions are being
obtained, then the last_time
argument can be one of the following:
(i) a scalar, this will use the same last time for each individual in
newdataEvent
; (ii) a character string, naming a column in
newdataEvent
in which to look for the last time for each individual;
(iii) NULL
, in which case the default is to use the time of the latest
longitudinal observation in newdataLong
. If newdataEvent
is
NULL
then the last_time
argument cannot be specified
directly; instead it will be set equal to the event or censoring time for
each individual in the dataset that was used to estimate the model.
If standardised survival probabilities are requested (i.e.
standardise = TRUE
) then conditional survival probabilities are
not allowed and therefore the last_time
argument is ignored.
A scalar between 0 and 1 specifying the width to use for the
uncertainty interval (sometimes called credible interval) for the predicted
mean response and the prediction interval for the predicted (raw) response.
For example prob = 0.95
(the default) means that the 2.5th and 97.5th
percentiles will be provided. Only relevant when return_matrix
is
FALSE
.
An optional vector specifying a subset of subject IDs for whom the
predictions should be obtained. The default is to predict for all individuals
who were used in estimating the model or, if newdata
is specified,
then all individuals contained in newdata
.
A logical that is only relevant if new data is provided
via the newdata
argument. If
dynamic = TRUE
, then new group-specific parameters are drawn for
the individuals in the new data, conditional on their longitudinal
biomarker data contained in newdata
. These group-specific
parameters are then used to generate individual-specific survival probabilities
for these individuals. These are often referred to as "dynamic predictions"
in the joint modelling context, because the predictions can be updated
each time additional longitudinal biomarker data is collected on the individual.
On the other hand, if dynamic = FALSE
then the survival probabilities
will just be marginalised over the distribution of the group-specific
coefficients; this will mean that the predictions will incorporate all
uncertainty due to between-individual variation so there will likely be
very wide credible intervals on the predicted survival probabilities.
A scalar, specifying how much to multiply the asymptotic
variance-covariance matrix for the random effects by, which is then
used as the "width" (ie. variance-covariance matrix) of the multivariate
Student-t proposal distribution in the Metropolis-Hastings algorithm. This
is only relevant when newdataEvent
is supplied and
dynamic = TRUE
, in which case new random effects are simulated
for the individuals in the new data using the Metropolis-Hastings algorithm.
An integer indicating the number of MCMC draws to return. The default is to set the number of draws equal to 200, or equal to the size of the posterior sample if that is less than 200.
An optional seed
to use.
A logical. If TRUE
then a draws
by
nrow(newdata)
matrix is returned which contains all the actual
simulations or draws from the posterior predictive distribution. Otherwise
if return_matrix
is set to FALSE
(the default) then a
data frame is returned, as described in the Value section below.
Other arguments passed to posterior_predict
, for
example draws
, re.form
, seed
, etc.
When return_matrix = FALSE
, a data frame
of class predict.stanjm
. The data frame includes a column for the median
of the posterior predictions of the mean longitudinal response (yfit
),
a column for each of the lower and upper limits of the uncertainty interval
corresponding to the posterior predictions of the mean longitudinal response
(ci_lb
and ci_ub
), and a column for each of the lower and upper
limits of the prediction interval corresponding to the posterior predictions
of the (raw) longitudinal response. The data frame also includes columns for
the subject ID variable, and each of the predictor variables. The returned
object also includes a number of attributes.
When return_matrix = TRUE
, the returned object is the same as that
described for posterior_predict
.
The posterior_traj
function acts as a wrapper to the
posterior_predict
function, but allows predictions to be
easily generated at time points that are interpolated and/or extrapolated
between time zero (baseline) and the last known survival time for the
individual, thereby providing predictions that correspond to a smooth estimate
of the longitudinal trajectory (useful for the plotting via the associated
plot.predict.stanjm
method). In addition it returns a data
frame by default, whereas the posterior_predict
function
returns a matrix; see the Value section below for details. Also,
posterior_traj
allows predictions to only be generated for a subset
of individuals, via the ids
argument.
# NOT RUN {
# Run example model if not already loaded
if (!exists("example_jm")) example(example_jm)
# Obtain subject-specific predictions for all individuals
# in the estimation dataset
pt1 <- posterior_traj(example_jm, interpolate = FALSE, extrapolate = FALSE)
head(pt1)
# Obtain subject-specific predictions only for a few selected individuals
pt2 <- posterior_traj(example_jm, ids = c(1,3,8))
# If we wanted to obtain subject-specific predictions in order to plot the
# longitudinal trajectories, then we might want to ensure a full trajectory
# is obtained by interpolating and extrapolating time. We can then use the
# generic plot function to plot the subject-specific predicted trajectories
# for the first three individuals. Interpolation and extrapolation is
# carried out by default.
pt3 <- posterior_traj(example_jm)
head(pt3) # predictions at additional time points compared with pt1
plot(pt3, ids = 1:3)
# If we wanted to extrapolate further in time, but decrease the number of
# discrete time points at which we obtain predictions for each individual,
# then we could specify a named list in the 'control' argument
pt4 <- posterior_traj(example_jm, control = list(ipoints = 10, epoints = 10, eprop = 0.5))
# If we have prediction data for a new individual, and we want to
# estimate the longitudinal trajectory for that individual conditional
# on this new data (perhaps extrapolating forward from our last
# longitudinal measurement) then we can do that. It requires drawing
# new individual-specific parameters, based on the full likelihood,
# so we must supply new data for both the longitudinal and event
# submodels. These are sometimes known as dynamic predictions.
ndL <- pbcLong[pbcLong$id == 8, , drop = FALSE]
ndE <- pbcSurv[pbcSurv$id == 8, , drop = FALSE]
ndL$id <- "new_subject" # new id can't match one used in training data
ndE$id <- "new_subject"
pt5 <- posterior_traj(example_jm,
newdataLong = ndL,
newdataEvent = ndE)
# By default it is assumed that the last known survival time for
# the individual is the time of their last biomarker measurement,
# but if we know they survived to some later time then we can
# condition on that information using the last_time argument
pt6 <- posterior_traj(example_jm,
newdataLong = ndL,
newdataEvent = ndE,
last_time = "futimeYears")
# Alternatively we may want to estimate the marginal longitudinal
# trajectory for a given set of covariates. To do this, we can pass
# the desired covariate values in a new data frame (however the only
# covariate in our fitted model was the time variable, year). To make sure
# that we marginalise over the random effects, we need to specify an ID value
# which does not correspond to any of the individuals who were used in the
# model estimation and specify the argument dynamic=FALSE.
# The marginal prediction is obtained by generating subject-specific
# predictions using a series of random draws from the random
# effects distribution, and then integrating (ie, averaging) over these.
# Our marginal prediction will therefore capture the between-individual
# variation associated with the random effects.
nd <- data.frame(id = rep("new1", 11), year = (0:10 / 2))
pt7 <- posterior_traj(example_jm, newdataLong = nd, dynamic = FALSE)
head(pt7) # note the greater width of the uncertainty interval compared
# with the subject-specific predictions in pt1, pt2, etc
# Alternatively, we could have estimated the "marginal" trajectory by
# ignoring the random effects (ie, assuming the random effects were set
# to zero). This will generate a predicted longitudinal trajectory only
# based on the fixed effect component of the model. In essence, for a
# linear mixed effects model (ie, a model that uses an identity link
# function), we should obtain a similar point estimate ("yfit") to the
# estimates obtained in pt5 (since the mean of the estimated random effects
# distribution will be approximately 0). However, it is important to note that
# the uncertainty interval will be much more narrow, since it completely
# ignores the between-individual variability captured by the random effects.
# Further, if the model uses a non-identity link function, then the point
# estimate ("yfit") obtained only using the fixed effect component of the
# model will actually provide a biased estimate of the marginal prediction.
# Nonetheless, to demonstrate how we can obtain the predictions only using
# the fixed effect component of the model, we simply specify 're.form = NA'.
# (We will use the same covariate values as used in the prediction for
# example for pt5).
pt8 <- posterior_traj(example_jm, newdataLong = nd, dynamic = FALSE,
re.form = NA)
head(pt8) # note the much narrower ci, compared with pt5
# }
# NOT RUN {
# }
Run the code above in your browser using DataLab