Learn R Programming

rstanarm (version 2.17.2)

posterior_traj: Estimate the subject-specific or marginal longitudinal trajectory

Description

This function allows us to generate an estimated longitudinal trajectory (either subject-specific, or by marginalising over the distribution of the group-specific parameters) based on draws from the posterior predictive distribution.

Usage

posterior_traj(object, m = 1, newdata = NULL, interpolate = TRUE,
  extrapolate = FALSE, prob = 0.95, ids, control = list(),
  return_matrix = FALSE, ...)

Arguments

object

A fitted model object returned by the stan_jm modelling function. See stanreg-objects.

m

Integer specifying the number or name of the submodel

newdata

Optionally, a data frame in which to look for variables with which to predict. If omitted, the model matrix is used. If newdata is provided and any variables were transformed (e.g. rescaled) in the data used to fit the model, then these variables must also be transformed in newdata. This only applies if variables were transformed before passing the data to one of the modeling functions and not if transformations were specified inside the model formula.

interpolate

A logical specifying whether to interpolate the estimated longitudinal trajectory in between the observation times. This can be used to achieve a smooth estimate of the longitudinal trajectory across the entire follow up time. If TRUE then the interpolation can be further controlled using the control argument.

extrapolate

A logical specifying whether to extrapolate the estimated longitudinal trajectory beyond the time of the last known observation time. If TRUE then the extrapolation can be further controlled using the control argument.

prob

A scalar between 0 and 1 specifying the width to use for the uncertainty interval (sometimes called credible interval) for the predicted mean response and the prediction interval for the predicted (raw) response. For example prob = 0.95 (the default) means that the 2.5th and 97.5th percentiles will be provided. Only relevant when return_matrix is FALSE.

ids

An optional vector specifying a subset of subject IDs for whom the predictions should be obtained. The default is to predict for all individuals who were used in estimating the model or, if newdata is specified, then all individuals contained in newdata.

control

A named list with parameters controlling the interpolation or extrapolation of the estimated longitudinal trajectory when either interpolate = TRUE or extrapolate = TRUE. The list can contain one or more of the following named elements:

ipoints

a positive integer specifying the number of discrete time points at which to calculate the estimated longitudinal response for interpolate = TRUE. These time points are evenly spaced starting at 0 and ending at the last known observation time for each individual. The last observation time for each individual is taken to be either: the event or censoring time if no new data is provided; the time specified in the "last_time" column if provided in the new data (see Details section below); or the time of the last longitudinal measurement if new data is provided but no "last_time" column is included. The default is 15.

epoints

a positive integer specifying the number of discrete time points at which to calculate the estimated longitudinal response for extrapolate = TRUE. These time points are evenly spaced between the last known observation time for each individual and the extrapolation distance specifed using either edist or eprop. The default is 15.

eprop

a positive scalar between 0 and 1 specifying the amount of time across which to extrapolate the longitudinal trajectory, represented as a proportion of the total observed follow up time for each individual. For example specifying eprop = 0.2 means that for an individual for whom the latest of their measurement, event or censoring times was 10 years, their estimated longitudinal trajectory will be extrapolated out to 12 years (i.e. 10 + (0.2 * 10)). The default value is 0.2.

edist

a positive scalar specifying the amount of time across which to extrapolate the longitudinal trajectory for each individual, represented in units of the time variable time_var (from fitting the model). This cannot be specified if eprop is specified.

return_matrix

A logical. If TRUE then a draws by nrow(newdata) matrix is returned which contains all the actual simulations or draws from the posterior predictive distribution. Otherwise if return_matrix is set to FALSE (the default) then a data frame is returned, as described in the Value section below.

...

Other arguments passed to posterior_predict, for example draws, re.form, seed, etc.

Value

When return_matrix = FALSE, a data frame of class predict.stanjm. The data frame includes a column for the median of the posterior predictions of the mean longitudinal response (yfit), a column for each of the lower and upper limits of the uncertainty interval corresponding to the posterior predictions of the mean longitudinal response (ci_lb and ci_ub), and a column for each of the lower and upper limits of the prediction interval corresponding to the posterior predictions of the (raw) longitudinal response. The data frame also includes columns for the subject ID variable, and each of the predictor variables. The returned object also includes a number of attributes.

When return_matrix = TRUE, the returned object is the same as that described for posterior_predict.

Details

The posterior_traj function acts as a wrapper to the posterior_predict function, but allows predictions to be easily generated at time points that are interpolated and/or extrapolated between time zero (baseline) and the last known survival time for the individual, thereby providing predictions that correspond to a smooth estimate of the longitudinal trajectory (useful for the plotting via the associated plot.predict.stanjm method). In addition it returns a data frame by default, whereas the posterior_predict function returns a matrix; see the Value section below for details. Also, posterior_traj allows predictions to only be generated for a subset of individuals, via the ids argument.

See Also

plot.predict.stanjm, posterior_predict, posterior_survfit

Examples

Run this code
# NOT RUN {
  # Run example model if not already loaded
  if (!exists("example_jm")) example(example_jm)
  
  # Obtain subject-specific predictions for all individuals 
  # in the estimation dataset
  pt1 <- posterior_traj(example_jm, interpolate = FALSE, extrapolate = FALSE)
  head(pt1)
  
  # Obtain subject-specific predictions only for a few selected individuals
  pt2 <- posterior_traj(example_jm, ids = c(1,3,8))
  
  # If we wanted to obtain subject-specific predictions in order to plot the 
  # longitudinal trajectories, then we might want to ensure a full trajectory 
  # is obtained by interpolating and extrapolating time. We can then use the 
  # generic plot function to plot the subject-specific predicted trajectories
  # for the first three individuals. Interpolation and extrapolation is 
  # carried out by default.
  pt3 <- posterior_traj(example_jm)
  head(pt3) # predictions at additional time points compared with pt1 
  plot(pt3, ids = 1:3)
  
  # If we wanted to extrapolate further in time, but decrease the number of 
  # discrete time points at which we obtain predictions for each individual, 
  # then we could specify a named list in the 'control' argument
  pt4 <- posterior_traj(example_jm, control = list(ipoints = 10, epoints = 10, eprop = 0.5))
  
  # Alternatively we may want to estimate the marginal longitudinal
  # trajectory for a given set of covariates. To do this, we can pass
  # the desired covariate values in a new data frame (however the only
  # covariate in our fitted model was the time variable, year). To make sure  
  # that we marginalise over the random effects, we need to specify an ID value
  # which does not correspond to any of the individuals who were used in the
  # model estimation. (The marginal prediction is obtained by generating 
  # subject-specific predictions using a series of random draws from the random 
  # effects distribution, and then integrating (ie, averaging) over these. 
  # Our marginal prediction will therefore capture the between-individual 
  # variation associated with the random effects.)
  nd <- data.frame(id = rep("new1", 11), year = (0:10 / 2))
  pt5 <- posterior_traj(example_jm, newdata = nd)
  head(pt5)  # note the greater width of the uncertainty interval compared 
             # with the subject-specific predictions in pt1, pt2, etc
  
  # Alternatively, we could have estimated the "marginal" trajectory by 
  # ignoring the random effects (ie, assuming the random effects were set 
  # to zero). This will generate a predicted longitudinal trajectory only
  # based on the fixed effect component of the model. In essence, for a 
  # linear mixed effects model (ie, a model that uses an identity link 
  # function), we should obtain a similar point estimate ("yfit") to the
  # estimates obtained in pt5 (since the mean of the estimated random effects
  # distribution will be approximately 0). However, it is important to note that
  # the uncertainty interval will be much more narrow, since it completely
  # ignores the between-individual variability captured by the random effects.
  # Further, if the model uses a non-identity link function, then the point
  # estimate ("yfit") obtained only using the fixed effect component of the
  # model will actually provide a biased estimate of the marginal prediction.
  # Nonetheless, to demonstrate how we can obtain the predictions only using 
  # the fixed effect component of the model, we simply specify 're.form = NA'. 
  # (We will use the same covariate values as used in the prediction for 
  # example for pt5).
  pt6 <- posterior_traj(example_jm, newdata = nd, re.form = NA)
  head(pt6)  # note the much narrower ci, compared with pt5
# }
# NOT RUN {
# }

Run the code above in your browser using DataLab