Return fitted survival, cumulative hazard or hazard at a series of times from a fitted flexsurvreg or flexsurvspline model.
# S3 method for flexsurvreg
summary(object, newdata = NULL, X = NULL,
type = "survival", fn = NULL, t = NULL, quantiles = 0.5, start = 0,
ci = TRUE, se = FALSE, B = 1000, cl = 0.95, tidy = FALSE, ...)
object: Output from flexsurvreg or flexsurvspline, representing a fitted survival model object.
newdata: Data frame containing the covariate values to produce fitted values for, or a list that can be coerced to such a data frame. There must be a column for every covariate in the model formula, and one row for every combination of covariates for which fitted values are wanted. These are in the same format as the original data, with factors as a single variable, not 0/1 contrasts.
If this is omitted and there are any continuous covariates, a single summary is provided with all covariates set to their mean values in the data; for categorical covariates, the means of the 0/1 indicator variables are taken. If there are only factor covariates in the model, all distinct groups are used by default.
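As a minimal sketch of the newdata argument, the following fits a Weibull model to the ovarian data from the survival package and asks for fitted survival at two ages. The object names (fit_age, nd, res_age), the chosen ages and the time points are illustrative assumptions, not part of the original documentation.

```r
library(flexsurv)  # also attaches survival, which supplies Surv() and ovarian

# Weibull model with one continuous covariate (age)
fit_age <- flexsurvreg(Surv(futime, fustat) ~ age, data = ovarian,
                       dist = "weibull")

# One row per covariate combination for which fitted values are wanted
nd <- data.frame(age = c(50, 60))

# ci = FALSE skips the simulation-based intervals to keep the example fast
res_age <- summary(fit_age, newdata = nd, t = c(200, 400, 600), ci = FALSE)
res_age
```

Each component of res_age holds the fitted survival at the three requested times for one row of nd.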
X: Alternative way of defining covariate values to produce fitted values for. Since version 0.4, newdata is an easier way that doesn't require the user to create factor contrasts, but X has been kept for backwards compatibility.
Columns of X represent different covariates, and rows represent multiple combinations of covariate values. For example, matrix(c(1,2),nrow=2) if there is only one covariate in the model and we want survival for covariate values of 1 and 2. A vector can also be supplied if just one combination of covariates is needed.
For "factor" (categorical) covariates, the values of the contrasts representing factor levels (as returned by the contrasts function) should be used. For example, for a covariate agegroup specified as an unordered factor with levels 20-29, 30-39, 40-49, 50-59, and baseline level 20-29, there are three contrasts. To return summaries for groups 20-29 and 40-49, supply X = rbind(c(0,0,0), c(0,1,0)), since all contrasts are zero for the baseline level, and the second contrast is "turned on" for the third level 40-49.
type:
"survival" for survival probabilities.
"cumhaz" for cumulative hazards.
"hazard" for hazards.
"rmst" for restricted mean survival.
"mean" for mean survival.
"median" for median survival (alternative to type="quantile" with quantiles=0.5).
"quantile" for quantiles of the survival time distribution.
"link" for the fitted value of the location parameter (i.e. the "linear predictor").
Ignored if "fn" is specified.
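The contrast-based X argument and the type argument described above can be sketched together as follows. This assumes the bc dataset shipped with flexsurv, whose group factor has levels Good, Medium and Poor under the default treatment contrasts; the object names and time points are illustrative. The same groups are also requested through newdata so the two routes can be compared.

```r
library(flexsurv)

# bc ships with flexsurv; group is a factor with levels Good, Medium, Poor
fit_grp <- flexsurvreg(Surv(recyrs, censrec) ~ group, data = bc,
                       dist = "weibull")

# Treatment contrasts: one column per non-baseline level
contrasts(bc$group)

# Rows of X: the baseline level (Good), then the third level (Poor)
X_sel <- rbind(c(0, 0), c(0, 1))
res_X <- summary(fit_grp, X = X_sel, t = c(1, 3, 5), type = "cumhaz",
                 ci = FALSE)

# The same two groups requested through newdata, for comparison
nd_sel <- data.frame(group = factor(c("Good", "Poor"),
                                    levels = levels(bc$group)))
res_nd <- summary(fit_grp, newdata = nd_sel, t = c(1, 3, 5),
                  type = "cumhaz", ci = FALSE)
```

Both calls should give the same cumulative hazards; newdata simply spares the user from building the contrast rows by hand.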
fn: Custom function of the parameters to summarise against time. Its optional first two arguments are t, representing time, and start, representing left-truncation points; any remaining arguments must be parameters of the distribution. It should return a vector of the same length as t.
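As an illustration of a custom fn, the sketch below summarises the median of a fitted Weibull distribution. The helper name weibull_median is hypothetical; its arguments shape and scale are the parameter names of dist = "weibull", and the optional t and start arguments are left out because the median does not depend on them.

```r
library(flexsurv)

fit_wei <- flexsurvreg(Surv(futime, fustat) ~ 1, data = ovarian,
                       dist = "weibull")

# Hypothetical helper: median survival time of a Weibull distribution
weibull_median <- function(shape, scale) {
  qweibull(0.5, shape = shape, scale = scale)
}

# t = 1 is a placeholder so that a single output row is produced
res_med <- summary(fit_wei, fn = weibull_median, t = 1, ci = FALSE)
res_med
```

The estimate should agree with evaluating qweibull directly at the fitted shape and scale.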
t: Times to calculate fitted values for. By default, these are the sorted unique observation (including censoring) times in the data; for left-truncated datasets these are the "stop" times.
quantiles: If type="quantile", this specifies the quantiles of the survival time distribution to return estimates for.
start: Optional left-truncation time or times. The returned survival, hazard or cumulative hazard will be conditioned on survival up to this time.
A vector of the same length as t can be supplied to allow different truncation times for each prediction time, though this doesn't make sense in the usual case where this function is used to calculate a predicted trajectory for a single individual. This is why the default start time was changed in version 0.4 of flexsurv; previously it was a vector of the start times observed in the data.
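To make the conditioning concrete, this sketch compares the survival returned with start = 300 against the ratio S(600) / S(300) computed from unconditional summaries. The model, time points and object names are illustrative assumptions.

```r
library(flexsurv)

fit_wei <- flexsurvreg(Surv(futime, fustat) ~ 1, data = ovarian,
                       dist = "weibull")

# Unconditional survival at 300 and 600 days
s_uncond <- summary(fit_wei, t = c(300, 600), ci = FALSE)[[1]][, "est"]

# Survival to 600 days, given survival to 300 days
s_cond <- summary(fit_wei, t = 600, start = 300, ci = FALSE)[[1]][, "est"]

# Conditioning on survival to start divides by S(start)
c(from_summary = s_cond, by_hand = s_uncond[2] / s_uncond[1])
```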
ci: Set to FALSE to omit confidence intervals.
se: Set to TRUE to include standard errors.
B: Number of simulations from the normal asymptotic distribution of the estimates used to calculate confidence intervals or standard errors. Decrease for greater speed at the expense of accuracy, or set B=0 to turn off calculation of CIs and SEs.
cl: Width of symmetric confidence intervals, relative to 1.
tidy: If TRUE, then the results are returned as a tidy data frame instead of a list. This can help with using the ggplot2 package to compare summaries for different covariate values.
...: Further arguments passed to or from other methods. Currently unused.
If tidy=FALSE, a list with one component for each unique covariate value (if there are only categorical covariates) or one component (if there are no covariates or any continuous covariates). Each of these components is a matrix with one row for each time in t, giving the estimated survival (or cumulative hazard, or hazard) and 95% confidence limits. These list components are named with the covariate names and values which define them.
If tidy=TRUE, a data frame is returned instead. This is formed by stacking the above list components, with additional columns to identify the covariate values that each block corresponds to.
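A short sketch of the tidy output, again assuming the bc data and a model with a single factor covariate. The plotting step is optional and guarded; it assumes that ggplot2 is installed and that the identifying column is named after the covariate (group), which is checked before plotting.

```r
library(flexsurv)

fit_grp <- flexsurvreg(Surv(recyrs, censrec) ~ group, data = bc,
                       dist = "weibull")

# With only factor covariates and no newdata, every group is summarised
tidy_res <- summary(fit_grp, t = 1:5, ci = FALSE, tidy = TRUE)
head(tidy_res)

# Optional: one fitted survival curve per group
if (requireNamespace("ggplot2", quietly = TRUE) &&
    "group" %in% names(tidy_res)) {
  p <- ggplot2::ggplot(tidy_res,
                       ggplot2::aes(x = time, y = est, colour = group)) +
    ggplot2::geom_line()
  print(p)
}
```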
If there are multiple summaries, an additional list component named X contains a matrix with the exact values of contrasts (dummy covariates) defining each summary.
The plot.flexsurvreg function can be used to quickly plot these model-based summaries against empirical summaries such as Kaplan-Meier curves, to diagnose model fit.
Confidence intervals are obtained by sampling randomly from the asymptotic normal distribution of the maximum likelihood estimates and then taking quantiles (see, e.g. Mandel (2013)).
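The simulation approach described above can be sketched by hand for a single quantity, here Weibull survival at t = 500. This is an illustration of the idea, not the package's internal code: it assumes that the fitted object's res.t and cov components hold the estimates and their covariance on the log scale, and it uses rmvnorm from the mvtnorm package to draw from the multivariate normal distribution.

```r
library(flexsurv)

set.seed(1)
fit_wei <- flexsurvreg(Surv(futime, fustat) ~ 1, data = ovarian,
                       dist = "weibull")

# Estimates and covariance on the log scale used for optimisation
est_log <- fit_wei$res.t[, "est"]
B <- 1000
draws <- mvtnorm::rmvnorm(B, mean = est_log, sigma = fit_wei$cov)
colnames(draws) <- names(est_log)

# Survival at t = 500 for each simulated parameter vector
surv_draws <- pweibull(500, shape = exp(draws[, "shape"]),
                       scale = exp(draws[, "scale"]), lower.tail = FALSE)

# Quantiles of the simulated values give a 95% interval
ci_by_hand <- quantile(surv_draws, c(0.025, 0.975))
point_est <- pweibull(500, shape = fit_wei$res["shape", "est"],
                      scale = fit_wei$res["scale", "est"],
                      lower.tail = FALSE)
ci_by_hand
```

The interval should bracket point_est and be close to, though not identical with, the limits reported by summary for t = 500, since both rely on random draws.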
Time-dependent covariates are not currently supported. The covariate values are assumed to be constant through time for each fitted curve.
Mandel, M. (2013). "Simulation based confidence intervals for functions with complicated derivatives." The American Statistician (in press).