The val.surv function is useful for validating predicted survival probabilities against right-censored failure times. If u is specified, the hazard regression function hare in the polspline package is used to relate the predicted survival probability at time u to the observed survival times (and censoring indicators), in order to estimate the actual survival probability at time u as a function of the estimated survival probability at that time, est.surv. If est.surv is not given, fit must be specified, and the survest function is used to obtain the predicted values (using newdata if it is given, or the stored linear predictor values if not). hare, or movStats when method="smoothkm", is given the sole predictor fun(est.surv), where fun is either supplied by the user or inferred from fit. fun is the transformation of the predicted survival probabilities that one expects to be linearly related to the linear predictors.
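
To see why a transformation of est.surv can be linear in the linear predictors, consider a Cox proportional hazards model, where \(S(t|X) = S_0(t)^{\exp(X\beta)}\) and therefore \(\log(-\log S(t|X)) = \log(-\log S_0(t)) + X\beta\). The Python sketch below is an illustration only, not rms code; the baseline survival s0 and the linear predictor values are hypothetical, and whether val.surv infers exactly this fun for a given fit is an assumption here. It checks numerically that the complementary log-log transform of Cox-model survival probabilities at a fixed time differs from the linear predictor by a constant.

```python
import math

def cox_survival(s0: float, lp: float) -> float:
    """Survival at a fixed time under proportional hazards: S(t|X) = S0(t) ** exp(lp)."""
    return s0 ** math.exp(lp)

def cloglog(s: float) -> float:
    """Complementary log-log transform, log(-log(s)), for 0 < s < 1."""
    return math.log(-math.log(s))

s0 = 0.8                            # hypothetical baseline survival at time u
lps = [-1.0, -0.5, 0.0, 0.5, 1.0]   # hypothetical linear predictor values
transformed = [cloglog(cox_survival(s0, lp)) for lp in lps]

# Each transformed value equals log(-log(s0)) + lp, i.e. it is linear in lp with slope 1.
for lp, value in zip(lps, transformed):
    print(f"lp={lp:+.1f}  fun(S)={value:.4f}  fun(S)-lp={value - lp:.4f}")
```

The last column is the same for every row, which is the linear relationship that fun is meant to produce.
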

hare uses an adaptive procedure to find a linear spline of fun(est.surv) in a model where the log hazard is a linear spline in time \(t\); cross-products between the two splines are allowed so that proportional hazards is not assumed. Thus hare assumes only that the covariate and time functions are smooth, provided the number of events in the dataset is large enough to obtain a reliable flexible fit. Alternatively, specify method="smoothkm" to use the Hmisc movStats function to compute moving-window Kaplan-Meier estimates, smoothed (by default) using supsmu. This method is more flexible than hare.
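
The moving-window idea behind method="smoothkm" can be sketched as follows, assuming a simplified scheme: subjects are sorted by predicted survival at time u, fixed-size overlapping windows are formed, and within each window the Kaplan-Meier estimate at u is compared with the mean predicted probability. This Python sketch is not movStats: it applies no supsmu smoothing, and the window and step sizes, the helper names, and the simulated exponential model are all hypothetical.

```python
import math
import random

def km_survival_at(times, events, u):
    """Kaplan-Meier estimate of S(u) from right-censored data (events: 1 = failure, 0 = censored)."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, i = 1.0, 0
    while i < len(data) and data[i][0] <= u:
        t = data[i][0]
        deaths = at_t = 0
        while i < len(data) and data[i][0] == t:   # group tied times
            deaths += data[i][1]
            at_t += 1
            i += 1
        surv *= 1.0 - deaths / n_at_risk
        n_at_risk -= at_t
    return surv

def moving_window_calibration(pred, times, events, u, window=200, step=100):
    """Hypothetical helper: (mean predicted, KM observed) pairs over windows sorted by pred."""
    order = sorted(range(len(pred)), key=lambda i: pred[i])
    pairs = []
    for start in range(0, len(order) - window + 1, step):
        idx = order[start:start + window]
        mean_pred = sum(pred[i] for i in idx) / window
        observed = km_survival_at([times[i] for i in idx], [events[i] for i in idx], u)
        pairs.append((mean_pred, observed))
    return pairs

# Usage: simulate a well-calibrated hypothetical exponential model.
rng = random.Random(2)
pred, times, events = [], [], []
u = 0.5
for _ in range(2000):
    rate = math.exp(rng.uniform(-1.0, 1.0))   # subject-specific hazard
    t = rng.expovariate(rate)                 # true failure time
    c = rng.expovariate(0.3)                  # independent censoring time
    times.append(min(t, c))
    events.append(1 if t <= c else 0)
    pred.append(math.exp(-rate * u))          # true S(u|X), so predictions are calibrated

pairs = moving_window_calibration(pred, times, events, u)
for mean_pred, observed in pairs:
    print(f"predicted {mean_pred:.3f}  observed {observed:.3f}")
```

For a well-calibrated model the pairs scatter around the 45 degree line; smoothing them, as movStats does by default with supsmu, would reduce the window-to-window noise.
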

There are special print and plot methods when u is given. In this case, val.surv returns an object of class "val.survh"; otherwise it returns an object of class "val.surv".

If u is not specified, val.surv uses Cox-Snell (1968) residuals on the cumulative probability scale to check the calibration of a survival model against right-censored failure time data. If the predicted survival probability at time \(t\) for a subject having predictors \(X\) is \(S(t|X)\), this method is based on the fact that the predicted probability of failure before time \(t\), \(1 - S(t|X)\), when evaluated at the subject's actual survival time \(T\), has a uniform (0,1) distribution. The quantity \(1 - S(T|X)\) is right-censored when \(T\) is. By computing one minus the Kaplan-Meier estimate of the distribution of \(1 - S(T|X)\) and plotting it against the 45 degree line, we can check calibration accuracy. A more stringent assessment can be obtained by stratifying this analysis by an important predictor variable. The theoretical uniform distribution is only an approximation when the survival probabilities are estimates rather than population values.
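
The Cox-Snell check can be illustrated with a small simulation. In this Python sketch (not val.surv's implementation), the true model is a hypothetical exponential model with hazard \(\exp(x)\), so \(S(t|X)\) is known exactly, and censoring times are drawn independently. One minus the Kaplan-Meier estimate for the right-censored values \(1 - S(T|X)\) is then compared with the 45 degree line at a few points.

```python
import math
import random

def kaplan_meier(times, events):
    """Return the Kaplan-Meier step function as a list of (time, survival) at failure times."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, steps, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = at_t = 0
        while i < len(data) and data[i][0] == t:   # group tied times
            deaths += data[i][1]
            at_t += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            steps.append((t, surv))
        n_at_risk -= at_t
    return steps

def step_value(steps, x):
    """Evaluate a right-continuous Kaplan-Meier step function at x."""
    s = 1.0
    for t, v in steps:
        if t > x:
            break
        s = v
    return s

rng = random.Random(1)
u_values, events = [], []
for _ in range(4000):
    x = rng.uniform(-1.0, 1.0)
    rate = math.exp(x)                 # hazard of the hypothetical exponential model
    t = rng.expovariate(rate)          # true failure time
    c = rng.expovariate(0.3)           # independent censoring time
    obs = min(t, c)
    u_values.append(1.0 - math.exp(-rate * obs))  # 1 - S(T|X), censored when T is
    events.append(1 if t <= c else 0)

steps = kaplan_meier(u_values, events)
for p in (0.1, 0.25, 0.5, 0.75):
    # One minus the KM survival estimate of 1 - S(T|X) should track the 45 degree line.
    print(p, round(1.0 - step_value(steps, p), 3))
```

Because the simulated model is the true one, the printed values sit close to \(p\); with estimated survival probabilities the uniform reference is only approximate.
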

When censor is specified to val.surv, a different validation is done that is more stringent but that uses only the uncensored failure times. This method is used for type I censoring when the theoretical censoring times are known for subjects having uncensored failure times. Let \(T\), \(C\), and \(F\) denote respectively the failure time, censoring time, and cumulative failure time distribution (\(1 - S\)). The expected value of \(F(T|X)\) is 0.5 when \(T\) represents the subject's actual failure time. For an uncensored failure time, the expected value is \(E[F(T|X) \mid T \leq C, X] = 0.5 F(C|X)\). Therefore a smooth plot of \(F(T|X) - 0.5 F(C|X)\) for uncensored \(T\) should be a flat line through \(y=0\) if the model is well calibrated, and a smooth plot of \(2F(T|X)/F(C|X)\) for uncensored \(T\) should be a flat line through \(y=1.0\). Each smooth plot is obtained by smoothing the (linear predictor, difference or ratio) pairs.
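
The 0.5 factor follows because, given \(X\), \(F(T|X)\) is uniform on (0,1), so conditional on \(T \leq C\), that is \(F(T|X) \leq F(C|X)\), it is uniform on \((0, F(C|X))\) with mean \(F(C|X)/2\). The Python sketch below checks this by simulation under hypothetical assumptions (an exponential model with hazard \(\exp(x)\) and known type I censoring times drawn uniformly from 0.5 to 2). For brevity it averages the differences and ratios over all uncensored subjects instead of smoothing them against the linear predictor, so it checks only the overall level of the flat lines, not their flatness.

```python
import math
import random

rng = random.Random(3)
diffs, ratios = [], []
for _ in range(5000):
    x = rng.uniform(-1.0, 1.0)
    rate = math.exp(x)                  # hazard of the hypothetical exponential model
    c = rng.uniform(0.5, 2.0)           # known type I censoring time for this subject
    t = rng.expovariate(rate)           # true failure time
    if t > c:
        continue                        # only uncensored failure times are used
    f_t = 1.0 - math.exp(-rate * t)     # F(T|X)
    f_c = 1.0 - math.exp(-rate * c)     # F(C|X)
    diffs.append(f_t - 0.5 * f_c)
    ratios.append(2.0 * f_t / f_c)

mean_diff = sum(diffs) / len(diffs)     # should be near 0
mean_ratio = sum(ratios) / len(ratios)  # should be near 1
print(len(diffs), round(mean_diff, 4), round(mean_ratio, 4))
```
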
Note that the Cox-Snell residual plot is not very sensitive to model lack of fit.