These data sets are used in the paper by Royston and Altman that is
referenced below.
The Rotterdam data is used to fit a model, and the GBSG data to
validate it. The paper gives references for the data
sources.
There are 43 subjects who have died without recurrence, but whose death
time is greater than the censoring time for recurrence.
A common way that this happens is that a death date is updated in the
health record sometime after the research study ended, and that value
is then picked up when a study data set is created.
But it raises serious questions about censoring.
For instance, subject 40 is censored for recurrence at 4.2 years and died
at 6.6 years; when creating the endpoint of recurrence free survival
(earlier of recurrence or death), treating them as a death at 6.6 years
implicitly assumes that they were recurrence free just before death.
For this to be true we would have to assume that, had they progressed in
the 2.4 year interval before death (while off study), the progression
would have been noted in their general medical record and then captured
in the study data set.
However, that may be unlikely. Death information is often in a
centralized location in electronic health records, easily accessed by a
programmer and merged with the study data, while recurrence may
require manual review. How best to address this is an open issue.
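
The two ways of handling such a subject can be made concrete with a
small sketch. It is an illustration only, not a recommendation and not
the method used in the paper. The record layout (rtime, recur, dtime,
death) and the function names are assumptions made for this example,
and times are in years to match the figures quoted above for subject 40.

```python
from typing import NamedTuple, Tuple


class Subject(NamedTuple):
    # Hypothetical record layout, assumed for this sketch.
    rtime: float  # years to recurrence, or to censoring for recurrence
    recur: int    # 1 = recurrence observed, 0 = censored for recurrence
    dtime: float  # years to death, or to last follow-up for death
    death: int    # 1 = died, 0 = alive at last follow-up


def death_after_recurrence_censoring(s: Subject) -> bool:
    """True if the subject died without recurrence, with the death time
    later than the censoring time for recurrence."""
    return s.recur == 0 and s.death == 1 and s.dtime > s.rtime


def rfs_death_at_dtime(s: Subject) -> Tuple[float, int]:
    """Option A: take the death at face value. This implicitly assumes
    the subject was recurrence free right up to the time of death."""
    if s.recur == 1:
        return (s.rtime, 1)
    return (s.dtime, s.death)


def rfs_censor_at_rtime(s: Subject) -> Tuple[float, int]:
    """Option B: censor at the end of recurrence follow-up, ignoring a
    death that was recorded after that time."""
    if death_after_recurrence_censoring(s):
        return (s.rtime, 0)
    return rfs_death_at_dtime(s)


# Subject 40 as described in the text.
subject40 = Subject(rtime=4.2, recur=0, dtime=6.6, death=1)
print(death_after_recurrence_censoring(subject40))  # True
print(rfs_death_at_dtime(subject40))                # (6.6, 1)
print(rfs_censor_at_rtime(subject40))               # (4.2, 0)
```

Option A counts an event at 6.6 years, while option B gives a censored
observation at 4.2 years. Neither settles the open issue: option B
discards known deaths, and option A relies on the untested assumption
about off-study recurrence.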