The multi-state survival functions coxph and survfit allow for two forms of input data. This routine converts between them. The function is normally called behind the scenes when Surv2 is used as the response.
Surv2data(formula, data, subset, id)
a list with elements
an updated model frame (fewer rows, unchanged columns)
the constructed response variable
the current state for each of the rows
formula: a model formula
data: a data frame
subset: optional, selects rows of the data to be retained
id: a variable that identifies multiple rows for the same subject, normally found in the referenced data set
For timeline style data, each row is uniquely identified by an (identifier, time) pair. The time could be a date, time from entry to a study, age, etc.; there may often be more than one time variable. The identifier and time cannot be missing. The remaining covariates represent values that were observed at that time point. Often, a given covariate is observed at only a subset of times and is missing at others. At the time of death, in particular, often only the identifier, time, and status indicator are known.
In the resulting data set, missing covariates are replaced by their last known value, and the response y will be a Surv(time1, time2, endpoint) object.
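The conversion described above can be illustrated with a small base R sketch. This is not the code used by Surv2data; the data frame timeline, its columns (id, time, bili, status), and the helper locf are hypothetical names chosen for illustration, and "censor" is assumed to mean that no event occurred at that time.

```r
# Hypothetical timeline-style data: one row per (id, time) pair.
# bili is a covariate observed only at some time points.
timeline <- data.frame(
  id     = c(1, 1, 1, 2, 2),
  time   = c(0, 5, 9, 0, 7),
  bili   = c(1.2, NA, NA, 0.8, NA),
  status = c("censor", "censor", "death", "censor", "censor")
)

# Hypothetical helper: carry the last known value forward.
# Leading missing values (nothing observed yet) remain NA.
locf <- function(x) {
  idx <- cumsum(!is.na(x))          # count of non-missing values seen so far
  c(NA, x[!is.na(x)])[idx + 1]
}

# Sort by subject and time, then fill covariates within each subject.
timeline <- timeline[order(timeline$id, timeline$time), ]
timeline$bili <- ave(timeline$bili, timeline$id, FUN = locf)

# Pair each row with the next row of the same subject to form
# (time1, time2] intervals; the endpoint is the status at time2,
# and the covariates are those in effect at time1.
n <- nrow(timeline)
same_next <- timeline$id[-n] == timeline$id[-1]
counting <- data.frame(
  id       = timeline$id[-n][same_next],
  time1    = timeline$time[-n][same_next],
  time2    = timeline$time[-1][same_next],
  endpoint = timeline$status[-1][same_next],
  bili     = timeline$bili[-n][same_next]
)
print(counting)
```

The result has fewer rows than the input (three intervals from five timeline rows), and the columns time1, time2, and endpoint correspond to the arguments of the Surv(time1, time2, endpoint) response.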