tmerge: Time based merge for survival data

Description

A common task in survival analysis is the creation of (start, stop] data sets, which have multiple intervals for each subject along with the covariate values that apply over each interval. This function aids in the creation of such data sets.

Usage

tmerge(data1, data2, id, ..., tstart, tstop, options)

Arguments

data1

the primary data set, to which new variables and/or observations will be added

data2

optional second data set in which the other arguments will be found

id

subject identifier

...

operations that add new variables or intervals, see below

tstart

optional variable to define the valid time range for each subject, only used on an initial call

tstop

optional variable to define the valid time range for each subject, only used on an initial call

options

a list of options. Valid ones are id, tstart, and tstop, which will be the names of the three mandatory variables in the output data. The other is defer, which sets a numeric amount of time before an event when covariate changes are disa

Value

a data frame with two extra attributes, tname and tcount. The first contains the names of the key variables; its persistence from call to call allows the user to avoid constantly reentering the options argument. The tcount attribute contains counts of the match types. New time values that occur before the first interval for a subject are "early", those after the last interval for a subject are "late", and those that fall into a gap are of type "gap".

The most common type will usually be "within", for those new times that fall inside an existing interval and cause it to be split into two. Observations that fall exactly on the edge of an interval are counted as "leading" edge, "trailing" edge or "boundary". The first corresponds, for instance, to an occurrence at 17 for someone with an interval (17, 35] who is not at risk just before time 17. A tdc at time 17 will affect this interval, but an event at time 17 will not. Symmetrically, an event occurrence at 35 would count in the (17, 35] interval, but a tdc would not. The last case is where the main data set has touching intervals for a subject, e.g. (17, 28] and (28, 35], and a new occurrence lands at the join. Events will go to the earlier interval and counts to the later one.

It is wise to look at attr(data, 'tcount') after each step of a data set build to avoid surprises.

These extra attributes are ephemeral, and will be discarded if the data frame is modified in any way. This is intentional.
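
As an illustration of the match types described above, here is a minimal sketch. The data frames `base` and `rx` and their column names are invented for this example; the new times are chosen so that one falls inside an interval, one on the leading edge, and one after the end of follow-up.

```r
library(survival)

# Hypothetical toy data: one row per subject, followed from 0 to futime
base <- data.frame(id = 1:3, futime = c(10, 20, 15))

# First call: set up one (0, futime] interval per subject
d1 <- tmerge(base, base, id = id, tstop = futime)

# Hypothetical treatment start times:
#   subject 1: time 4  falls inside (0, 10]
#   subject 2: time 0  is the leading edge of (0, 20]
#   subject 3: time 30 is after the last interval (0, 15]
rx <- data.frame(id = 1:3, rxtime = c(4, 0, 30))
d2 <- tmerge(d1, rx, id = id, treated = tdc(rxtime))

# Inspect the match counts for the step just performed
tc <- attr(d2, "tcount")
print(tc)
```

A nonzero count in the "early", "late" or "gap" columns is often a hint that the time scales of the two data sets do not line up, which is why checking the table after every call is worthwhile.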

Details

The program is usually run in multiple passes, the first of which defines the basic structure, and subsequent ones that add new variables to that structure. For a more complete explanation of how this routine works refer to the vignette on time-dependent variables.
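
The multiple-pass pattern can be sketched as follows. This is only an illustration under assumed inputs: the data frames `base` and `rx`, and the variable names `death` and `treated`, are made up for the example.

```r
library(survival)

# Hypothetical baseline data: one row per subject
base <- data.frame(id = 1:3, futime = c(10, 20, 15), status = c(1, 0, 1))

# Hypothetical long-format data: one row per treatment start
rx <- data.frame(id = c(1, 2), rxtime = c(4, 12))

# Pass 1: define each subject's time range and the outcome event
d1 <- tmerge(base, base, id = id, tstop = futime,
             death = event(futime, status))

# Pass 2: add a time-dependent covariate, splitting intervals where needed
d2 <- tmerge(d1, rx, id = id, treated = tdc(rxtime))

print(d2[, c("id", "tstart", "tstop", "death", "treated")])
```

Because the result of each call is passed back in as data1, the id, tstart and tstop names chosen on the first call do not need to be repeated on later calls.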

There are four types of optional arguments: a time-dependent covariate (tdc), a cumulative count (cumtdc), an event (event) or a cumulative event (cumevent). Time-dependent covariates change their values before an event; events are outcomes.

newname = tdc(y, x)

A new time-dependent covariate variable will be created. The argument y is assumed to be on the scale of the start and end time, and each instance describes the occurrence of a "condition" at that time. The second argument x is optional. In the case where x is missing the count variable starts at 0 for each subject and becomes 1 at the time of the event; if x is present the count is set to the value of x. If a given subject has multiple rows of data with the same time value the sum of those rows will be assigned.

newname = cumtdc(y, x)

Similar to tdc, except that the event count is accumulated over time for each subject.

newname = event(y, x)

Mark an event at time y. In the usual case that x is missing, the new 0/1 variable will be similar to the 0/1 status variable of a survival time, and that is in fact how it will normally be used. For multiple types of endpoints the x argument can be used to encode the type of event.

newname = cumevent(y, x)

Cumulative events.
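
To make the difference between the four operations concrete, the sketch below applies all of them to the same set of times. The data frames `base` and `visits` and the new variable names are hypothetical and exist only for this illustration.

```r
library(survival)

# Hypothetical follow-up data: one row per subject
base <- data.frame(id = 1:2, futime = c(12, 10))

# Hypothetical visit times: subject 1 has two visits, subject 2 has one
visits <- data.frame(id = c(1, 1, 2), vtime = c(3, 7, 5))

d <- tmerge(base, base, id = id, tstop = futime)
d <- tmerge(d, visits, id = id,
            any_visit = tdc(vtime),      # 0 until the first visit, then 1
            n_visit   = cumtdc(vtime),   # running count of visits so far
            visit_ev  = event(vtime),    # 1 on intervals that end with a visit
            n_ev      = cumevent(vtime)) # cumulative version of the event marker

print(d[, c("id", "tstart", "tstop",
            "any_visit", "n_visit", "visit_ev", "n_ev")])
```

Note that the covariate columns change on the interval that starts at the visit time, while the event columns are recorded on the interval that ends there.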

Examples

library(survival)

# The data set jasa contains the famous Stanford Heart Transplant data
#  set, as it appeared in Crowley and Hu, JASA 72:27-36, 1977.
# Two special cases need to be dealt with:
#  subject 15 died on day 0 which leads to an illegal (0,0] interval,
#     make them die on day 0.5 instead
#  subject 38 dies on the day of transplant, make tx happen "earlier in
#     the day" (before death) by subtracting .5 from their transplant day
#
tdata <- jasa[, -(1:4)]  # leave off the dates, temporary data set
tdata$futime <- pmax(.5, tdata$futime)  # the death on day 0
indx <- with(tdata, which(wait.time == futime))
tdata$wait.time[indx] <- tdata$wait.time[indx] - .5  #the tied transplant
sdata <- tmerge(tdata, tdata, id=1:nrow(tdata), 
                death = event(futime, fustat), 
                trans = tdc(wait.time))
attr(sdata, "tcount")
# Shows two subjects transplanted on the day of entry, the "leading" edge of
#  their follow-up interval

fit <- coxph(Surv(tstart, tstop, death) ~ trans + age, data=sdata)