A fast and general method for reshaping standard longitudinal data into a new
structure called 'augmented'. This format is suitable for use in a multi-state
framework with the msm package.
augment(
  data,
  data_key,
  n_events,
  pattern,
  state = list("IN", "OUT", "DEAD"),
  t_start,
  t_end,
  t_cens,
  t_death,
  t_augmented,
  more_status,
  check_NA = FALSE,
  convert = FALSE,
  verbose = TRUE
)
data
A data.table or data.frame object in longitudinal format where each row
represents an observation in which the exact starting and ending time of the
process are known and recorded. If data is a data.frame, then augment
internally casts it to a data.table.
data_key
A keying variable which augment uses to define a key for data. This
represents the subject ID (see setkey).
n_events
An integer variable indicating the progressive (monotonic) event number of a
given ID. augment always checks whether n_events is monotonically increasing
within the provided data_key and stops the execution if the check fails
(see 'Details'). If missing, augment quickly creates a variable named
"n_events".
pattern
Either an integer, a factor or a character with 2 or 3 unique values which
provides the ID status at the end of the study. pattern has a predefined
structure. When 2 values are detected, they must be in the format:
0 = "alive", 1 = "dead". When 3 values are detected, then the format must be:
0 = "alive", 1 = "dead during a transition", 2 = "dead after a transition has
ended" (see 'Details').
state
A list of exactly three possible states which a subject can reach. state has
a predefined structure as follows: IN, OUT, DEAD (see 'Details').
t_start
The starting time of an observation. It can be passed as date, integer, or numeric format.
t_end
The ending time of an observation. It can be passed as date, integer, or numeric format.
t_cens
The censoring time of the study. This is the date until which each ID is observed, if still active in the cohort.
t_death
The exact death time of a subject ID. If t_death is missing, t_cens is
assumed to contain both censoring and death times and a warning is raised.
t_augmented
A variable indicating the name of the new time variable of the process in
the augmented format. If t_augmented is missing, then the default name
'augmented' is assumed and the corresponding new variable is added to data.
t_augmented is cast to integer or to numeric depending on whether t_start is
a date or a difftime, respectively. The suffix '_int' or '_num' is pasted to
t_augmented and a new variable is computed accordingly. This is done because
msm can't correctly deal with date or difftime variables. Both variables are
positioned before t_start.
more_status
A variable which marks further transitions besides the default ones given by
state. more_status can be a factor or a character (see 'Details'). If
missing, augment ignores it.
check_NA
If TRUE, then arguments data_key, n_events, pattern, t_start and t_end are
checked for missing data, and if any are found the function stops with an
error. Default is FALSE because augment is not intended for running
consistency checks beyond those that are mandatory, and because the procedure
is computationally expensive and could cause memory overhead for very large
datasets. Argument more_status is the only one for which augment always
checks for missing data and, again, stops with an error if it finds any.
convert
If TRUE, then the returned object is automatically converted to the class
data.frame. This is done in place and comes at very low cost in both running
time and memory consumption (see setDF).
verbose
If FALSE, all information produced by print, cat and message is suppressed.
Default is TRUE.
An augmented format dataset of class data.table, or data.frame when convert
is TRUE, where each row represents a specific transition for a given subject.
augment returns it after computing the following variables:
augmented
The new timing variable for the process when looking at transitions. If
t_augmented is missing, then augment creates augmented by default. The
function looks directly at t_start and t_end to build it, so it inherits
their class. In particular, if t_start is in date format, then augment
computes a new variable cast as integer and names it augmented_int. If
t_start is in difftime format, then augment computes a new variable cast as
numeric and names it augmented_num.
status
A status flag which contains the states as specified in state. augment
automatically checks whether argument pattern has 2 or 3 unique values and
computes the correct structure of a given subject as reported in the
vignette. The variable is cast as character.
status_num
The corresponding integer version of status.
n_status
A mix of status and n_events cast as character. This becomes useful when a
multi-state model on the progression of the process needs to be implemented.
If more_status is passed, then augment computes some more variables. They mimic the meaning of status, status_num, and n_status but they account for the more complex structure defined. They are: status_exp, status_exp_num, and n_status_exp.
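The casting described for augmented_int and augmented_num can be illustrated in base R. This is only a sketch of the underlying type conversion, not the code augment runs internally; the dates below are hypothetical and do not come from the hosp dataset.

```r
# Hypothetical starting times stored as Date (illustrative values only).
t_date <- as.Date(c("2010-01-01", "2010-01-15", "2010-03-02"))

# A Date is stored as the number of days since 1970-01-01,
# so casting it to integer keeps the ordering and the spacing.
augmented_int <- as.integer(t_date)

# A difftime carries a units attribute; as.numeric() keeps only the value.
t_diff <- difftime(t_date, t_date[1], units = "days")
augmented_num <- as.numeric(t_diff)
```

Plain integer or numeric columns like these are the kind of time variable that msm can work with, which is the motivation given above for the '_int' and '_num' variables.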
For the data to be processed, the process must be monotonically increasing.
augment checks this whether or not n_events is missing. First, the data are
efficiently ordered through the setkey function, with data_key as the primary
key and t_start as the secondary key. Second, augment checks the monotonicity
of n_events; if the check fails, it stops with an error and returns the
subjects given by data_key for whom the condition is not met. If n_events is
missing, then augment internally computes the progression number with the
name n_events and runs the same procedure.
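The ordering and monotonicity check can be sketched in base R as follows. This is an illustration under stated assumptions, not the implementation used by augment: the toy data frame and its column names are hypothetical, order() stands in for setkey, and "monotonic" is read here as strictly increasing.

```r
# Hypothetical toy data: subject 2 repeats event number 3.
toy <- data.frame(
  subj       = c(1, 1, 1, 2, 2, 2),
  adm_number = c(1, 2, 3, 1, 3, 3),
  dateIN     = as.Date(c("2010-01-01", "2010-02-01", "2010-03-01",
                         "2010-01-10", "2010-02-10", "2010-03-10"))
)

# Order by subject first and by starting time second
# (the role that setkey plays in the description above).
toy <- toy[order(toy$subj, toy$dateIN), ]

# Within each subject, every successive difference must be positive
# (assumption: strictly increasing event numbers).
is_monotonic <- tapply(toy$adm_number, toy$subj,
                       function(x) all(diff(x) > 0))

# Subjects for whom the condition is not met.
failing <- names(is_monotonic)[!is_monotonic]
```

Running a check like this before calling augment makes it easier to locate the offending subjects in the raw data.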
Attention needs to be paid to argument pattern. Integer values can be 0 and 1
if only two statuses are defined, and they must correspond to the statuses
'alive' and 'dead'. If three values are defined, then they must be 0, 1 and 2
if pattern is an integer, or 'alive', 'dead inside a transition' and 'dead
outside a transition' if pattern is either a character or a factor. The order
matters: it is not possible to specify 0 as 'dead', for instance.
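One way to make sure the integer coding follows the required order is to fix the factor levels explicitly before converting. The sketch below uses hypothetical labels; only the 0/1/2 scheme comes from the description above.

```r
# Hypothetical end-of-study labels (the label wording is an assumption).
end_status <- c("alive", "dead outside", "dead inside", "alive")

# Fix the level order so the codes follow the required scheme:
# 0 = alive, 1 = dead during a transition, 2 = dead after a transition ended.
lv <- c("alive", "dead inside", "dead outside")

# factor() codes start at 1, so subtract 1 to obtain 0, 1, 2.
pattern_int <- as.integer(factor(end_status, levels = lv)) - 1L
```

Without an explicit levels argument, factor() would sort the labels alphabetically, which gives the right codes here only by coincidence.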
When passing a list of states, the order is important so that the first element must be the state corresponding to the starting time (i.e. 'IN', inside the hospital), the second element must correspond to the ending time (i.e. 'OUT', outside the hospital), and the third state is the absorbing state (i.e. 'DEAD').
more_status allows managing multiple transitions besides those already
specified in state. In particular, if the corresponding observation is a
standard admission which adds no information beyond what is inside state,
then more_status must be set to 'df', which stands for 'Default' (see
'Examples' or run ?hosp and look at the variable 'rehab_it'). In general, it
is good practice to fully specify each transition with a short,
self-explanatory character label so that the current transition can be
understood quickly.
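Because augment always stops when more_status contains missing values, a variable of this kind has to be complete before it is passed. A minimal sketch, assuming a hypothetical character vector in which NA marks a standard admission:

```r
# Hypothetical extra information per admission; NA = nothing to add.
ward <- c(NA, "rehab", NA, "icu")

# Use 'df' (Default) for standard admissions so that no NA is left.
more_status <- ifelse(is.na(ward), "df", ward)
```

The labels "rehab" and "icu" are illustrative only; any self-explanatory label can be used for the non-default transitions.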
Jackson, C.H. (2011). Multi-State Models for Panel Data: The msm Package for R. Journal of Statistical Software, 38(8), 1-29. URL https://www.jstatsoft.org/v38/i08/.
Dowle, M., Srinivasan, A., Short, T., Lianoglou, S., with contributions from Saporta, R. and Antonyan, E. (2016). data.table: Extension of data.frame. R package version 1.9.6. URL https://github.com/Rdatatable/data.table/wiki
# NOT RUN {
# loading data
data( hosp )

# 1.
# augmenting hosp
hosp_augmented = augment( data = hosp, data_key = subj, n_events = adm_number,
                          pattern = label_3, t_start = dateIN, t_end = dateOUT,
                          t_cens = dateCENS )

# 2.
# augmenting hosp by passing more information regarding transitions
# with argument more_status
hosp_augmented_more = augment( data = hosp, data_key = subj, n_events = adm_number,
                               pattern = label_3, t_start = dateIN, t_end = dateOUT,
                               t_cens = dateCENS, more_status = rehab_it )

# 3.
# augmenting hosp and returning a data.frame
hosp_augmented = augment( data = hosp, data_key = subj, n_events = adm_number,
                          pattern = label_3, t_start = dateIN, t_end = dateOUT,
                          t_cens = dateCENS, convert = TRUE )
class( hosp_augmented )
# }