prepExpo: Prepare Exposure Data for Aggregation

Description

prepExpo uses a Lexis object of periods of exposure to fill gaps between the periods and overall entry and exit times without accumulating exposure time in periods of no exposure, and splits the result if requested.

Usage

prepExpo(
  lex,
  freezeScales = "work",
  cutScale = "per",
  entry = min(get(cutScale)),
  exit = max(get(cutScale)),
  by = "lex.id",
  breaks = NULL,
  freezeDummy = NULL,
  subset = NULL,
  verbose = FALSE,
  ...
)

Value

Returns a Lexis object that has been split if breaks is specified. The resulting object is also a data.table if options("popEpi.datatable") == TRUE (see ?popEpi).

Arguments

lex: a Lexis object with ONLY periods of exposure as rows; one or multiple rows per subject allowed
freezeScales: a character vector naming Lexis time scales of exposure which should be frozen in periods where no exposure occurs (in the gap time periods)
cutScale: the Lexis time scale along which the subject-specific ultimate entry and exit times are specified
entry: an expression; the time of entry to follow-up which may be earlier, at, or after the first time of exposure in freezeScales; evaluated separately for each unique combination of by, so e.g. with entry = min(Var1) and by = "lex.id" it sets the lex.id-specific minima of Var1 to be the original times of entry for each lex.id
exit: the same as entry but for the ultimate exit time per unique combination of by
by: a character vector indicating variable names in lex, the unique combinations of which identify separate subjects for which to fill gaps in the records from entry to exit; for novices of {data.table}, this is passed to a data.table's by argument.
breaks: a named list of breaks; e.g. list(work = 0:20, per = 1995:2015); passed on to splitMulti, so see that function's help for more details
freezeDummy: a character string; specifies the name for a dummy variable that this function will create and add to output which identifies rows where the freezeScales are frozen and where not (0 implies not frozen, 1 implies frozen); if NULL, no dummy is created
subset: a logical condition to subset data by before computations; e.g. subset = sex == "male"
verbose: logical; if TRUE, the function is chatty and prints some messages and timings during its run.
...: additional arguments passed on to splitMulti

Details

prepExpo is a convenience function for the purpose of eventually aggregating person-time and events in categories of not only normally progressing Lexis time scales but also time scales which should not progress during certain periods. For example, a person may work at a production facility only intermittently, meaning exposure time (to work-related substances, for example) should not progress outside of periods of work. This allows for e.g. a correct aggregation of person-time and events by categories of cumulative time of exposure.

Given a Lexis object containing rows (time lines) where a subject is exposed to something (and NO periods without exposure), prepExpo fills any gaps between exposure periods for each unique combination of by and the subject-specific "ultimate" entry and exit times, "freezes" the cumulative exposure times in periods of no exposure, and splits the data using breaks passed to splitMulti if requested. The result is a (split) Lexis object where freezeScales do not progress in time periods where no exposure was recorded in lex.
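The gap filling and freezing described above can be sketched in base R. This is only an illustration of the idea, not the code prepExpo runs: the data frame expo, its columns, the entry and exit values, and the output columns frozen and work are all made up for this example, and the sketch assumes one subject whose exposure periods do not overlap and lie within entry and exit.

```r
# Hypothetical exposure periods for one subject on calendar time ("per").
expo <- data.frame(
  id    = 1,
  start = c(1965.5, 1972.5, 1991.5),
  stop  = c(1968.5, 1979.5, 1997.5)
)
entry <- 1964  # assumed ultimate entry to follow-up
exit  <- 2012  # assumed ultimate exit from follow-up

# Cut points along calendar time: follow-up limits plus all period limits.
cuts <- sort(unique(c(entry, expo$start, expo$stop, exit)))

# One row per interval between consecutive cut points; this fills the gaps.
filled <- data.frame(
  id    = 1,
  start = cuts[-length(cuts)],
  stop  = cuts[-1]
)

# A row is frozen (1) when it is a gap, i.e. it does not start an exposure
# period; this plays the role of the dummy named by freezeDummy.
filled$frozen <- as.integer(!filled$start %in% expo$start)

# Cumulative exposure ("work") at the start of each row: it advances only
# over non-frozen rows and stays constant over frozen ones.
gain <- ifelse(filled$frozen == 1L, 0, filled$stop - filled$start)
filled$work <- cumsum(c(0, gain[-length(gain)]))

filled
```

Calendar time runs continuously from entry to exit across all rows, while the work column stands still over every row flagged as frozen.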

This function assumes that the entry and exit arguments evaluate to the same value for every row within a unique combination of the variables named in by. E.g. with by = "lex.id", each lex.id has at most one value of entry and one value of exit.
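A quick way to see what this assumption means is to evaluate a candidate entry value group-wise and check that it is constant within each group. The sketch below uses base R only; the data frame lex_like and the helper is_constant_within are hypothetical names for this illustration and are not part of popEpi.

```r
# Hypothetical data: two subjects with several exposure periods each.
lex_like <- data.frame(
  lex.id = c(1, 1, 2, 2, 2),
  per    = c(1965.5, 1972.5, 1980.0, 1985.0, 1990.0)  # period start times
)

# Group-wise evaluation in the spirit of entry = min(per), by = "lex.id":
# every row receives the minimum of per within its own lex.id.
lex_like$entry <- ave(lex_like$per, lex_like$lex.id, FUN = min)

# Hypothetical helper: TRUE if x takes exactly one value within each group g.
is_constant_within <- function(x, g) {
  all(tapply(x, g, function(v) length(unique(v))) == 1L)
}

# The group-wise minimum satisfies the assumption.
ok_entry <- is_constant_within(lex_like$entry, lex_like$lex.id)

# A value that varies from row to row within a subject does not.
ok_per <- is_constant_within(lex_like$per, lex_like$lex.id)
```

Summaries such as min() or max() are constant within a group by construction, whereas a raw row-level column generally is not.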

The supplied breaks split the data using splitMulti, with the exception that breaks supplied concerning any frozen time scales ONLY split the rows where the time scales are not frozen. E.g. with freezeScales = "work", breaks = list(work = 0:10, cal = 1995:2010) splits all rows over "cal" but only non-frozen rows over "work".
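The rule for frozen time scales can be illustrated with a small base R sketch. It is a simplified stand-in and not how splitMulti is implemented: the data frame rows, its columns, and the helper split_row_over_work are hypothetical, only the frozen scale is split, and time outside the range of the breaks is kept rather than dropped.

```r
# Hypothetical gap-filled rows: a frozen row, an exposed row, a frozen row.
rows <- data.frame(
  per    = c(1964.0, 1965.5, 1968.5),  # calendar time at row start
  work   = c(0, 0, 3),                 # cumulative exposure at row start
  dur    = c(1.5, 3.0, 4.0),           # row length in years
  frozen = c(1L, 0L, 1L)
)
work_breaks <- 0:10

# Hypothetical helper: split one row at the breaks of the frozen scale,
# but only if the row is not frozen.
split_row_over_work <- function(row, breaks) {
  if (row$frozen == 1L) {
    return(row)  # frozen rows are left intact
  }
  lo <- row$work
  hi <- row$work + row$dur
  inner  <- breaks[breaks > lo & breaks < hi]  # breaks strictly inside the row
  starts <- c(lo, inner)
  stops  <- c(inner, hi)
  data.frame(
    per    = row$per + (starts - lo),  # calendar time advances with exposure
    work   = starts,
    dur    = stops - starts,
    frozen = 0L
  )
}

pieces <- lapply(seq_len(nrow(rows)), function(i) {
  split_row_over_work(rows[i, ], work_breaks)
})
split_rows <- do.call(rbind, pieces)
rownames(split_rows) <- NULL

split_rows
```

The exposed row is cut at each whole year of cumulative exposure, while the two frozen rows pass through unchanged, so total person-time is preserved.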

The function only supports frozen time scales that advance and freeze contemporaneously: e.g. it would not currently be possible to take into account both the cumulative time working at a facility and the cumulative time doing a single task at the facility, if the two are not exactly the same. On the other hand, one might use the same time scale for different exposure types, supply them as separate rows, and identify the different exposures using a dummy variable.