icd9ComorbidShortCpp: find comorbidities from ICD-9 codes.

Description

RcppParallel approach to comorbidity assignment, using OpenMP and a vector-of-integers strategy. It is very fast; most of the run time is now spent setting up the data to be passed in.

This is the main function for extracting comorbidities from a set of ICD-9 codes. Some trivial post-processing of the comorbidity data is also done at this stage, e.g. renaming to human-friendly field names and updating fields according to rules. The exact fields from the original mappings can be obtained using applyHierarchy = FALSE, but for comorbidity counting, Charlson score, etc., the rules should be applied.

For Charlson/Deyo comorbidities, strictly speaking, the original mapping does not drop e.g. uncomplicated DM when complicated DM exists. However, dropping it is probably useful in general, and is essential when calculating the Charlson score.
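The hierarchy rule described above can be pictured with a small base R sketch. It is not the package's implementation: the column names DM and DMcx and the helper apply_dm_hierarchy are assumptions made purely for illustration.

```r
# Toy comorbidity flags: one row per visit, one logical column per condition.
# "DM" (uncomplicated) and "DMcx" (complicated) are assumed column names.
flags <- matrix(
  c(TRUE,  TRUE,
    TRUE,  FALSE,
    FALSE, TRUE),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("v1", "v2", "v3"), c("DM", "DMcx"))
)

# Hypothetical helper: clear the uncomplicated flag wherever the
# complicated flag is also set, leaving everything else untouched.
apply_dm_hierarchy <- function(x) {
  x[, "DM"] <- x[, "DM"] & !x[, "DMcx"]
  x
}

squashed <- apply_dm_hierarchy(flags)
print(squashed)
```

After the rule is applied, no visit carries both flags, so a score that weights the two conditions separately does not count diabetes twice.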

Usage

icd9ComorbidShortCpp(icd9df, icd9Mapping, visitId, icd9Field, threads = 8L, chunkSize = 256L, ompChunkSize = 1L, aggregate = TRUE)
icd9Comorbid(icd9df, icd9Mapping, visitId = NULL, icd9Field = NULL, isShort = icd9GuessIsShort(icd9df[1:100, icd9Field]), isShortMapping = icd9GuessIsShort(icd9Mapping), return.df = FALSE, ...)
icd9ComorbidShort(...)
icd9ComorbidAhrq(..., abbrevNames = TRUE, applyHierarchy = TRUE)
icd9ComorbidQuanDeyo(..., abbrevNames = TRUE, applyHierarchy = TRUE)
icd9ComorbidQuanElix(..., abbrevNames = TRUE, applyHierarchy = TRUE)
icd9ComorbidElix(..., abbrevNames = TRUE, applyHierarchy = TRUE)
icd9Comorbidities(...)
icd9ComorbiditiesAhrq(...)
icd9ComorbiditiesElixHauser(...)
icd9ComorbiditiesQuanDeyo(...)
icd9ComorbiditiesQuanElixhauser(...)

Arguments

icd9df

data frame containing columns for visitId (which is the default name), icd9 (the default name for the ICD-9 code column), and possibly also a POA flag.

icd9Mapping

list (or name of a list if character vector of length one is given as argument) of the comorbidities with each top-level list item containing a vector of decimal ICD9 codes. This is in the form of a list, with the names of the items corresponding to the comorbidities (e.g. "HTN", or "diabetes") and the contents of each list item being a character vector of short-form (no decimal place but ideally zero left-padded) ICD-9 codes. No default: prefer the derivative functions, e.g. icd9ComorbidAhrq, since these also provide appropriate field names and squash the hierarchy (see applyHierarchy below).

visitId

The name of the column in the data frame which contains the patient or visit identifier. Typically this is the visit identifier, since patients leave and re-enter hospital with different ICD-9 codes. It is a character vector of length one. If left empty, or NULL, then an attempt is made to guess which field has the ID for the patient encounter (not a patient ID, although this can of course be specified directly). The guesses proceed until a single match is made. Data frames may be wide with many matching fields, so to avoid false positives, anything but a single match is rejected. If there are no successful guesses, and visitId was not specified, then the first column of the data frame is used.

icd9Field

The column in the data frame which contains the ICD codes. This is a character vector of length one. If it is NULL, icd9 will attempt to guess the column name, looking for progressively less likely possibilities until it matches a single column. Failing this, it will take the first column in the data frame. Specifying the column using this argument avoids the guesswork.

aggregate

single logical value. If TRUE, then take (possibly much) more time to aggregate out-of-sequence visit IDs in the icd9df data.frame. If FALSE, then each contiguous group of visit IDs will result in a row of comorbidities in the output data. If you know your visit IDs may be out of order, then use TRUE.

isShort

single logical value which determines whether the ICD-9 code provided is in short (TRUE) or decimal (FALSE) form. Where reasonable, this is guessed from the input data.

isShortMapping

Same as isShort, but applied to icd9Mapping instead of icd9df. All the codes in a mapping should be of the same type, i.e. short or decimal.

...

arguments passed to the corresponding function from the alias, e.g. all the arguments passed to icd9ComorbiditiesAhrq are passed on to icd9ComorbidAhrq.

abbrevNames

single logical value that defaults to TRUE, in which case the shorter human-readable names stored in e.g. ahrqComorbidNamesAbbrev are applied to the data frame column names.

applyHierarchy

single logical value that defaults to TRUE, in which case the hierarchy defined for the mapping is applied. E.g. in Elixhauser, you can't have uncomplicated and complicated diabetes both flagged.
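To make the icd9Mapping shape and the aggregate argument more concrete, here is a minimal base R sketch. It is only an illustration under stated assumptions: toy_map, its codes, and the helper toy_comorbid are hypothetical, and the real work is done in C++ rather than by anything resembling this code.

```r
# Toy mapping in the documented shape: a named list of character vectors of
# short-form codes. The names and codes are illustrative, not a real mapping.
toy_map <- list(CHF = c("39891", "4280"), HTN = c("4011", "40110"))

# Visit IDs deliberately out of sequence: "2" appears in two separate runs.
pts <- data.frame(
  visitId = c("2", "1", "2", "3", "3"),
  icd9    = c("39891", "40110", "4280", "41514", "39891"),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not the package's C++ routine.
toy_comorbid <- function(df, mapping, aggregate = TRUE) {
  # One logical column per comorbidity, one row per input code.
  hits <- vapply(mapping, function(codes) df$icd9 %in% codes,
                 logical(nrow(df)))
  if (aggregate) {
    group <- df$visitId                      # pool every row of a visit
  } else {
    runs  <- rle(df$visitId)                 # one group per contiguous run
    group <- rep(seq_along(runs$lengths), runs$lengths)
  }
  # rowsum() adds the hit counts within each group; > 0 means "flagged".
  rowsum(hits + 0L, group) > 0
}

agg   <- toy_comorbid(pts, toy_map, aggregate = TRUE)   # one row per visit
noagg <- toy_comorbid(pts, toy_map, aggregate = FALSE)  # one row per run
print(agg)
print(noagg)
```

With aggregation the two separate runs of visit "2" collapse into a single row; without it each contiguous run yields its own row, which is cheaper but only appropriate when the visit IDs are already grouped.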

Details

There is a change in behavior from previous versions. The visitId column is now (implicitly) sorted, because a std::set container is used. Previously, the visitId output order was whatever R's aggregate produced.
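The effect of an ordered set on the output order can be approximated in base R. This is a rough analogy rather than the package's code, and it assumes the visit IDs are compared as character strings.

```r
visit_ids <- c("2", "1", "2", "3", "3")

# An ordered set keeps each key once, in sorted order. In base R the
# nearest equivalent is unique() followed by sort(); method = "radix"
# uses C-locale ordering, independent of the session locale.
ordered_ids <- sort(unique(visit_ids), method = "radix")
print(ordered_ids)

# Character keys sort lexicographically, so "10" comes before "2".
mixed_ids <- sort(unique(c("2", "10", "1")), method = "radix")
print(mixed_ids)
```

If a numeric visit order matters downstream, it may be safer to reorder the result explicitly than to rely on the implicit ordering.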

The threading of the C++ code can be controlled using e.g. options(icd9.threads = 4). If the option is not set, the number of cores in the machine is used.
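The fallback described above can be sketched with base R's option handling. The helper resolve_threads is hypothetical and only mirrors the documented behavior; how the package itself reads the option is not shown here.

```r
# Hypothetical helper mirroring the documented fallback: use the
# icd9.threads option when set, otherwise the detected core count.
resolve_threads <- function() {
  n <- getOption("icd9.threads", default = parallel::detectCores())
  if (is.na(n)) 1L else as.integer(n)   # detectCores() can return NA
}

old <- options(icd9.threads = 4)   # set the option, keeping the old value
threads_set <- resolve_threads()
options(old)                       # restore the previous setting
print(threads_set)
```

Saving the return value of options() and passing it back afterwards restores the session to its previous state, which is useful when the thread count is changed only for a single call.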

Examples


  library(icd9)
  # visit IDs are deliberately out of order
  pts <- data.frame(visitId = c("2", "1", "2", "3", "3"),
                    icd9 = c("39891", "40110", "09322", "41514", "39891"))
  icd9ComorbidShort(pts, ahrqComorbid) # visitId is now sorted
