wide_to_long: Convert ICD data from wide to long format

Description

Reshaping data is a common task, and is made easier here by knowing more about the underlying structure of the data. This function wraps the reshape function with specific behavior and checks related to ICD codes. Empty strings and NA values will be dropped, and everything else kept. No validation of the ICD codes is done.
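The reshaping described above can be sketched in base R. This is only an illustration of the idea (stack the code columns with reshape, then drop NA and empty strings), not the package's actual implementation; the data frame `wide`, the column names, and the `seq` time variable are assumptions made for the example.

```r
# Hypothetical wide data: one row per visit, two ICD code columns.
wide <- data.frame(
  visit_name = c("a", "b", "c"),
  icd9_01 = c("441", "4424", "441"),
  icd9_02 = c(NA, "443", ""),
  stringsAsFactors = FALSE
)

code_cols <- c("icd9_01", "icd9_02")

# stats::reshape stacks the code columns into a single "icd_code" column.
long <- reshape(
  wide,
  direction = "long",
  varying = code_cols,
  v.names = "icd_code",
  idvar = "visit_name",
  timevar = "seq"
)

# Drop NA and empty-string codes; keep only the identifier and the code.
keep <- !is.na(long$icd_code) & nzchar(long$icd_code)
long <- long[keep, c("visit_name", "icd_code")]

# Sort by visit so that codes from the same visit sit together.
long <- long[order(long$visit_name), ]
rownames(long) <- NULL
long
```

Visit "b" contributes two rows because it has two non-empty codes; visits "a" and "c" contribute one each.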

Usage

wide_to_long(x, visit_name = get_visit_name(x), icd_labels = NULL,
  icd_name = "icd_code", icd_regex = c("icd", "diag", "dx_", "dx"))
icd_wide_to_long(...)

Arguments

x

data.frame in wide format, i.e. one row per patient, and multiple columns containing ICD codes, empty strings or NA.

visit_name

The name of the column in the data frame that contains the patient or visit identifier. Typically this is the visit identifier, since patients leave and re-enter hospital with different ICD-9 codes. It is a character vector of length one. If left empty, or NULL, an attempt is made to guess which field holds the ID for the patient encounter (not a patient ID, although this can of course be specified directly). The guesses proceed until a single match is made. Data frames may be wide with many matching fields, so to avoid false positives, anything but a single match is rejected. If there are no successful guesses and visit_name was not specified, the first column of the data frame is used.

icd_labels

vector of column names in which codes are found. If NULL, all columns matching the regular expression icd_regex will be included.

icd_name

The name of the column in the data.frame which contains the ICD codes. This is a character vector of length one. If it is NULL, icd will attempt to guess the column name, looking for progressively less likely possibilities until it matches a single column. Failing this, it will take the first column in the data frame. Specifying the column using this argument avoids the guesswork.

icd_regex

vector of character strings, each a regular expression used to identify ICD-9 diagnosis columns. The expressions are tried in order and matched case-insensitively. Default is c("icd", "diag", "dx_", "dx").

...

arguments passed on to other functions
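The column-selection rules described for visit_name and icd_regex can be sketched as two small helpers. Both functions are hypothetical and not part of icd; the guess patterns in `guess_id_column` are invented for illustration, and stopping at the first pattern that matches anything is an assumption about what "in order" means, not confirmed behavior.

```r
# Hypothetical helper: accept a guess only when it matches exactly one
# column; otherwise fall back to the first column of the data frame.
guess_id_column <- function(df, guesses = c("visit", "encounter", "id")) {
  for (g in guesses) {
    hits <- grep(g, names(df), ignore.case = TRUE, value = TRUE)
    if (length(hits) == 1L) {
      return(hits)
    }
  }
  names(df)[1]
}

# Hypothetical helper: try each pattern in order (case-insensitive) and
# return the columns matched by the first pattern that matches anything.
match_code_columns <- function(col_names,
                               patterns = c("icd", "diag", "dx_", "dx")) {
  for (p in patterns) {
    hits <- grep(p, col_names, ignore.case = TRUE, value = TRUE)
    if (length(hits) > 0L) {
      return(hits)
    }
  }
  character(0)
}

df <- data.frame(
  patient_id = 1:2,
  visit_id = c("v1", "v2"),
  ICD9_01 = c("441", "4424"),
  icd9_02 = c(NA, "443"),
  stringsAsFactors = FALSE
)

id_col <- guess_id_column(df)
code_cols <- match_code_columns(names(df))
```

Here "visit" matches a single column, so `visit_id` is chosen. The pattern "id" alone would match two columns and be rejected as ambiguous. Passing visit_name and icd_labels explicitly avoids relying on any such guessing.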

Value

data.frame with the visit_name column named the same as in the input, and a column named by icd_name containing all the non-NA and non-empty codes found in the wide input data.

Deprecated function names

Future versions of icd will drop the icd_ prefix. For example, charlson should be used in favor of icd_charlson. To distinguish icd function calls, consider using the prefix icd:: instead, e.g., icd::charlson. Functions which specifically operate on either ICD-9 or ICD-10 codes or their sub-types will retain the prefix, e.g., icd9_comorbid_ahrq. Classes specific to icd also retain the prefix, e.g., icd_wide_data.

Examples

# NOT RUN {
widedf <- data.frame(visit_name = c("a", "b", "c"),
  icd9_01 = c("441", "4424", "441"),
  icd9_02 = c(NA, "443", NA)
  )
wide_to_long(widedf)
# }
