icd9WideToLong: convert ICD data from wide to long format

Description

The task differs enough from what dcast in reshape2 does that it is implemented again here specifically for ICD codes. This function wraps the core reshape function. Empty strings and NA values are dropped; everything else is kept. The ICD codes are not validated.

Usage

icd9WideToLong(x, visitId = NULL, icdLabels = NULL, icdName = "icdCode", icdRegex = c("icd", "diag", "dx_", "dx"), verbose = FALSE)

Arguments

x

data.frame in wide format, i.e. one row per patient, and multiple columns containing ICD codes, empty strings or NA.

visitId

The name of the column in the data frame which contains the patient or visit identifier. Typically this is the visit identifier, since patients leave and re-enter hospital with different ICD-9 codes. It is a character vector of length one. If left empty, or NULL, an attempt is made to guess which field holds the ID for the patient encounter (not a patient ID, although a patient ID column can of course be specified directly). Guesses are tried until one yields a single match. Data frames may be wide with many matching fields, so to avoid false positives, anything but a single match is rejected. If no guess succeeds and visitId was not specified, the first column of the data frame is used.
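
The guessing rule described above can be sketched roughly as follows. This is an illustration only, not the package's code: the helper name guess_visit_id and the patterns in guesses are hypothetical, since the patterns actually tried are not listed here.

```r
# Hypothetical helper illustrating the documented rule; not part of the package.
# The patterns in `guesses` are assumptions made for this sketch.
guess_visit_id <- function(x, visitId = NULL,
                           guesses = c("visit", "encounter", "id")) {
  # An explicitly supplied column name always wins.
  if (!is.null(visitId)) return(visitId)
  for (pattern in guesses) {
    hits <- grep(pattern, names(x), ignore.case = TRUE, value = TRUE)
    # Accept a guess only if it matches exactly one column.
    if (length(hits) == 1) return(hits)
  }
  # No successful guess: fall back to the first column.
  names(x)[1]
}

ambiguous <- data.frame(visit_a = 1, visit_b = 2, encounterNo = 3)
guess_visit_id(ambiguous)  # "visit" matches two columns, so it is rejected
```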

icdLabels

vector of column names in which codes are found. If NULL, all columns matching icd or ICD will be included.

icdName

character vector of length one containing the new column name for the ICD codes, defaults to "icdCode"

icdRegex

vector of character strings, each containing a regex used to identify ICD-9 diagnosis columns; they are tried in order and matched case-insensitively. Default is c("icd", "diag", "dx_", "dx")
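
As a rough sketch of what trying the patterns in order could look like, the hypothetical helper below returns the columns matched by the first pattern that matches anything. Stopping at the first successful pattern is an assumption of this sketch, and guess_icd_columns is not a function of the package.

```r
# Hypothetical helper; assumes the first pattern with any match decides the result.
guess_icd_columns <- function(x, icdRegex = c("icd", "diag", "dx_", "dx")) {
  for (pattern in icdRegex) {
    # Case-insensitive match against the column names.
    hits <- grep(pattern, names(x), ignore.case = TRUE, value = TRUE)
    if (length(hits) > 0) return(hits)
  }
  character(0)  # nothing matched any pattern
}

df <- data.frame(visitId = "a", ICD9_01 = "441", DX_extra = "443")
icd_cols <- guess_icd_columns(df)  # "icd" is tried first and matches ICD9_01
```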

verbose

single logical value, defaults to FALSE.

Value

data frame with the visitId column named as in the input, and a column named by icdName containing all the non-NA and non-empty codes found in the wide input data.
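
To make the shape of the result concrete, here is a minimal base-R sketch of the same kind of conversion. It is not the package's implementation: wide_to_long_sketch is a hypothetical name, the column names must be passed explicitly, and the row order it produces (column by column) is an assumption that may differ from what icd9WideToLong returns.

```r
# Base-R sketch of a wide-to-long conversion; not the package's implementation.
wide_to_long_sketch <- function(x, visitId, icdLabels, icdName = "icdCode") {
  # Stack one (visit, code) pair per ICD column, column by column.
  long <- data.frame(
    rep(as.character(x[[visitId]]), times = length(icdLabels)),
    unlist(lapply(x[icdLabels], as.character), use.names = FALSE),
    stringsAsFactors = FALSE
  )
  names(long) <- c(visitId, icdName)
  # Drop NA values and empty strings; keep everything else without validation.
  keep <- !is.na(long[[icdName]]) & long[[icdName]] != ""
  long <- long[keep, , drop = FALSE]
  rownames(long) <- NULL
  long
}

widedf <- data.frame(visitId = c("a", "b", "c"),
  icd9_01 = c("441", "4424", "441"),
  icd9_02 = c(NA, "443", NA))
longdf <- wide_to_long_sketch(widedf, "visitId", c("icd9_01", "icd9_02"))
```

With this input, longdf has four rows: one per non-missing code, with visit "b" appearing twice.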

Examples

  widedf <- data.frame(visitId = c("a", "b", "c"),
    icd9_01 = c("441", "4424", "441"),
    icd9_02 = c(NA, "443", NA))
  icd9WideToLong(widedf)
