long_to_wide: Convert ICD data from long to wide format

Description

Converting ICD data from long to wide format is more complicated than reshape or reshape2::dcast allows. This function is a reasonably simple solution using built-in functions.

Usage

long_to_wide(x, visit_name = get_visit_name(x),
  icd_name = get_icd_name(x), prefix = "icd_", min_width = 0,
  aggr = TRUE, return_df = FALSE)
icd_long_to_wide(...)

Arguments

x

data.frame of long-form data, with one column for visit_name and one for the ICD code

visit_name

The name of the column in the data frame which contains the patient or visit identifier. Typically this is the visit identifier, since patients leave and re-enter hospital with different ICD-9 codes. It is a character vector of length one. If left empty, or NULL, then an attempt is made to guess which field has the ID for the patient encounter (not a patient ID, although this can of course be specified directly). The guesses proceed until a single match is made. Data frames may be wide with many matching fields, so to avoid false positives, anything but a single match is rejected. If there are no successful guesses, and visit_name was not specified, then the first column of the data frame is used.

icd_name

The name of the column in the data.frame which contains the ICD codes. This is a character vector of length one. If it is NULL, icd9 will attempt to guess the column name, looking for progressively less likely possibilities until it matches a single column. Failing this, it will take the first column in the data frame. Specifying the column using this argument avoids the guesswork.

prefix

character string used to prefix the names of the new columns; the default is icd_

min_width

single integer, if specified, writes out this many columns even if no patients have that many codes. Must be greater than or equal to the maximum number of codes per patient.

aggr

single logical value, if TRUE (the default) will take more time to find out-of-order visit_names, and combine all the codes for each unique visit_name. If FALSE, then out-of-order visit_names will result in a row in the output data per contiguous block of identical visit_names.

return_df

single logical value, if TRUE, return a data frame with a field for the visit_name. This may be more convenient, but the default of FALSE gives the more natural return data of a matrix with row names being the visit IDs from visit_names.

...

arguments passed on to other functions
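The difference between the two aggr settings described above can be illustrated in base R. This is only a sketch of the two grouping rules, not the package's implementation; the data and the variable names (visits, codes, by_visit, by_block) are made up for illustration.

```r
# Hypothetical data: visit "a" appears in two non-contiguous blocks.
visits <- c("a", "a", "b", "a")
codes  <- c("441", "4424", "443", "25000")

# Grouping as described for aggr = TRUE: one group per unique identifier,
# so out-of-order rows for "a" are combined.
by_visit <- split(codes, factor(visits, levels = unique(visits)))
length(by_visit)  # number of unique visit identifiers

# Grouping as described for aggr = FALSE: one group per contiguous run of
# identical identifiers, so "a" yields two separate groups.
runs <- rle(visits)
block_id <- rep(seq_along(runs$lengths), runs$lengths)
by_block <- split(codes, block_id)
names(by_block) <- runs$values
length(by_block)  # number of contiguous blocks
```

If the input is already sorted by visit identifier, the two groupings coincide, which is presumably why skipping the aggregation step can save time.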

Deprecated function names

Future versions of icd will drop the icd_ prefix. For example, charlson should be used in favor of icd_charlson. To distinguish icd function calls, consider using the prefix icd:: instead, e.g., icd::charlson. Functions which specifically operate on either ICD-9 or ICD-10 codes or their sub-types will retain the prefix, e.g., icd9_comorbid_ahrq. Classes specific to icd also retain the prefix, e.g., icd_wide_data.

Examples

# NOT RUN {
  longdf <- data.frame(visit_name = c("a", "b", "b", "c"),
    icd9 = c("441", "4424", "443", "441"))
  long_to_wide(longdf)
  long_to_wide(longdf, prefix = "ICD10_")
# }
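
For readers without the package installed, the shape of the result can be approximated in base R. The helper below, sketch_long_to_wide, is hypothetical and is not the package's code: it assumes the column names are given explicitly, always aggregates by unique visit, returns a character matrix with visit identifiers as row names, and names columns by appending a plain sequence number to the prefix (the real column naming may differ).

```r
# Illustrative helper only; not part of the icd package.
sketch_long_to_wide <- function(x, visit_name = "visit_name", icd_name = "icd9",
                                prefix = "icd_", min_width = 0L) {
  visits <- as.character(x[[visit_name]])
  codes <- as.character(x[[icd_name]])
  # Group codes by visit, keeping visits in order of first appearance.
  by_visit <- split(codes, factor(visits, levels = unique(visits)))
  # Use the largest number of codes for any visit, or min_width if larger.
  width <- max(lengths(by_visit), min_width)
  # Pad each visit's codes with NA so every row has the same length.
  padded <- lapply(by_visit, function(v) {
    c(v, rep(NA_character_, width - length(v)))
  })
  # Bind the rows; the list names become the matrix row names.
  wide <- do.call(rbind, padded)
  colnames(wide) <- paste0(prefix, seq_len(width))
  wide
}

longdf <- data.frame(visit_name = c("a", "b", "b", "c"),
                     icd9 = c("441", "4424", "443", "441"))
wide <- sketch_long_to_wide(longdf)
wider <- sketch_long_to_wide(longdf, min_width = 4L)
```

Here wide has one row per visit and as many columns as the busiest visit has codes, with NA filling the unused cells; wider is padded out to four columns.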
