icd

Comorbidities from ICD-9 and ICD-10 codes, manipulation and validation

Introduction

Calculate comorbidities, Charlson and van Walraven scores, and perform fast and accurate validation, conversion, manipulation, filtering and comparison of ICD-9 and ICD-10 codes. This package enables a workflow from raw lists of ICD codes in hospital databases to comorbidities. ICD-9 and ICD-10 comorbidity mappings from Quan (Deyo and Elixhauser versions), Elixhauser and AHRQ are included. Common ambiguities and code formats are handled.

icd is used by many researchers around the world who work in public health, epidemiology, clinical research, nutrition, journalism, health administration and more. I’m grateful to people in these fields for their feedback and code contributions, and I’m pleased to say that icd has been used in works such as the Pulitzer finalist work on maternal death by ProPublica.

Features

  • find comorbidities of patients based on ICD-9 or ICD-10 codes, e.g. Cancer, Heart Disease
    • several standard mappings of ICD codes to comorbidities are included (Quan, Deyo, Elixhauser, AHRQ, PCCC)
    • very fast assignment of ICD codes to comorbidities (using a novel matrix multiplication algorithm and C++ internally)
  • use your existing data format, minimizing requirements for pre-processing
  • summarize groups of ICD codes in natural language
  • Charlson and van Walraven score calculations
  • Hierarchical Condition Codes (HCC) from CMS
  • Clinical Classifications Software (CCS) comorbidities from AHRQ
  • Pediatric Complex Chronic Condition comorbidities
  • AHRQ ICD-10 procedure code classification
  • annual revisions of ICD-9-CM and ICD-10-CM
  • correct conversion between different representations of ICD codes, with and without decimal points and leading and trailing characters (this is not trivial for ICD-9-CM). An ICD-9 to ICD-10 cross-walk is not yet implemented
  • comprehensive test suite to increase confidence in accurate processing of ICD codes
  • all internal ICD and comorbidity data is extracted directly from public data or code, allowing end-to-end reproducibility
  • used, tested and benchmarked against other comorbidity calculators on hardware from laptops to big servers
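
The matrix multiplication idea mentioned in the feature list can be illustrated in a few lines of base R. This is only a sketch of the general technique, not the package's C++ implementation: the mapping `toy_map` is a hypothetical, heavily truncated stand-in for a real comorbidity map, and it matches codes exactly rather than handling parent and child codes.

```r
# Toy long-format data: one element per (visit, code) pair
visits <- c("1000", "1000", "1001", "1002")
codes  <- c("40201", "25001", "34400", "4011")

# Hypothetical, truncated mapping of comorbidity -> ICD-9 codes
toy_map <- list(
  CHF       = c("40201", "4280"),
  DM        = c("25000", "25001"),
  Paralysis = c("34400", "3441")
)

visit_ids <- unique(visits)
all_codes <- unique(unlist(toy_map))

# visit x code indicator matrix (1 if the visit has the code)
visit_code <- matrix(0L, nrow = length(visit_ids), ncol = length(all_codes),
                     dimnames = list(visit_ids, all_codes))
known <- codes %in% all_codes
visit_code[cbind(visits[known], codes[known])] <- 1L

# code x comorbidity indicator matrix (1 if the code is in the comorbidity)
code_cmb <- sapply(toy_map, function(cmb_codes) as.integer(all_codes %in% cmb_codes))
rownames(code_cmb) <- all_codes

# The product counts matching codes per visit and comorbidity;
# any count above zero means the comorbidity is present
toy_result <- (visit_code %*% code_cmb) > 0
toy_result
```

Expressing the lookup as one matrix product means the per-visit, per-comorbidity matching is done by optimized linear algebra code rather than by nested loops over codes.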

Examples

See also the vignettes and examples embedded in the help for each function for more. Here’s a taste:

# install.packages("icd")
library(icd)

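# Typical diagnostic code data, with many-to-many relationship
patient_data <- data.frame(
  visit_id = c(1000, 1000, 1000, 1000, 1001, 1001, 1002, 1000),
  icd9 = c("40201", "2258", "7208", "25001", "34400", "4011", "4011", NA),
  stringsAsFactors = FALSE
)
patient_data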
#>   visit_id  icd9
#> 1     1000 40201
#> 2     1000  2258
#> 3     1000  7208
#> 4     1000 25001
#> 5     1001 34400
#> 6     1001  4011
#> 7     1002  4011
#> 8     1000  <NA>

# get comorbidities using Quan's application of Deyo's Charlson comorbidity groups
comorbid_charlson(patient_data)
#>         MI   CHF   PVD Stroke Dementia Pulmonary Rheumatic   PUD LiverMild
#> 1000 FALSE  TRUE FALSE  FALSE    FALSE     FALSE     FALSE FALSE     FALSE
#> 1001 FALSE FALSE FALSE  FALSE    FALSE     FALSE     FALSE FALSE     FALSE
#> 1002 FALSE FALSE FALSE  FALSE    FALSE     FALSE     FALSE FALSE     FALSE
#>         DM  DMcx Paralysis Renal Cancer LiverSevere  Mets   HIV
#> 1000  TRUE FALSE     FALSE FALSE  FALSE       FALSE FALSE FALSE
#> 1001 FALSE FALSE      TRUE FALSE  FALSE       FALSE FALSE FALSE
#> 1002 FALSE FALSE     FALSE FALSE  FALSE       FALSE FALSE FALSE

# or go straight to the Charlson scores:
charlson(patient_data)
#> 1000 1001 1002 
#>    2    2    0

# for more examples, see this and other vignettes
vignette("introduction", package = "icd")
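
To show how the scores above relate to the comorbidity flags, here is a minimal base R sketch that applies the original Charlson weights to the flags printed by `comorbid_charlson()`. It is an illustration only: the names `flags`, `weights` and `toy_scores` are made up for this example, and the sketch ignores refinements such as hierarchy rules between related comorbidities, which `charlson()` may apply.

```r
# Comorbidity flags copied from the comorbid_charlson() output above
cmb_names <- c("MI", "CHF", "PVD", "Stroke", "Dementia", "Pulmonary",
               "Rheumatic", "PUD", "LiverMild", "DM", "DMcx", "Paralysis",
               "Renal", "Cancer", "LiverSevere", "Mets", "HIV")
flags <- matrix(FALSE, nrow = 3, ncol = length(cmb_names),
                dimnames = list(c("1000", "1001", "1002"), cmb_names))
flags["1000", c("CHF", "DM")] <- TRUE
flags["1001", "Paralysis"] <- TRUE

# Original Charlson weights for each comorbidity
weights <- c(MI = 1, CHF = 1, PVD = 1, Stroke = 1, Dementia = 1,
             Pulmonary = 1, Rheumatic = 1, PUD = 1, LiverMild = 1, DM = 1,
             DMcx = 2, Paralysis = 2, Renal = 2, Cancer = 2,
             LiverSevere = 3, Mets = 6, HIV = 6)

# Weighted sum of the flags for each visit
toy_scores <- drop((flags * 1) %*% weights[cmb_names])
toy_scores
```

For this data the weighted sums are 2, 2 and 0, matching the `charlson()` output shown above.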

Relevance

ICD-9 codes are still in heavy use around the world, particularly in the USA where the ICD-9-CM (Clinical Modification) was in widespread use until the end of 2015. ICD-10 has been used worldwide for reporting cause of death for more than a decade, and ICD-11 is due to be released in 2018. ICD-10-CM is now the primary coding scheme for US hospital admission and discharge diagnoses used for regulatory purposes and billing. A vast amount of electronic patient data is recorded with ICD-9 codes of some kind: this package enables their use in R alongside ICD-10.

Comorbidities

A common requirement for medical research involving patients is determining new or existing comorbidities. This is often reported in Table 1 of research papers to demonstrate the similarities or differences between groups of patients. This package is focussed on fast and accurate generation of this comorbidity information from raw lists of ICD-9 codes.

ICD-9 codes

ICD-9 codes are not numbers, and great care is needed when matching individual codes and ranges of codes. It is easy to make mistakes, hence the need for this package. ICD-9 codes can be presented in short format (up to five characters, with no decimal point) or in decimal format, where a decimal point separates the code into two groups. There are also codes beginning with V and E, which have different validation rules. Zeroes after the decimal point are meaningful, so ICD-9 codes cannot safely be stored as numbers in most cases. In addition, most clinical databases contain invalid codes, and may even mix decimal and non-decimal formats in different places. This package primarily deals with ICD-9-CM (Clinical Modification) codes, but should be applicable or easily extensible to the original WHO ICD-9 system.
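
The following base R snippet illustrates why numeric storage loses information. It uses no icd functions, and the variable names are only for this example.

```r
# A parent code and one of its children, kept as character strings
codes_chr <- c("250.1", "250.10")
same_as_text <- codes_chr[1] == codes_chr[2]   # FALSE: the strings differ

# Numeric coercion discards the trailing zero, so the two codes collide
codes_num <- as.numeric(codes_chr)
same_as_number <- codes_num[1] == codes_num[2] # TRUE

# Leading zeroes are lost as well
round_trip <- as.character(as.numeric("003.0")) # "3"

# V and E codes cannot be coerced to numbers at all
v_code <- suppressWarnings(as.numeric("V10.1")) # NA
```

Keeping codes as character vectors (or factors) from the moment they are read avoids all three problems.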

ICD-10 codes

ICD-10 has a somewhat simpler format, with consistent use of a letter, then two alphanumeric characters. However, especially for ICD-10-CM, there is a multitude of qualifiers, e.g. specifying recurrence or laterality, which vastly increase the number of possible codes. This package recognizes validity of codes by syntax alone, or by whether the codes appear in a canonical list. The current ICD-10-CM master list is the 2016 set. There is no capability for converting between ICD-9 and ICD-10, but comorbidities can be generated from older ICD-9 codes and newer ICD-10 codes in parallel, and the comorbidities can then be compared.
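
As a rough illustration of validity "by syntax alone", the sketch below checks the shape described above with a regular expression. The helper `looks_like_icd10` is hypothetical and deliberately simplified; it is not the pattern used by the package, and a code that passes may still be absent from the canonical list.

```r
# Hypothetical helper, not part of icd: a simplified syntactic check only.
# Pattern: a letter, two alphanumerics, then an optional decimal point
# followed by one to four further alphanumerics.
looks_like_icd10 <- function(x) {
  x <- toupper(trimws(x))
  grepl("^[A-Z][0-9A-Z]{2}(\\.?[0-9A-Z]{1,4})?$", x)
}

syntax_ok <- looks_like_icd10(c("I10", "E11.9", "S72001A", "4011", "I1"))
syntax_ok
```

The first three inputs have the expected shape; `"4011"` starts with a digit and `"I1"` is too short, so both are rejected.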

How to get help

Look at the help files for details and examples of almost every function in this package. There are several vignettes showing the main features. Many users have emailed me directly for help, and I’ll do what I can, but it is often better to examine or add to the list of issues so we can help each other. Advanced users may look at the source code, particularly the extensive test suite which exercises all the key functions.

?comorbid
?comorbid_hcc
?explain_code
?is_valid

# first show the list
vignette(package = "icd")
vignette("pccc", package = "icd")

Note that reformatting ICD data from wide to long and back is not as straightforward as applying the various Hadley Wickham tools for this: knowing the more detailed structure of the data lets us do this better when dealing with ICD codes.
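
To make the two layouts concrete, here is a small base R sketch of a wide-to-long reshape on made-up data. It is not the package's method; for real data the package's own `wide_to_long` and `long_to_wide` functions are the better choice. The column names `dx1` and `dx2` are assumptions for this example.

```r
# Toy wide-format data: one row per visit, one column per diagnosis slot
wide <- data.frame(
  visit_id = c("1000", "1001"),
  dx1 = c("40201", "34400"),
  dx2 = c("25001", NA),
  stringsAsFactors = FALSE
)

dx_cols <- c("dx1", "dx2")

# Stack the diagnosis columns, keeping codes as character strings
long <- data.frame(
  visit_id = rep(wide$visit_id, times = length(dx_cols)),
  icd9 = unlist(wide[dx_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Drop empty diagnosis slots and group rows by visit
long <- long[!is.na(long$icd9), ]
long <- long[order(long$visit_id), ]
rownames(long) <- NULL
long
```

The long layout has one row per visit and code, which is the many-to-many shape shown in the examples section.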

Development version

The latest version is available on GitHub at jackwasey/icd, and can be installed with:

    install.packages("devtools")
    devtools::install_github("jackwasey/icd")

Contributing and Building

A substantial amount of code has now been contributed to the package. Contributions of any kind to icd are very welcome. See the [GitHub issues page](https://github.com/jackwasey/icd/issues) for open issues and feature requests. Documentation, vignettes and examples are very welcome, especially if accompanied by some real-world data.

To build icd, Rcpp must be compiled from source. This happens automatically on Linux, but on Mac and Windows, the following may sometimes be required, especially after upgrading R itself. This is a limitation of the R build system.

install.packages("Rcpp", type = "source")

Version: 3.3
License: GPL-3
Monthly Downloads: 169
Last Published: November 18th, 2018

Functions in icd (3.3)

all_identical

allow microbenchmark to compare multiple results
apply_hier

Apply hierarchy and choose naming for each comorbidity map
classes_ordered

prefer an order of classes
combine

combine ICD codes
count_codes

Count ICD codes or comorbidities for each patient
convert

Convert ICD9 codes between formats and structures.
attr_decimal_diag

Set ICD short-form diagnosis code attribute
attr_short_diag

Set short diagnosis flag in C++
categorize_simple

Categorize codes according to a mapping
comorbid_df_to_mat

convert comorbidity matrix to data frame
comorbid_hcc

Get Hierarchical Condition Codes (HCC)
chapter_to_desc_range

Parse a (sub)chapter text description with parenthesised range
charlson

Calculate Charlson Comorbidity Index (Charlson Score)
comorbid_hcc_worker

apply HCC rules to either ICD-9 or ICD-10 codes
comorbid_pccc_dx

Calculate pediatric complex chronic conditions (PCCC) comorbidities
condense

Condense ICD-9 code by replacing complete families with parent codes
expand_minor

expand decimal part of ICD-9 code to cover all possible sub-codes
as.decimal_diag

Get or set whether ICD codes have an attribute indicating 'short' or 'decimal' format
expect_chap_equal

expect named sub-chapter has a given range, case insensitive
expand_range

take two ICD-9 codes and expand range to include all child codes
expect_chap_present

expect that a chapter with given title exists, case-insensitive
comorbid_mat_to_df

convert comorbidity data frame from matrix
factor_nosort_rcpp_worker

Fast Factor Generation
cr

sequence columns of comorbidities
short_to_parts.icd10

Convert decimal ICD codes to component parts
fetch_icd10cm_all

Fetch ICD-10-CM data from the CMS web site
filter_poa

Filters data frame based on present-on-arrival flag
as_char_no_warn

convert to character vector without warning
children

Get children of ICD codes
children_defined

defined children of ICD codes
count_codes_wide

Count ICD codes given in wide format
count_comorbid

Count number of comorbidities per patient
charlson_from_comorbid

Calculate Charlson scores from precomputed Charlson comorbidities
do_extra_tests

Set system environment to do extra tests
env_to_vec_flip

return a new environment with names and values swapped
expand_range.icd10cm

Expand range of ICD-10 codes returning only defined codes in ICD-10-CM
fastIntToStringRcpp

Convert integers to strings as quickly as possible
generate_maps_pccc

Generate PCCC data
explain_table

Explain ICD-9 and ICD-10 codes in English from decimal (123.45 style). Tabulates the decimal format alongside converted non-decimal format.
comorbid

Find comorbidities from ICD-9 codes.
explain_table_worker

generate table of ICD code explanations
comorbidMatMulSimple

Comorbidity calculation as a matrix multiplication
condense_explain_table

condense explain_table output down to major codes
condense_explain_table_worker

generate condensed code and condensed number columns
generate_random_short_icd10cm_bill

generate random ICD-9 codes
expand_range_major

Expand major codes to range
decimal_to_short

Convert Decimal format ICD codes to short format
filter_valid

Filter ICD codes by validity.
get_defined

Select only defined ICD codes
get_non_ASCII

mimic the R CMD check test
get_raw_data_dir

Get the raw data directory
diff_comorbid

show the difference between two comorbidity mappings
expect_equal_no_icd

expect equal, ignoring any ICD classes
generate_sysdata

Generate sysdata.rda
explain_code

Explain ICD-9 and ICD-10 codes in English
generate_icd10_sources

Generate list of data source information for ICD-10-CM diagnosis and procedure codes
fixSubchapterNa

Fix NA sub-chapters in RTF parsing
get_billable

Get billable ICD codes
%eine%

in/match equivalent for two Environment arguments
get_icd_name

get the name of a data.frame column which is most likely to contain the ICD codes
icd10_generate_map_quan_elix

generate ICD-10 Quan Elixhauser mapping
generate_random_short_icd9

generate random ICD-9 codes
icd10_map_ahrq_pcs

AHRQ ICD-10-PCS categories
get_valid

invalid subset of decimal or short_code ICD-9 codes
guess_pair_version

Guess the ICD version (9 or 10) from a pair of codes
generate_spelling

Generate spelling exceptions
icd10_fetch_ahrq_ccs

Download AHRQ CCS ICD-10 definitions
icd9_chapters

ICD-9 chapters
icd9_chapters_to_map

convert the chapter headings to lists of codes
icd9_extract_alpha_numeric

extract alphabetic, and numeric part of ICD-9 code prefix
icd10_generate_map_quan_deyo

Generate Quan mapping for Charlson categories of ICD-10 codes
generate_icd_chapters

Generate ICD-9 and ICD-10 Chapter names and number ranges, transcribed from the official definitions from WHO and extended by US CMS.
icd10cm2016

ICD-10-CM
get_visit_name

Get or guess the name of the visit ID column
guess_short

Guess whether codes are short_code or decimal_code
icd9_is_n

Do ICD-9 codes belong to numeric, V or E sub-types?
icd9_fetch_ahrq_ccs

Download AHRQ CCS ICD-9 definitions
guess_version

Guess version of ICD codes
guess_version_update

Guess version of ICD and update class
icd9_is_n_cpp

Do elements of vector begin with V, E (or any other character)?
icd9cm_billable

list of annual versions of billable leaf nodes of ICD-9-CM
get_invalid

Get invalid ICD codes
icd9cm_generate_chapters_hierarchy

Generate ICD-9-CM hierarchy
icd-package

icd: Comorbidity Calculations and Tools for ICD-9 and ICD-10 Codes
is_major

Check whether a code is major
icd10cm_extract_sub_chapters

Get sub-chapters from the 2016 XML for ICD-10-CM
icd10cm_get_all_defined

get all ICD-10-CM codes
icd10_chapters

ICD-10 chapters
is_valid

Check whether ICD-9 codes are syntactically valid
icd9_drop_leading_zeroes

drop zero padding from decimal ICD-9 code.
icd9AddLeadingZeroesMajorSingle

Simpler add leading zeroes without converting to parts and back
icd9_generate_map_elix

Generate Elixhauser comorbidities
get_major.icd9

Get major part of an ICD code
icd10_parse_ahrq_ccs

parse AHRQ CCS for mapping - ICD10
icd10_parse_cc

Import the ICD10 to CC crosswalks
icd10_pcs

ICD-10-CM Procedure Codes
icd9_generate_map_quan_elix

Generate Quan's revised Elixhauser comorbidities
icd9_map_single_ccs

Clinical Classifications Software (CCS) for ICD9/10-CM
icd9_expand_range_worker

expand range worker
icd9_map_hcc

Medicare Hierarchical Condition Categories
icd9_order_short

Get order of short-form ICD-9 codes
is_valid.default

Test whether an ICD code is major
logical_to_binary

Encode TRUE as 1, and FALSE as 0 (integers)
icd10_comorbid_reduce

ICD-10 comorbidities by reducing problem size
icd9_map_pccc

Pediatric Complex Chronic Conditions
icd9cm_get_billable

Get billable ICD-9-CM codes
icd10_sub_chapters

ICD-10 sub-chapters
icd9_parse_quan_deyo_sas

parse original SAS code defining Quan's update of Deyo comorbidities.
icd9cm_hierarchy

Latest ICD-9-CM diagnosis codes, in flat data.frame format
rtf_fix_duplicates

fix duplicates detected in RTF parsing
rtf_fix_quirks_2015

fix quirks for 2015 RTF parsing
icd9_sources

ICD-9 and ICD-10 data sources
is_billable

Determine whether codes are billable leaf-nodes
icd9cm_latest_edition

Latest ICD-9-CM edition
rtf_parse_year

parse RTF description of entire ICD-9-CM for a specific year
icd9_generate_sources

generate data for finding source data for ICD-9-CM
is_defined

Check whether ICD-9 codes exist
icd9MajMinToCode

Convert mjr and mnr vectors to single code
print.comorbidity_map

Print a comorbidity map
icd9_get_chapters

get ICD-9 Chapters from vector of ICD-9 codes
poa_choices

Present-on-admission flags
print.icd9

Print ICD codes and comorbidity maps cleanly
icd9_map_ahrq

AHRQ comorbidities
random_string

generate random strings
rtf_parse_fifth_digit_range

parse a row of RTF source data for ranges to apply fifth digit
icd_attr_clean

Remove any attributes set by 'icd'
rtf_parse_lines

parse lines of RTF
parse_leaf_desc_icd9cm_v27

Parse billable codes for ICD-9-CM version 27
icd9_map_elix

Elixhauser comorbidities
icd9_add_leading_zeroes_cpp

Add leading zeroes to incomplete ICD-9 codes
set_icd_class

Construct ICD-9 and ICD-10 data types
rtf_strip

Strip RTF
icd9_fetch_ahrq_sas

get the SAS code from AHRQ
icd9_generate_all_

generate lookup data for each class of ICD-9 code
set_re_globals

Put ICD validation regular expressions in the icd::: name space
strim

Trim leading and trailing white space from a single string
strip

Strip character(s) from character vector
icd9_map_quan_deyo

Quan adaptation of Deyo/Charlson comorbidities
parse_leaf_descriptions_all

Get billable codes from all available years
rtf_fetch_year

Fetch RTF for a given year
icd9_parse_ahrq_ccs

parse AHRQ CCS for mapping
shortcode_icd9

set short_to_decimal attribute
rtf_filter_excludes

exclude some unwanted rows from filtered RTF
simplify_map_lex

Internal function to simplify a comorbidity map by only including codes which are parents, or identical to, a given list of codes.
[[.comorbidity_map

Extract vector of codes from an ICD comorbidity map
sas_parse_assignments

Get assignments from character strings
icd9_map_quan_elix

Quan adaptation of Elixhauser comorbidities
save_in_data_dir

Save given variable in package data directory
icd9_parse_cc

Generate ICD to HCC Crosswalks from CMS
subset_icd

extract subset from ICD data
icd9_parse_leaf_desc_ver

Read the ICD-9-CM description data as provided by the Centers for Medicare & Medicaid Services (CMS).
icd_parse_cc_hierarchy

Import CMS HCC Rules
is.icd9

test ICD-related classes
vermont_dx

Hospital discharge data from Vermont
wide_to_long

Convert ICD data from wide to long format
short_to_decimal

Convert ICD codes from short to decimal forms
short_to_parts

Convert short format ICD codes to component parts
swap_names_vals

swap names and values of a vector
icd9_parse_ahrq_sas

parse AHRQ SAS code to get mapping
icd_classes_conflict

Check whether there are any ICD class conflicts
switch_ver_cmb

guess icd-9 or icd-10 or other type, and switch to call the given function
named_list

make a list using input argument names as names
as.icd_long_data

Convert between and identify 'long' and 'wide' patient data formats
names_elix

Comorbidity names
refactor

Refactor by integer matching levels in C++
long_to_wide

Convert ICD data from long to wide format
match_rcpp

Faster match
refactor_worker

Re-generate a factor with new levels, without doing string matching
rtf_fix_unicode

Fix Unicode characters in RTF
rtf_generate_fourth_lookup

generate look-up for four digit codes
rtf_lookup_fourth

apply fourth digit qualifiers
sas_extract_let_strings

Extract quoted or unquoted SAS string definitions
rtf_main_filter

filter RTF for actual ICD-9 codes
sas_format_extract

Extract assignments from a SAS FORMAT definition
sort_icd

Sort short-form ICD-9 codes
str_extract

stringr does this, but here we have a small amount of base R code
test_env

Set up a test environment which also has the internal functions
trim

Trim leading and trailing white space
van_walraven

Calculate van Walraven Elixhauser Score
str_match_all

return all matches for regular expression
str_pair_match

Match pairs of strings to get named vector
vec_to_env_true

create environment from vector
unzip_single

unzip a single file from URL
update_everything

generate all package data
unzip_to_data_raw

Unzip file to raw data directory
uranium_pathology

United States Transuranium & Uranium Registries