

icd

ICD-9 and ICD-10 comorbidities, manipulation and validation

Features

  • find comorbidities of patients based on admission or discharge ICD-9 or ICD-10 codes, e.g. Cancer, Heart Disease
    • several standard mappings of ICD codes to comorbidities are included (Quan, Deyo, Elixhauser, AHRQ)
    • very fast assignment of ICD codes to comorbidities (using matrix multiplication with C and C++ internally)
  • Charlson and Van Walraven score calculations
  • Hierarchical Condition Codes (HCC) from CMS
  • Clinical Classifications Software (CCS) comorbidities from AHRQ
  • validation of ICD codes from different annual revisions of ICD-9-CM and ICD-10-CM
  • summarizing ICD codes into groups, and to human-readable descriptions
  • correct conversion between different representations of ICD codes, with and without decimal points, and with or without leading and trailing characters (this is not trivial for ICD-9-CM). An ICD-9 to ICD-10 cross-walk is not yet implemented
  • comprehensive test suite to increase confidence in accurate processing of ICD codes
  • all internal ICD and comorbidity data is extracted directly from publicly available data or code, increasing confidence in the results

Install

install.packages("icd")

Introduction

Calculate comorbidities and Charlson scores, and perform fast and accurate validation, conversion, manipulation, filtering and comparison of ICD-9 and ICD-10 codes. This package enables a workflow from raw lists of ICD codes in hospital databases to comorbidities. ICD-9 and ICD-10 comorbidity mappings from Quan (Deyo and Elixhauser versions), Elixhauser and AHRQ are included. Common ambiguities and code formats are handled.

Relevance

ICD-9 codes are still in heavy use around the world, particularly in the USA where the ICD-9-CM (Clinical Modification) was in widespread use until the end of 2015. ICD-10 has been used worldwide for reporting cause of death for more than a decade. ICD-10-CM is now the primary coding scheme for US hospital admission and discharge diagnoses used for regulatory purposes and billing. A vast amount of patient data is recorded with ICD-9 codes of some kind: this package enables their use in R alongside ICD-10.

Comorbidities

A common requirement for medical research involving patients is determining new or existing comorbidities. This is often reported in Table 1 of research papers to demonstrate the similarity or differences of groups of patients. This package is focussed on fast and accurate generation of this comorbidity information from raw lists of ICD-9 codes.
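
The feature list mentions that codes are assigned to comorbidities using matrix multiplication. The base R sketch below illustrates that idea only; it is not the package's internal C++ implementation. The names `toy_map`, `patients`, `visit_code`, `code_cmb` and `toy_comorbid` are hypothetical, and the map holds just a handful of illustrative ICD-9 codes rather than a full published mapping.

```r
# Toy comorbidity map: comorbidity name -> short-form ICD-9 codes.
# Illustrative subset only, not a complete published mapping.
toy_map <- list(
  CHF = c("40201", "4280"),
  DM  = c("25000", "25001"),
  HTN = c("4011", "4019")
)

# Long-format patient data: one row per visit/code pair.
patients <- data.frame(
  visit_id = c("1000", "1000", "1000", "1001", "1002"),
  icd9     = c("40201", "2258", "25001", "4011", "4011"),
  stringsAsFactors = FALSE
)

codes  <- unique(unlist(toy_map))
visits <- unique(patients$visit_id)

# visits x codes indicator matrix (codes absent from the map are ignored).
visit_code <- matrix(0L, nrow = length(visits), ncol = length(codes),
                     dimnames = list(visits, codes))
known <- patients$icd9 %in% codes
visit_code[cbind(patients$visit_id[known], patients$icd9[known])] <- 1L

# codes x comorbidities indicator matrix.
code_cmb <- sapply(toy_map, function(x) as.integer(codes %in% x))
rownames(code_cmb) <- codes

# A single matrix product counts matching codes per visit and comorbidity;
# any count above zero means the comorbidity is present.
toy_comorbid <- (visit_code %*% code_cmb) > 0
print(toy_comorbid)
```

Expressing the lookup as one matrix product avoids looping over patients and comorbidities, which is presumably why this formulation scales well to large data sets.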

ICD-9 codes

ICD-9 codes are not numbers, and great care is needed when matching individual codes and ranges of codes. It is easy to make mistakes, hence the need for this package. ICD-9 codes can be presented in short format of up to five characters, or in decimal format, with a decimal point separating the code into two groups. There are also codes beginning with V and E, which have different validation rules. Zeroes after a decimal point are meaningful, so ICD-9 codes cannot be stored as numbers in most cases. In addition, most clinical databases contain invalid codes, and some even mix decimal and non-decimal formats in different places. This package primarily deals with ICD-9-CM (Clinical Modification) codes, but should be applicable or easily extensible to the original WHO ICD-9 system.
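
To make the point about numeric storage concrete, here is a small base R sketch. `toy_decimal_to_short` is a hypothetical helper written for this illustration, not a function from the package; it handles purely numeric ICD-9 codes only and deliberately ignores V and E codes, which follow different rules.

```r
# Numeric storage loses information: "250.1" (a parent code) and "250.10"
# (one of its children) are different codes, but compare equal as numbers.
as.numeric("250.1") == as.numeric("250.10")

# Hypothetical helper: convert purely numeric ICD-9 codes from decimal to
# short form. V and E codes are not handled in this sketch.
toy_decimal_to_short <- function(x) {
  stopifnot(is.character(x))
  parts <- strsplit(x, ".", fixed = TRUE)
  vapply(parts, function(p) {
    # Zero-pad the part before the decimal point to three digits.
    major <- formatC(as.integer(p[1]), width = 3, flag = "0")
    # Keep the part after the decimal point verbatim, trailing zeroes included.
    minor <- if (length(p) > 1) p[2] else ""
    paste0(major, minor)
  }, character(1))
}

toy_decimal_to_short(c("4.1", "4.10", "250.01", "401"))
```

Keeping codes as character vectors throughout, as this sketch does, is what preserves the distinction between a code and its zero-suffixed children.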

ICD-10 codes

ICD-10 has a somewhat simpler format, with consistent use of a letter, then two alphanumeric characters. However, especially for ICD-10-CM, there are a multitude of qualifiers, e.g. specifying recurrence or laterality, which vastly increase the number of possible codes. This package recognizes validity of codes by syntax alone, or by whether the codes appear in a canonical list. The current ICD-10-CM master list is the 2016 set. There is no capability for converting between ICD-9 and ICD-10, but comorbidities can be generated from older ICD-9 codes and newer ICD-10 codes in parallel, and the comorbidities can then be compared.
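
As a rough illustration of validity "by syntax alone", the sketch below checks the shape described above: a letter, two alphanumeric characters, then up to four further characters with an optional decimal point. The function name `toy_icd10_syntax_ok` and the regular expression are assumptions made for this example; the package's own rules are likely stricter, and a syntactically plausible code need not exist in the canonical list.

```r
# Hypothetical, simplified syntax check for ICD-10 style codes.
# Assumed shape: letter, two alphanumerics, optional ".", up to four more.
toy_icd10_syntax_ok <- function(x) {
  x <- toupper(trimws(x))
  grepl("^[A-Z][0-9A-Z]{2}(\\.?[0-9A-Z]{1,4})?$", x)
}

# The last two inputs fail: one starts with a digit, one is too short.
toy_icd10_syntax_ok(c("I10", "E11.9", "S72001A", "4011", "I1"))
```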

Examples

See also the vignettes and examples embedded in the help for each function for more. Here’s a taste:

patient_data
#>   visit_id  icd9  poa
#> 1     1000 40201    Y
#> 2     1000  2258 <NA>
#> 3     1000  7208    N
#> 4     1000 25001    Y
#> 5     1001 34400    X
#> 6     1001  4011    Y
#> 7     1002  4011    E

# get comorbidities using Quan's application of Deyo's Charlson comorbidity groups
comorbid_charlson(patient_data)
#>         MI   CHF   PVD Stroke Dementia Pulmonary Rheumatic   PUD LiverMild
#> 1000 FALSE  TRUE FALSE  FALSE    FALSE     FALSE     FALSE FALSE     FALSE
#> 1001 FALSE FALSE FALSE  FALSE    FALSE     FALSE     FALSE FALSE     FALSE
#> 1002 FALSE FALSE FALSE  FALSE    FALSE     FALSE     FALSE FALSE     FALSE
#>         DM  DMcx Paralysis Renal Cancer LiverSevere  Mets   HIV
#> 1000  TRUE FALSE     FALSE FALSE  FALSE       FALSE FALSE FALSE
#> 1001 FALSE FALSE      TRUE FALSE  FALSE       FALSE FALSE FALSE
#> 1002 FALSE FALSE     FALSE FALSE  FALSE       FALSE FALSE FALSE

# or go straight to the Charlson scores:
charlson(patient_data)
#> 1000 1001 1002 
#>    2    2    0

# get comorbidities based on present-on-arrival diagnoses, use magrittr to flow the data
patient_data %>% filter_poa %>% comorbid_elix
#>        CHF Arrhythmia Valvular  PHTN   PVD   HTN Paralysis NeuroOther
#> 1000 FALSE      FALSE    FALSE FALSE FALSE FALSE     FALSE      FALSE
#> 1001 FALSE      FALSE    FALSE FALSE FALSE  TRUE     FALSE      FALSE
#>      Pulmonary    DM  DMcx Hypothyroid Renal Liver   PUD   HIV Lymphoma
#> 1000     FALSE  TRUE FALSE       FALSE FALSE FALSE FALSE FALSE    FALSE
#> 1001     FALSE FALSE FALSE       FALSE FALSE FALSE FALSE FALSE    FALSE
#>       Mets Tumor Rheumatic Coagulopathy Obesity WeightLoss FluidsLytes
#> 1000 FALSE FALSE     FALSE        FALSE   FALSE      FALSE       FALSE
#> 1001 FALSE FALSE     FALSE        FALSE   FALSE      FALSE       FALSE
#>      BloodLoss Anemia Alcohol Drugs Psychoses Depression
#> 1000     FALSE  FALSE   FALSE FALSE     FALSE      FALSE
#> 1001     FALSE  FALSE   FALSE FALSE     FALSE      FALSE
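
The Charlson scores in the example follow from the comorbidity matrix by weighting each comorbidity and summing per visit. The sketch below rebuilds the printed Charlson matrix by hand and applies the original Charlson weights; `charlson_weights`, `cmb` and `toy_charlson` are names invented for this illustration. It does not apply any hierarchy between related categories (for example counting only the more severe of two diabetes categories), so it may differ from the package's `charlson()` on other inputs.

```r
# Original Charlson weights, keyed by the column names in the output above.
charlson_weights <- c(
  MI = 1, CHF = 1, PVD = 1, Stroke = 1, Dementia = 1, Pulmonary = 1,
  Rheumatic = 1, PUD = 1, LiverMild = 1, DM = 1, DMcx = 2, Paralysis = 2,
  Renal = 2, Cancer = 2, LiverSevere = 3, Mets = 6, HIV = 6
)

# Rebuild the comorbidity matrix from the example output by hand:
# start with all FALSE, then set the cells that were printed as TRUE.
cmb <- matrix(FALSE, nrow = 3, ncol = length(charlson_weights),
              dimnames = list(c("1000", "1001", "1002"),
                              names(charlson_weights)))
cmb["1000", c("CHF", "DM")] <- TRUE
cmb["1001", "Paralysis"] <- TRUE

# Weighted sum per visit (no hierarchy applied in this sketch).
toy_charlson <- drop(cmb %*% charlson_weights)
toy_charlson
```

For these three visits the weighted sums agree with the scores printed by `charlson(patient_data)` above.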

Look at the help files for details and examples of almost every function in this package.

?comorbid
?comorbid_hcc
?explain
?is_valid

Note that reformatting from wide to long and back is not as straightforward as using the various Hadley Wickham tools for doing this: knowing the more detailed structure of the data lets us do this better for the case of dealing with ICD codes.
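
For readers unfamiliar with the two layouts, the base R sketch below shows what a wide-to-long conversion amounts to for diagnosis codes. The data frame and the `dx1`/`dx2` column names are made up for illustration, and this is not the package's `wide_to_long()` implementation, which handles more cases.

```r
# Wide format: one row per visit, one column per diagnosis slot.
wide <- data.frame(
  visit_id = c("1000", "1001"),
  dx1 = c("40201", "34400"),
  dx2 = c("25001", NA),
  stringsAsFactors = FALSE
)

# Stack the diagnosis columns, repeating the visit IDs once per column.
dx_cols <- grep("^dx", names(wide), value = TRUE)
long <- data.frame(
  visit_id = rep(wide$visit_id, times = length(dx_cols)),
  icd9 = unlist(wide[dx_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Drop empty diagnosis slots and group rows by visit.
long <- long[!is.na(long$icd9), ]
long <- long[order(long$visit_id), ]
long
```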

Advanced

Source Data and SAS format files

In the spirit of reproducible research, all the R data files in this package can be recreated from source. The size of the source files makes it cumbersome to include them in the R package available on CRAN. Using the GitHub source, you can pull the original data and SAS format files and rebuild the data, or use the tools provided by this package to update the data using new source data files, e.g. when ICD-10-CM 2017 is released.

Development version

The latest version is available on GitHub (jackwasey/icd), and can be installed with:

    install.packages("devtools")
    devtools::install_github("jackwasey/icd")

The master branch on GitHub should always build and pass all tests and R CMD check, and will be similar or identical to the most recent CRAN release. The CRAN releases are stable milestones. Contributions and bug reports are encouraged and essential for this package to remain current and useful to the many people who have installed it.

Contributing and Building

A substantial amount of code has now been contributed to the package. Contributions of any kind to icd are very welcome. See the [GitHub issues page](https://github.com/jackwasey/icd/issues) to see jobs and feature requests. Documentation, vignettes and examples are very welcome, especially if accompanied by some real-world data.

To build icd, Rcpp must be compiled from source. This happens automatically on Linux, but on Mac and Windows you need to run install.packages("Rcpp", type="source") first to avoid build errors.

Version

3.1.2

License

GPL-3

Maintainer

Jack O Wasey

Last Published

May 9th, 2018

Functions in icd (3.1.2)

comorbid

Find comorbidities from ICD-9 codes.
count_comorbid

Count number of comorbidities per patient
count_codes_wide

Count ICD codes given in wide format
count_codes

Count ICD codes or comorbidities for each patient
condense

Condense ICD-9 code by replacing complete families with parent codes
decimal_to_short

Convert Decimal format ICD codes to short format
condense_explain_table

condense explain_table output down to major codes
short_to_parts.icd10

Convert decimal ICD codes to component parts
expand_range.icd10cm

Expand range of ICD-10 codes returning only defined codes in ICD-10-CM
cr

sequence columns of comorbidities
expand_range

take two ICD-9 codes and expand range to include all child codes
env_to_vec_flip

return a new environment with names and values swapped
expect_equal_no_icd

expect equal, ignoring any ICD classes
explain_table_worker

generate table of ICD code explanations
expand_minor

expand decimal part of ICD-9 code to cover all possible sub-codes
generate_sysdata

Generate sysdata.rda
generate_random_short_icd10cm_bill

generate random ICD-9 codes
convert

Convert ICD9 codes between formats and structures.
expect_chap_present

expect that a chapter with given title exists, case-insensitive
explain

Explain ICD-9 and ICD-10 codes in English
fixSubchapterNa

Fix NA sub-chapters in RTF parsing
condense_explain_table_worker

generate condensed code and condensed number columns
get_defined

Select only defined ICD codes
explain_table

Explain ICD-9 and ICD-10 codes in English from decimal (123.45 style), Tabulates the decimal format alongside converted non-decimal format.
factor_fast

fast factor generation WIP
guess_version

Guess version of ICD codes
get_icd_name

get the name of a data.frame column which is most likely to contain the ICD codes
icd10_comorbid_reduce

ICD-10 comorbidities by reducing problem size
icd9Comorbid_alt_Taskloop

Simpler comorbidity assignment
icd10_comorbid_parent_search_cpp

Internal function to find ICD-10 parents
guess_short

Guess whether codes are short_code or decimal_code
diff_comorbid

show the difference between two comorbidity mappings
get_billable

Get billable ICD codes
do_extra_tests

Set system environment to do extra tests
icd10_chapters

ICD-10 chapters
generate_random_short_icd9

generate random ICD-9 codes
icd9MajMinToCode_alt_PrePadded

Convert mjr and mnr vectors to single code
filter_poa

Filters data frame based on present-on-arrival flag
expand_range_major

Expand major codes to range
expect_chap_equal

expect named sub-chapter has a given range, case insensitive
generate_spelling

Generate spelling exceptions
filter_valid

Filter ICD codes by validity.
icd10_generate_map_quan_elix

generate ICD-10 Quan Elixhauser mapping
icd10_comorbid_parent_search

find ICD-10 comorbidities by checking parents
icd10_parse_ahrq_ccs

parse AHRQ CCS for mapping - ICD10
icd9MajMinToShort_alt_Std

initialize a std::vector of strings with repeated value of the minor
get_major.icd9

Get major part of an ICD code
get_invalid

Get invalid ICD codes
guess_version_update

Guess version of ICD and update class
icd9RandomShortN

Generate random short-form ICD-9 codes
icd9_add_leading_zeroes_cpp

Add leading zeroes to incomplete ICD-9 codes
icd9_chapters_to_map

convert the chapter headings to lists of codes
factor_nosort

Fast Factor Generation
icd9_chapters

ICD-9 chapters
icd9_drop_leading_zeroes

drop zero padding from decimal ICD-9 code.
icd-package

icd: Tools for Working with ICD-9 and ICD-10 Codes, and Finding Comorbidities
get_non_ASCII

mimic the R CMD check test
icd9_map_elix

Elixhauser comorbidities
get_valid

invalid subset of decimal or short_code ICD-9 codes
icd10cm_extract_sub_chapters

Get sub-chapters from the 2016 XML for ICD-10-CM
icd10cm2016

ICD-10-CM
fastIntToString

Fast convert integer vector to character vector
icd9cm_billable

list of annual versions of billable leaf nodes of ICD-9-CM
get_visit_name

Get or guess the name of the visit ID column
icd9_map_hcc

Medicare Hierarchical Condition Categories
%eine%

in/match equivalent for two Environment arguments
icd9cm_generate_chapters_hierarchy

generate ICD-9-CM hierarchy
get_raw_data_dir

Get the raw data directory
icd9cm_get_billable

Get billable ICD-9-CM codes
guess_pair_version

Guess the ICD version (9 or 10) from a pair of codes
icd9_generate_map_quan_elix

Generate Quan's revised Elixhauser comorbidities
icd9AppendMinors

append minor to major using std
icd9AddLeadingZeroes_alt_ShortSingle

Decompose a 'short' ICD code and insert the leading zeroes as needed.
icd10_fetch_ahrq_ccs

Download ahrq-ccs-icd10 definition
icd10_sub_chapters

ICD-10 sub-chapters
icd9cm_hierarchy

Latest ICD-9-CM diagnosis codes, in flat data.frame format
icd10_parse_cc

Import the ICD10 to CC crosswalks
icd9_expand_range_worker

expand range worker
icd9_generate_sources

generate data for finding source data for ICD-9-CM
named_list

make a list using input argument names as names
icd10_generate_map_quan_deyo

Generate Quan mapping for Charlson categories of ICD-10 codes
icd9ChildrenShort_alt_11

Find child codes from vector of ICD-9 codes.
icd9_extract_alpha_numeric

extract alphabetic, and numeric part of ICD-9 code prefix
icd9_parse_ahrq_sas

parse AHRQ SAS code to get mapping
icd9_is_n_cpp

Do elements of vector begin with V, E (or any other character)?
icd9_parse_ahrq_ccs

parse AHRQ CCS for mapping
icd10cm_get_all_defined

get all ICD-10-CM codes
names_elix

Comorbidity names
icd9_map_ahrq

AHRQ comorbidities
is_defined

Check whether ICD-9 codes exist
is_major

Check whether a code is major
icd9ChildrenShort_alt_Std

C++ implementation of finding children of short codes
rtf_fix_duplicates

fix duplicates detected in RTF parsing
rtf_filter_excludes

exclude some unwanted rows from filtered RTF
as.icd_long_data

Convert between and identify 'long' and 'wide' patient data formats
icd9_order_short

Get order of short-form ICD-9 codes
logical_to_binary

Encode TRUE as 1, and FALSE as 0 (integers)
sas_format_extract

Extract assignments from a SAS FORMAT definition
icd9AddLeadingZeroesMajorSingle

Simpler add leading zeroes without converting to parts and back
icd9_map_single_ccs

Clinical Classifications Software (CCS) for ICD9/10-CM
icd9_fetch_ahrq_ccs

Download ahrq-ccs-icd9 definition
icd9_generate_all_

generate lookup data for each class of ICD-9 code
icd9_generate_map_elix

Generate Elixhauser comorbidities
sas_parse_assignments

Get assignments from a character string
simplify_map_lex

Internal function to simplify a comorbidity map by only including codes which are parents, or identical to, a given list of codes.
random_string

generate random strings
long_to_wide

Convert ICD data from long to wide format
re_just

Limit a regular expression to just what is given
rtf_strip

Strip RTF
icd9_fetch_ahrq_sas

get the SAS code from AHRQ
sas_extract_let_strings

Extract quoted or unquoted SAS string definitions
icd9_map_quan_elix

Quan adaptation of Elixhauser comorbidities
icd_parse_cc_hierarchy

Import CMS HCC Rules
icd9_get_chapters

get ICD-9 Chapters from vector of ICD-9 codes
sort_icd

Sort short-form ICD-9 codes
icd9_map_quan_deyo

Quan adaptation of Deyo/Charlson comorbidities
switch_ver_cmb

guess icd-9 or icd-10 or other type, and switch to call the given function
icd9_is_n

do ICD-9 codes belong to numeric, V or E sub-types?
strip

Strip character(s) from character vector
short_to_parts

Convert short format ICD codes to component parts
shortcode_icd9

set short_to_decimal attribute
is_valid

Check whether ICD-9 codes are syntactically valid
[[.comorbidity_map

Extract vector of codes from an ICD comorbidity map
icd9_parse_quan_deyo_sas

parse original SAS code defining Quan's update of Deyo comorbidities.
test_env

Set up a test environment which also has the internal functions
icd9_parse_cc

Generate ICD to HCC Crosswalks from CMS
uranium_pathology

United States Transuranium & Uranium Registries
van_walraven

Calculate van Walraven Elixhauser Score
is_valid.default

Test whether an ICD code is major
regexec32

regexec which accepts perl argument even in older R
icd9_sources

ICD-9 data sources
icd9_parse_leaf_desc_ver

Read the ICD-9-CM description data as provided by the Center for Medicaid Services (CMS).
rtf_fetch_year

Fetch RTF for a given year
unzip_to_data_raw

Unzip file to raw data directory
icd9cm_latest_edition

Latest ICD-9-CM edition
update_everything

generate all package data
rtf_parse_lines

parse lines of RTF
icd_classes_conflict

Check whether there are any ICD class conflicts
is_billable

Determine whether codes are billable leaf-nodes
parse_leaf_descriptions_all

Get billable codes from all available years
rtf_parse_year

parse RTF description of entire ICD-9-CM for a specific year
is.icd9

test ICD-related classes
parse_leaf_desc_icd9cm_v27

Parse billable codes for ICD-9-CM version 27
set_re_globals

Put ICD validation regexes in the icd::: namespace only
rtf_fix_quirks_2015

fix quirks for 2015 RTF parsing
lookupComorbidByChunkFor

core search for ICD code in a map
poa_choices

Present-on-admission flags
short_to_decimal

Convert ICD codes from short to decimal forms
str_pair_match

Match pairs of strings to get named vector
rtf_fix_unicode

Fix Unicode characters in RTF
lookupComorbid_alt_ByChunkForTaskloop

alternate comorbidity search
rtf_lookup_fourth

apply fourth digit qualifiers
str_match_all

return all matches for regular expression
trim

Trim leading and trailing white space
rtf_generate_fourth_lookup

generate look-up for four digit codes
print.comorbidity_map

Print a comorbidity map
rtf_main_filter

filter RTF for actual ICD-9 codes
strim

Trim leading and trailing white space from a single string
str_extract

stringr does this, but here we have a small amount of base R code
unzip_single

unzip a single file from URL
rtf_parse_fifth_digit_range

parse a row of RTF source data for ranges to apply fifth digit
wide_to_long

Convert ICD data from wide to long format
save_in_data_dir

Save given variable in package data directory
set_icd_class

Construct ICD-9 and ICD-10 data types
vec_to_env_true

create environment from vector
subset_icd

extract subset from ICD data
vermont_dx

Hospital discharge data from Vermont
swap_names_vals

swap names and values of a vector
charlson_from_comorbid

Calculate Charlson scores from pre-computed Charlson comorbidities
as.decimal_diag

Get or set whether ICD codes have an attribute indicating 'short' or 'decimal' format
charlson

Calculate Charlson Comorbidity Index (Charlson Score)
attr_short_diag

Set short diagnosis flag in C++
children

Get children of ICD codes
chapter_to_desc_range

Parse a (sub)chapter text description with parenthesised range
attr_decimal_diag

Set ICD short-form diagnosis code attribute
apply_hier

Apply hierarchy and choose naming for each comorbidity map
all_identical

allow microbenchmark to compare multiple results
comorbid_hcc

Get Hierarchical Condition Codes (HCC)
comorbid_df_to_mat

convert comorbidity matrix to data frame
comorbid_common

Internal function to calculate co-morbidities.
comorbidMatMul

Comorbidity calculation as a matrix multiplication
classes_ordered

prefer an order of classes
comorbid_hcc_worker

apply HCC rules to either ICD-9 or ICD-10 codes
children_defined

defined children of ICD codes
as_char_no_warn

convert to character vector without warning
combine

combine ICD codes
comorbid_mat_to_df

convert comorbidity data frame from matrix