The problem is that the matrices could be huge: the patient-icd matrix would be millions of patient rows, and ~15000 columns for all AHRQ comorbidities.
comorbidMatMulSimple(icd9df, icd9Mapping, visitId, icd9Field)
Using sparse matrices is another solution. Building the initial matrix may become a significant part of the calculation, but once done, the solution could be a simple matrix multiplication, which is potentially highly optimized (Eigen, BLAS, GPU, etc.)
Eigen has parallel (non-GPU) optimized sparse row-major * dense matrix. Patients-ICD matrix must be the row-major sparse one, so the dense matrix is then the comorbidity map https://eigen.tuxfamily.org/dox/TopicMultiThreading.html
Several ways of reducing the problem: firstly, as with existing code, we can drop any ICD codes from the map which are not in the patient data. With many patients, this will be less effective as the long tail becomes apparent. However, with the (small) Vermont data, we see ~15,000 codes being reduced to 339.