This function prepares the input data for categorization and, together with the C++ matrix code, forms the core of the package. It is pure data manipulation and generalizes beyond medical data.
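The matrix-based idea mentioned above can be illustrated with a small base R sketch. This is not the package's C++ implementation; it only assumes that categorization can be expressed as a visit-by-code indicator matrix multiplied by a code-by-category indicator matrix. The objects `visits` and `toy_map` are hypothetical toy data, not package datasets.

```r
# Hypothetical toy input: one row per (visit, code) pair.
visits <- data.frame(
  id   = c("a", "a", "b", "c"),
  code = c("2780", "311", "3004", "4019"),
  stringsAsFactors = FALSE
)

# Hypothetical map: a named list of character vectors of codes.
toy_map <- list(
  OBESE   = c("2780", "27800"),
  DEPRESS = c("3004", "311")
)

ids   <- unique(visits$id)
codes <- unique(unlist(toy_map, use.names = FALSE))

# Visit-by-code indicator matrix: 1 where a visit carries a mapped code.
visit_code <- matrix(0L, nrow = length(ids), ncol = length(codes),
                     dimnames = list(ids, codes))
in_map <- visits$code %in% codes  # codes absent from the map are ignored
rows <- match(visits$id[in_map], ids)
cols <- match(visits$code[in_map], codes)
visit_code[cbind(rows, cols)] <- 1L

# Code-by-category indicator matrix: 1 where a code belongs to a category.
code_cat <- sapply(toy_map, function(grp) as.integer(codes %in% grp))
rownames(code_cat) <- codes

# One matrix product gives, per visit and category, the number of matching
# codes; any count above zero means the category is present.
result <- (visit_code %*% code_cat) > 0
print(result)
```

Visit "c" has only an unmapped code, so its row is all FALSE, while the visit still appears in the result.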
categorize_simple(
x,
map,
id_name,
code_name,
return_df = FALSE,
return_binary = FALSE,
restore_id_order = TRUE,
preserve_id_type = FALSE,
comorbid_fun = comorbid_mat_mul_wide,
...
)
x
Data frame containing a column for an 'id' and a column for a code, e.g., an ICD-10 code.
map
Named list containing vectors of ICD-9 codes. For example, the AHRQ ICD-9
comorbidities contain list(OBESE = c("2780", "27800", "27801",
"27803", "V8554", "79391", "64910", "64911", "64912", "64913", "64914",
"V8530", "V8531", "V8532", "V8533", "V8534", "V8535", "V8536", "V8537",
"V8538", "V8539", "V8541", "V8542", "V8543", "V8544", "V8545"), DEPRESS =
c("3004", "30112", "3090", "3091", "311")), amongst other, longer groups.
id_name
The name of the data.frame field which is the unique identifier.
code_name
String with name(s) of column(s) containing the codes.
return_df
Single logical value; if TRUE, return 'tidy' data, i.e., the result is a data frame with the first column being the visit_id, and the second being the count. If visit_id was a factor or named differently in the input, this is preserved.
return_binary
Logical value; if TRUE, the output will be in 0s and 1s instead of TRUE and FALSE.
restore_id_order
Logical value; if TRUE, the default, the order of the visit IDs will match the order in which visit IDs are first encountered in the input data. Restoring the order takes about a third of the calculation time on data with tens of millions of rows, so if the visit IDs will be discarded when summarizing data, this can be set to FALSE for a big speed-up.
preserve_id_type
Single logical value; if TRUE, the visit ID column will be converted back to its original type. The default of FALSE means only factor and character types are restored in the returned data frame. For matrices, the row names are necessarily stored as character vectors.
comorbid_fun
Function, i.e., the function symbol (not a character string), to be called to do the comorbidity calculation.
...
Arguments passed on to other functions.
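The output-shaping options described above (restoring ID order, binary output, and data frame output) can be pictured with the following base R sketch. It is only an illustration of the documented behaviour under assumed toy data, not the package's internal code; `res` and `input_ids` are hypothetical.

```r
# Hypothetical logical result matrix whose rows are not in input order.
res <- matrix(
  c(TRUE, FALSE, FALSE,   # OBESE   for v2, v3, v1
    TRUE, TRUE,  FALSE),  # DEPRESS for v2, v3, v1
  nrow = 3,
  dimnames = list(c("v2", "v3", "v1"), c("OBESE", "DEPRESS"))
)

# Hypothetical visit IDs as they appeared in the input (with repeats).
input_ids <- c("v1", "v2", "v2", "v3")

# Like restore_id_order = TRUE: rows follow first appearance in the input.
first_seen <- unique(input_ids)
res <- res[match(first_seen, rownames(res)), , drop = FALSE]

# Like return_binary = TRUE: 0/1 integers instead of FALSE/TRUE.
res_bin <- res
storage.mode(res_bin) <- "integer"

# Like return_df = TRUE: the visit ID becomes the first column.
# Matrix row names are always character, so any other ID type would
# need converting back afterwards.
res_df <- data.frame(visit_id = rownames(res), res,
                     stringsAsFactors = FALSE, row.names = NULL)
print(res_df)
```

Skipping the reordering step corresponds to restore_id_order = FALSE, which is why that setting saves time when the IDs are not needed afterwards.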
The roadmap for icd includes packaging the optimized categorization component independently, with the comorbidity package taking on the front end for ICD-code-based comorbidities. This is under discussion.
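# NOT RUN {
library(icd)
u <- uranium_pathology
m <- icd10_map_ahrq
# Convert decimal-format codes to short format (no decimal point)
u$icd10 <- decimal_to_short(u$icd10)
# Call the function through the icd namespace with ':::'
j <- icd:::categorize_simple(u, m, id_name = "case", code_name = "icd10")
# }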