This function prepares the input data for categorization and, together with the C++ matrix code, forms the core of the package. It is pure data manipulation and generalizes beyond medical data.
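The categorization step can be sketched in base R as follows. This is a minimal illustration that assumes a simple per-category `%in%` lookup. It is not the package's C++ matrix implementation. The helper `categorize_sketch` and the shopping-basket data are hypothetical and were chosen to show that nothing here is specific to medical codes.

```r
# Hypothetical helper (not part of the package): a plain per-category
# lookup that turns long (id, code) data into a logical matrix with one
# row per id (first-encounter order) and one column per map category.
categorize_sketch <- function(x, map, id_name, code_name) {
  ids <- unique(x[[id_name]])
  out <- matrix(FALSE, nrow = length(ids), ncol = length(map),
                dimnames = list(as.character(ids), names(map)))
  for (grp in names(map)) {
    # ids having at least one code that belongs to this category
    hit <- unique(x[[id_name]][x[[code_name]] %in% map[[grp]]])
    out[as.character(hit), grp] <- TRUE
  }
  out
}

# Hypothetical non-medical data: baskets categorized by product group.
baskets <- data.frame(
  basket = c(1, 1, 2, 3, 3),
  item   = c("apple", "soap", "bread", "pear", "pear"),
  stringsAsFactors = FALSE
)
groups <- list(fruit = c("apple", "pear"), household = c("soap"))
res <- categorize_sketch(baskets, groups, "basket", "item")
print(res)
```

Basket 2 contains no mapped item, so it gets a row of all `FALSE` rather than being dropped.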
categorize_simple(x, map, id_name, code_name, return_df = FALSE,
return_binary = FALSE, restore_id_order = TRUE, unique_ids = FALSE,
preserve_id_type = FALSE, comorbid_fun = comorbidMatMulSimple)
x: Data frame containing a column for an ID and a column for a code, e.g., an ICD-10 code.
map: Named list of character vectors of codes, e.g. ICD-9 codes. For example, the AHRQ ICD-9 comorbidity map contains, amongst other longer groups:
list(OBESE = c("2780", "27800", "27801",
"27803", "V8554", "79391", "64910", "64911", "64912", "64913", "64914",
"V8530", "V8531", "V8532", "V8533", "V8534", "V8535", "V8536", "V8537",
"V8538", "V8539", "V8541", "V8542", "V8543", "V8544", "V8545"), DEPRESS =
c("3004", "30112", "3090", "3091", "311"))
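A map is just a named list of character vectors, so it can be built and queried with ordinary list operations. The following is a minimal sketch that uses a few codes from the OBESE and DEPRESS groups above. The helper `categories_of` is hypothetical and not part of the package.

```r
# A toy map with the same shape: a named list of character vectors,
# truncated to a few codes per category for brevity.
toy_map <- list(
  OBESE   = c("2780", "27800", "27801"),
  DEPRESS = c("3004", "30112", "311")
)

# Hypothetical helper: names of all categories containing a given code
# (a code could appear in more than one category).
categories_of <- function(code, map) {
  names(map)[vapply(map, function(v) code %in% v, logical(1))]
}

in_obese <- categories_of("27800", toy_map)  # "OBESE"
in_none  <- categories_of("9999", toy_map)   # character(0)
```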
id_name: String with the name of the data frame column that holds the unique identifier.
code_name: String with the name of the column containing the codes.
return_df: Single logical value; if TRUE, return the result as a data frame whose first column is the visit ID and whose remaining columns hold the results for each category. If the visit ID column was a factor or was named differently in the input, this is preserved.
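A data frame with the visit ID as its first column can be derived from an ID-by-category matrix as shown below. This sketch assumes that the matrix carries the IDs as row names. The helper `matrix_to_df` is hypothetical, and unlike the behaviour described above, it does not restore factor IDs.

```r
# Hypothetical helper: id-by-category matrix (ids as row names) to a
# data frame whose first column holds the ids under the given name.
matrix_to_df <- function(m, id_name) {
  out <- data.frame(rownames(m), m, row.names = NULL,
                    check.names = FALSE, stringsAsFactors = FALSE)
  names(out)[1] <- id_name
  out
}

m <- matrix(c(TRUE, FALSE, FALSE, TRUE), nrow = 2,
            dimnames = list(c("a", "b"), c("OBESE", "DEPRESS")))
res_df <- matrix_to_df(m, "case")
```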
return_binary: Logical value; if TRUE, the output will be 0s and 1s instead of TRUE and FALSE.
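In base R, you can convert a logical matrix to 0/1 form by changing its storage mode. This keeps the dimensions and dimnames. It is one possible approach and is not necessarily what the package does internally.

```r
m <- matrix(c(TRUE, FALSE, TRUE, TRUE), nrow = 2,
            dimnames = list(c("a", "b"), c("OBESE", "DEPRESS")))
b <- m
storage.mode(b) <- "integer"  # TRUE -> 1L, FALSE -> 0L; dimnames kept
```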
restore_id_order: Logical value; if TRUE (the default), the order of the visit IDs in the output will match the order in which they are first encountered in the input data. On data with tens of millions of rows, restoring this order accounts for about a third of the calculation time. If the visit IDs will be discarded when summarizing the data, setting this to FALSE gives a big speed-up.
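Restoring the order requires an extra reordering pass over every row. The sketch below assumes an intermediate result whose rows came back sorted by ID. This is a hypothetical situation chosen for illustration, not a description of the package internals. Calling `unique()` on the input IDs gives the first-encounter order, and indexing by those names reorders the rows.

```r
ids_in_input <- c("v30", "v10", "v30", "v20")
first_seen <- unique(ids_in_input)  # "v30" "v10" "v20"

# Hypothetical intermediate result with rows sorted by id: v10, v20, v30
m <- matrix(c(TRUE, FALSE, TRUE), ncol = 1,
            dimnames = list(sort(first_seen), "OBESE"))

# Reorder rows to match the order ids were first seen in the input
restored <- m[first_seen, , drop = FALSE]
```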
unique_ids: Single logical value; if TRUE, the visit IDs in the column given by id_name are assumed to be unique. Otherwise (the default), the function ensures they are unique.
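Before setting `unique_ids = TRUE`, you can check the assumption with base R's `anyDuplicated()`. It returns 0 when a vector has no duplicates and otherwise returns the index of the first duplicate. The data frame below is hypothetical.

```r
x <- data.frame(case = c("p1", "p2", "p2"), code = c("a", "b", "c"),
                stringsAsFactors = FALSE)
has_dupes <- anyDuplicated(x$case) > 0  # TRUE: "p2" appears twice
```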
preserve_id_type: Single logical value; if TRUE, the visit ID column will be converted back to its original type. The default of FALSE means only factor and character types are restored in the returned data frame. For matrices, the row names are necessarily stored as character vectors.
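Matrix row names are character, so an integer ID such as 101 comes back as "101". The following is a minimal sketch that converts row names back to the original atomic type with `as.vector(mode = ...)`. The IDs are hypothetical. Factors would need extra handling, such as rebuilding the levels, which this sketch does not do.

```r
ids <- c(101L, 205L, 307L)  # hypothetical original integer visit IDs
m <- matrix(c(TRUE, FALSE, TRUE), ncol = 1,
            dimnames = list(ids, "OBESE"))  # dimnames are coerced to character
rn <- rownames(m)                           # "101" "205" "307"

# Convert back to the storage type of the original IDs (integer here)
restored_ids <- as.vector(rn, mode = typeof(ids))
```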
comorbid_fun: Function, i.e. the function symbol (not a character string), to be called to do the comorbidity calculation.
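The difference between passing a function symbol and passing a character string is shown below with a hypothetical runner, `run_calc`, that accepts its calculation as a function object. `rowSums` stands in for the package's default `comorbidMatMulSimple`, which is not reproduced here. If you only have the name as a string, `match.fun()` converts it to a function.

```r
# Hypothetical runner: the calculation must be a function object.
run_calc <- function(m, calc_fun = rowSums) {
  stopifnot(is.function(calc_fun))  # rowSums passes; "rowSums" fails
  calc_fun(m)
}

m <- matrix(c(1, 0, 1, 1), nrow = 2)
by_symbol <- run_calc(m, rowSums)                  # c(2, 1)
by_name   <- run_calc(m, match.fun("rowSums"))     # same result
```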
library(icd)
u <- uranium_pathology                # example data shipped with icd
m <- icd10_map_ahrq                   # AHRQ ICD-10 comorbidity map
u$icd10 <- decimal_to_short(u$icd10)  # decimal-format codes to short format
j <- categorize_simple(u, m, id_name = "case", code_name = "icd10")