icd (version 3.3)

factor_nosort_rcpp_worker: Fast Factor Generation

Description

This function generates factors more quickly, without leveraging fastmatch. The speed increase with fastmatch for ICD-9 codes was about 33 using Rcpp and a hashed matching algorithm.

Usage

factor_nosort_rcpp_worker(x, levels, na_rm)

factor_nosort(x, levels)

factor_nosort_rcpp(x, levels, na.rm = FALSE)

Arguments

x

An object of atomic type integer, numeric, character or logical.

levels

An optional character vector of levels. It is coerced to the same type as x. By default, the levels are computed as sort(unique.default(x)).

na.rm

Logical; if TRUE, simply drop all NA values, i.e., values with no corresponding level.

labels

A set of labels used to rename the levels, if desired.

Functions

  • factor_nosort_rcpp_worker: Rcpp implementation, requiring character vector inputs only, no argument checking.

  • factor_nosort_rcpp: R wrapper to the Rcpp function. It re-factors a factor with new levels without converting to a string vector.
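One way to re-factor without a round trip through a string vector is to remap the integer codes directly. The following is a minimal base R sketch of that idea, not the icd implementation; the helper name refactor_sketch is hypothetical and used only for illustration.

```r
# Hypothetical helper (not part of icd): give a factor new levels by
# remapping its integer codes instead of converting to character first.
refactor_sketch <- function(f, new_levels) {
  stopifnot(is.factor(f), is.character(new_levels))
  # Position of each old level within the new levels (NA if it was dropped)
  lookup <- match(levels(f), new_levels)
  # Translate every element's old code into its new code
  codes <- lookup[as.integer(f)]
  structure(codes, levels = new_levels, class = "factor")
}

f <- factor(c("a", "b", "b", "c"))
new_levels <- c("a", "c", "b")
g <- refactor_sketch(f, new_levels)
```

The elements keep their values; only the level order, and therefore the underlying integer codes, changes. Elements whose level is absent from new_levels become NA.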

Details

NaNs are converted to NA when used on numeric values. Extracted from https://github.com/kevinushey/Kmisc.git

These features from base R's factor() are missing: exclude = NA, ordered = is.ordered(x), nmax = NA.
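For reference, this is what two of those arguments do in base R's factor(). The sketch uses only base R and makes no claim about how the icd functions behave; it shows the behavior that is unavailable when they are used instead.

```r
x <- c("b", NA, "a", "b")

# exclude = NULL keeps NA as a level of its own instead of dropping it
f_na <- factor(x, exclude = NULL)
nlevels(f_na)

# ordered = TRUE gives the levels a rank, so comparisons become meaningful
f_ord <- factor(x, levels = c("b", "a"), ordered = TRUE)
is.ordered(f_ord)
f_ord[1] < f_ord[3]
```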

I don't think there is any requirement for factor levels to be sorted in advance, especially not for ICD-9 codes where a simple alphanumeric sorting will likely be completely wrong.
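A small base R sketch of the sorting problem, assuming the conventional ICD-9-CM ordering of numeric codes first, then V codes, then E codes (that ordering is an assumption stated here, not something taken from this page). The codes are made-up examples.

```r
codes <- c("V10", "E800", "100", "V10", "003")

# method = "radix" sorts in the C locale, so the result does not depend on
# the session locale: digits first, then uppercase letters alphabetically
sorted_levels <- sort(unique(codes), method = "radix")
f_sorted <- factor(codes, levels = sorted_levels)

# Levels in order of first appearance, with no sorting step at all
f_seen <- factor(codes, levels = unique(codes))

levels(f_sorted)
levels(f_seen)
```

The sorted levels place "E800" before "V10", which does not match the assumed ICD-9-CM ordering. Skipping the sort avoids the cost of sorting and avoids implying an order that is wrong for these codes.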

Examples

# NOT RUN {
x <- c("z", "a", "123")
icd:::factor_nosort(x)
# should return a factor without modification
x <- as.factor(x)
identical(icd:::factor_nosort(x), x)
# unless the levels change:
icd:::factor_nosort(x, levels = c("a", "z"))

# existing factor levels aren't re-ordered without also moving elements
f <- factor(c("a", "b", "b", "c"))
g <- icd:::factor_nosort(f, levels = c("a", "c", "b"))
stopifnot(g[4] == "c")
# }