These functions are quite important. Whenever a vector is passed to a collapse function as a grouping argument, as in fmean(mtcars, mtcars$cyl), it is grouped using qF or qG.
qF is a combination of as.factor and factor. Applied to a vector, qF(x) gives the same result as as.factor(x). qF(x, ordered = TRUE) generates an ordered factor (same as factor(x, ordered = TRUE)), and qF(x, na.exclude = FALSE) generates a level for missing values (same as factor(x, exclude = NULL)). An important addition is that qF(x, na.exclude = FALSE) also adds a class 'na.included'. This prevents collapse functions from checking for missing values in the factor and is thus computationally more efficient. Factors used in grouped operations should therefore preferably be generated using qF(x, na.exclude = FALSE). Setting sort = FALSE gathers the levels in an unsorted order (unless method = "radix" and x is numeric, in which case the levels are always sorted). This can provide a speed improvement, particularly for character data.
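For reference, the base R calls that the qF variants above are said to mirror can be run directly. This sketch uses only base R (collapse is not loaded), so it shows the target behaviour rather than qF itself. The last factor takes its levels in order of first appearance, which is just one possible unsorted order and is not claimed to be the order qF(x, sort = FALSE) returns.

```r
# Base R counterparts of the qF() calls described above (collapse not used).
x <- c("b", "a", NA, "c", "a")

f_plain   <- as.factor(x)               # counterpart of qF(x)
f_ordered <- factor(x, ordered = TRUE)  # counterpart of qF(x, ordered = TRUE)
f_na      <- factor(x, exclude = NULL)  # counterpart of qF(x, na.exclude = FALSE)

levels(f_plain)        # "a" "b" "c"   (the NA element gets no level)
is.ordered(f_ordered)  # TRUE
levels(f_na)           # "a" "b" "c" NA (missing values get their own level)

# Unsorted levels: taken in order of first appearance, so no sort is needed.
# Only an illustration of "unsorted", not necessarily qF's sort = FALSE order.
f_unsorted <- factor(x, levels = unique(x[!is.na(x)]))
levels(f_unsorted)     # "b" "a" "c"
```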
There are two methods of computation: radix ordering and index hashing. Radix ordering is done by combining the functions radixorder and groupid. It is generally faster than index hashing for large numeric data (although there are exceptions). Index hashing is done using Rcpp::sugar::sort_unique and Rcpp::sugar::match. It is generally faster for character data. For logical data, a very fast one-pass method was written, which is subsumed in the hash method. In terms of speed, qF is generally around 5x faster than as.factor on character data and about 30x faster on numeric data. Automatic method dispatch typically delivers optimal performance.
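To make the two strategies concrete, the following base R sketch computes the same integer group ids both ways. It substitutes base R's order(method = "radix") for the radix path and the hash-based unique() and match() for the hashing path; the helper names group_id_radix and group_id_hash are hypothetical, this is not how collapse implements radixorder, groupid or the Rcpp sugar functions, and it assumes x contains no missing values.

```r
# Hypothetical helpers contrasting the two strategies (assumes no NA in x).

# Radix path: sort once, start a new group wherever the sorted value changes,
# then scatter the ids back to the original positions.
group_id_radix <- function(x) {
  o <- order(x, method = "radix")
  xs <- x[o]
  starts <- c(TRUE, xs[-1L] != xs[-length(xs)])
  id <- integer(length(x))
  id[o] <- cumsum(starts)
  id
}

# Hash path: collect the unique values (hash based in base R), sort only
# those, then look every element up with match() (also hash based).
group_id_hash <- function(x) {
  lev <- sort(unique(x))
  match(x, lev)
}

x <- c(3.5, 1, 2, 1, 3.5, 2, 2)
group_id_radix(x)  # 3 1 2 1 3 2 2
group_id_hash(x)   # 3 1 2 1 3 2 2
```

The radix path sorts all n elements, while the hash path sorts only the unique values, which is one intuition for why hashing tends to pay off when there are few distinct values relative to the data size.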
qG is first and foremost a programmer's function. It generates a factor-'light' consisting of only an integer grouping vector and an attribute providing the number of groups. It is faster and more memory efficient than GRP for grouping atomic vectors, which is the main reason it exists. The fact that it (optionally) returns the unique groups / levels without converting them to character is an added bonus (this also provides a small performance gain compared to qF).
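A minimal base R sketch of what such a factor-'light' might look like: an integer vector carrying the number of groups as an attribute, with the unique values optionally kept in their original type. The function name qG_sketch, the attribute names "N.groups" and "groups" and the class "qG" are assumptions chosen for illustration; inspect attributes(qG(x)) for the actual structure collapse returns.

```r
# Hypothetical helper mimicking a "factor-light"; not collapse's qG().
qG_sketch <- function(x, return.groups = FALSE) {
  groups <- sort(unique(x))            # sorted unique values (sort() drops NA)
  g <- match(x, groups)                # integer group ids; NA stays NA
  attr(g, "N.groups") <- length(groups)
  if (return.groups) attr(g, "groups") <- groups  # kept as double, not character
  class(g) <- "qG"
  g
}

x <- c(2.5, 1, 2.5, 4, NA)
g <- qG_sketch(x, return.groups = TRUE)
as.integer(g)        # 2 1 2 3 NA
attr(g, "N.groups")  # 3
attr(g, "groups")    # 1.0 2.5 4.0 (still numeric)

# A regular factor stores the same levels as character strings instead:
levels(factor(x))    # "1" "2.5" "4"
```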
finteraction is simply a wrapper around as.factor_GRP(GRP.default(X, sort = TRUE)), where X is replaced by the arguments in '...' combined in a list. See GRP for computational details. In general, all vectors, factors, or lists of vectors / factors passed can be interacted. Interactions always create a level for missing values and always drop any unused levels.
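The two interaction rules (a level for missing values, unused levels dropped) can be approximated in base R, as sketched below. Base interaction() excludes missing values by default, so addNA() is applied first, and drop = TRUE removes unused combinations. This is an analogue only; level labels and ordering are not claimed to match finteraction.

```r
# Base R analogue of the interaction rules described above (collapse not used).
a <- factor(c("x", "y", NA, "x"), levels = c("x", "y", "z"))  # "z" is unused
b <- c(1, 2, 2, 1)

ia <- interaction(addNA(a, ifany = TRUE),  # give NA its own level
                  addNA(b, ifany = TRUE),  # no NA in b, so unchanged
                  drop = TRUE,             # drop unused level combinations
                  lex.order = TRUE)        # first argument varies slowest

levels(ia)      # "x.1" "y.2" "NA.2"
as.integer(ia)  # 1 2 3 1
```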