collapse provides the following functions to efficiently group (and order) data:
radixorder provides fast radix-ordering (+ grouping information) through direct access to the method base::order(..., method = "radix"). The source code for both radixorder and base::order(..., method = "radix") comes from data.table:::forder. radixorder was modified to optionally return a vector of group starts, a vector of group sizes, or both as attributes, together with an attribute giving the size of the largest group and a logical flag indicating whether the input was already ordered. The function radixorderv exists as a programmer's alternative.
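
As a minimal sketch of the attributes described above (the toy vector x is made up for illustration, and the attribute names are assumed to be "starts", "group.sizes", "maxgrpn" and "sorted"):

```r
library(collapse)

# Hypothetical toy input
x <- c("b", "a", "b", "c", "a")

# Ordering vector with grouping information attached as attributes
o <- radixorder(x, starts = TRUE, group.sizes = TRUE)

as.integer(o)           # the ordering vector itself, attributes stripped
attr(o, "starts")       # position in x[o] where each group begins
attr(o, "group.sizes")  # number of elements in each group
attr(o, "maxgrpn")      # size of the largest group
attr(o, "sorted")       # TRUE if x was already in order

# radixorderv takes a single vector or a list of vectors instead of '...'
ov <- radixorderv(list(x), starts = TRUE)
```

The list interface of radixorderv is convenient when the ordering columns are only known at run time.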
GRP creates collapse grouping objects of class 'GRP' based on radixorderv. 'GRP' objects form the central building block for grouped operations and programming in collapse and are very efficient inputs to all collapse functions supporting grouped operations. A 'GRP' object provides information about (1) the number of groups, (2) which rows belong to which group, (3) the group sizes, (4) the unique groups, (5) the variables used for grouping, (6) whether the grouping and initial inputs were ordered and (7) (optionally) the output from radixorder containing the ordering vector with group starts and maximum group size attributes.
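
The following sketch shows how the listed components might be inspected and how one grouping can be reused across several functions. It uses the built-in mtcars data purely for illustration, and the element names are those of a 'GRP' object as I understand them:

```r
library(collapse)

# Group the built-in mtcars data by number of cylinders
g <- GRP(mtcars, ~ cyl)

g$N.groups     # (1) number of groups
g$group.id     # (2) group membership of each row
g$group.sizes  # (3) group sizes
g$groups       # (4) unique groups
g$group.vars   # (5) names of the grouping variables
g$ordered      # (6) ordering flags

# Compute the grouping once, then reuse it in several fast functions
mpg_mean <- fmean(mtcars$mpg, g)
hp_sum   <- fsum(mtcars$hp, g)
```

Reusing g avoids regrouping the data for every statistic.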
fgroup_by provides a fast replacement for dplyr::group_by, creating a grouped tibble with a 'GRP' object attached. This grouped tibble can, however, only be used for grouped operations with collapse's fast functions. dplyr functions will treat it like an ordinary (non-grouped) tibble.
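
A short sketch of this workflow, again using mtcars as stand-in data; fselect and fmean are collapse functions, and the choice of columns is arbitrary:

```r
library(collapse)

# Attach a grouping by cylinders and transmission type
gdf <- fgroup_by(mtcars, cyl, am)

class(gdf)     # includes "grouped_df"
g <- GRP(gdf)  # extract the attached 'GRP' object

# Grouped means of two columns using collapse fast functions
res <- fmean(fselect(gdf, mpg, hp))
```

The result has one row per group, with the grouping columns placed before the aggregated ones.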
qF, shorthand for 'quick-factor', implements very fast (ordered) factor generation from atomic vectors using either radix ordering (method = "radix") or index hashing (method = "hash"). Factors can also be used for efficient grouped programming with collapse functions, especially if they are generated using qF(x, na.exclude = FALSE), which assigns a level to missing values and attaches a class 'na.included', ensuring that collapse functions execute no additional missing value checks.
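
To illustrate, here is a small sketch on a made-up character vector containing a missing value:

```r
library(collapse)

# Hypothetical toy input with one missing value
x <- c("b", "a", NA, "b")

# The two factor-generation methods
f_radix <- qF(x, method = "radix")
f_hash  <- qF(x, method = "hash")

# Give NA its own level and mark the factor as 'na.included'
f_na <- qF(x, na.exclude = FALSE)
levels(f_na)
class(f_na)

# The factor can be passed directly as a grouping to fast functions
s <- fsum(c(1, 2, 3, 4), f_na)
```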
qG, shorthand for 'quick-group', generates a kind of lightweight factor that has no levels attribute but instead carries an attribute giving the number of levels. Optionally the levels / groups can be attached, but without converting them to character. Objects have a class 'qG', which is also recognized in the collapse ecosystem.
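
A brief sketch of what such an object looks like, assuming the attribute names "N.groups" and "groups"; the numeric input is invented for illustration:

```r
library(collapse)

# Hypothetical toy input
x <- c(10, 30, 10, 20)

g <- qG(x)
class(g)             # includes "qG"
attr(g, "N.groups")  # number of groups, stored instead of levels

# Optionally attach the unique groups; they keep their original type
gg <- qG(x, return.groups = TRUE)
attr(gg, "groups")
```

Because the groups are not converted to character, numeric keys stay numeric.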
finteraction is a fast alternative to base::interaction implemented as a wrapper around as.factor.GRP(GRP(...)). It can be used to generate a factor from multiple vectors, factors or a list of vectors / factors. Unused factor levels are always dropped.
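
The difference in how unused levels are handled can be sketched as follows, with two made-up vectors and a comparison against base::interaction:

```r
library(collapse)

# Hypothetical toy inputs: the combination ("b", "y") never occurs
a <- c("a", "a", "b")
b <- c("x", "y", "x")

f  <- finteraction(a, b)  # only observed combinations become levels
fb <- interaction(a, b)   # base R keeps all possible combinations

nlevels(f)
nlevels(fb)

# A data frame (list of vectors) is accepted as well
fl <- finteraction(mtcars[c("cyl", "am")])
```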
groupid is a generalization of data.table::rleid providing a run-length type group-id from atomic vectors. It is a generalization because it also supports passing an ordering vector and skipping missing values. For example, qF and qG with method = "radix" are essentially implemented using groupid(x, radixorder(x)).
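
The sketch below contrasts the plain run-length id with the id obtained by passing an ordering vector, and shows the na.skip option. The input vectors are invented for illustration:

```r
library(collapse)

# Hypothetical toy input
x <- c("b", "b", "a", "a", "b")

# Run-length id: a new id starts whenever the value changes
rid <- groupid(x)

# Walking through x in sorted order yields a proper group id,
# returned in the original order of x
gid <- groupid(x, radixorder(x))

# Skip missing values instead of treating them as a separate run
nid <- groupid(c(1, NA, 1), na.skip = TRUE)
```

With the ordering vector supplied, equal values receive the same id even when they are not adjacent in x.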
seqid is a specialized function which creates a group-id from sequences of integer values. For any ordinary panel-dataset groupid(id, order(id, time)) and seqid(time, order(id, time)) provide the same id variable. seqid is especially useful for identifying discontinuities in time-sequences and helps to perform operations such as lags or differences on irregularly spaced time-series and panels.
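
Both uses can be sketched on small made-up inputs: a time vector with gaps, and an unsorted toy panel where id and tm are hypothetical variable names:

```r
library(collapse)

# Hypothetical time vector with gaps after 3 and after 6
time <- c(1L, 2L, 3L, 5L, 6L, 8L)
sid <- seqid(time)  # a new id starts at every discontinuity

# Hypothetical unsorted panel: two individuals observed in periods 1 to 3
id <- c(2L, 1L, 1L, 2L, 1L, 2L)
tm <- c(1L, 1L, 2L, 2L, 3L, 3L)
o  <- order(id, tm)

by_group <- groupid(id, o)  # id derived from the panel identifier
by_seq   <- seqid(tm, o)    # id derived from the time sequences
```

In this toy panel both approaches recover the same individual id, since each individual's time sequence restarts at 1.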
| Function / S3 Generic | Methods | Description |
| --- | --- | --- |
| radixorder, radixorderv | No methods, for data frames and vectors | radix based ordering + grouping information |
| GRP | default, factor, qG, grouped_df, pseries, pdata.frame | fast (ordered) grouping |
| fgroup_by | No methods, for data frames | fast grouped tibbles |
| qF | No methods, for vectors | quick factor generation |
| qG | No methods, for vectors | quick grouping |
| finteraction | No methods, for data frames and vectors | faster interactions |
| groupid | No methods, for vectors | run-length type group-id |
| seqid | No methods, for vectors | run-length type integer sequence-id |