collapse (version 1.7.6)

fast-grouping-ordering: Fast Grouping and Ordering

Description

collapse provides the following functions to efficiently group and order data:

  • radixorder provides fast radix-ordering through direct access to the method order(..., method = "radix"), and can optionally return attributes that are very useful for grouping data and finding unique elements. radixorderv exists as a programmer's alternative. The function roworder(v) efficiently reorders a data frame based on an ordering computed by radixorderv.

  • group provides fast grouping in first-appearance order of rows, based on a hashing algorithm in C. Objects have class 'qG' (see below).

  • GRP creates collapse grouping objects of class 'GRP' based on radixorderv or group. 'GRP' objects form the central building block for grouped operations and programming in collapse, and are very efficient inputs to all collapse functions supporting grouped operations. A 'GRP' object provides information about (1) the number of groups, (2) which rows belong to which group, (3) the group sizes, (4) the unique groups, (5) the variables used for grouping, (6) whether the grouping and initial inputs were ordered, and (7) (optionally) the output from radixorder containing the ordering vector with group starts and maximum group size attributes.

  • fgroup_by provides a fast replacement for dplyr::group_by, creating a grouped data frame (or data.table / tibble etc.) with a 'GRP' object attached. This grouped frame can be used for grouped operations using collapse's fast functions.

  • funique is a faster version of unique. The data frame method also allows selecting unique rows according to a subset of the columns.

  • qF, shorthand for 'quick-factor', implements very fast factor generation from atomic vectors using either radix ordering (method = "radix") or hashing (method = "hash"). Factors can also be used for efficient grouped programming with collapse functions, especially if they are generated using qF(x, na.exclude = FALSE), which assigns a level to missing values and attaches a class 'na.included', ensuring that no additional missing value checks are executed by collapse functions.

  • qG, shorthand for 'quick-group', generates a kind of 'factor-light': an object without the levels attribute, but with an attribute providing the number of levels instead. Optionally the levels / groups can be attached, but without converting them to character. Objects have a class 'qG', which is also recognized in the collapse ecosystem.

  • fdroplevels is a substantially faster replacement for droplevels.

  • finteraction is a fast alternative to interaction implemented as a wrapper around as_factor_GRP(GRP(…)). It can be used to generate a factor from multiple vectors, factors or a list of vectors / factors. Unused factor levels are always dropped.

  • groupid is a generalization of data.table::rleid providing a run-length type group-id from atomic vectors. It is a generalization because it also supports passing an ordering vector and skipping missing values. For example, qF and qG with method = "radix" are essentially implemented using groupid(x, radixorder(x)).

  • seqid is a specialized function which creates a group-id from sequences of integer values. For any regular panel dataset, groupid(id, order(id, time)) and seqid(time, order(id, time)) provide the same id variable. seqid is especially useful for identifying discontinuities in time-sequences.
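
The relationships described in the list above can be checked on a small example. The following is a minimal sketch rather than an excerpt from the package documentation: it assumes collapse is installed, and the vectors x, id and time are hypothetical toy data made up for illustration.

```r
library(collapse)  # assumes the collapse package is installed

x <- c("b", "a", "b", "c", "a")   # hypothetical toy vector

# Radix ordering: the same permutation as order(x, method = "radix").
# as.vector() drops any attributes before comparing.
o <- radixorder(x)
same_order <- identical(as.vector(o), order(x, method = "radix"))

# Hash-based grouping in first-appearance order: "b" = 1, "a" = 2, "c" = 3
g_hash <- group(x)

# A 'GRP' object using the default (sorted) grouping
g <- GRP(x)
g$N.groups      # number of groups
g$group.id      # group membership of each element
g$group.sizes   # number of elements in each group

# Quick factor, and the lighter 'qG' object without a levels attribute
f  <- qF(x)
qg <- qG(x, method = "radix")

# Treat missing values as their own level; adds the class 'na.included'
f_na <- qF(c(x, NA), na.exclude = FALSE)

# groupid() with an ordering vector gives the same ids as the radix-based qG
gid <- groupid(x, radixorder(x))

# Regular panel (hypothetical): both calls produce the same id variable
id   <- c(1L, 1L, 1L, 2L, 2L, 2L)
time <- c(1L, 2L, 3L, 1L, 2L, 3L)
panel_gid <- groupid(id, order(id, time))
panel_sid <- seqid(time, order(id, time))

# seqid() starts a new id at every gap in an integer sequence
gaps <- seqid(c(1L, 2L, 3L, 5L, 6L, 8L))
```

In this sketch the sorted groupings (GRP, qF, qG) number the groups in the order "a", "b", "c", whereas group numbers them in order of first appearance, which is the practical difference between the radix-based and hash-based routes.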

Table of Functions

Function / S3 Generic   Methods                                                      Description
radixorder(v)           No methods, for data frames and vectors                      Radix-based ordering + grouping information
roworder(v)             No methods, for data frames                                  Row sorting/reordering
group                   No methods, for data frames and vectors                      Hash-based grouping + grouping information
GRP                     default, GRP, factor, qG, grouped_df, pseries, pdata.frame   Fast grouping and a flexible grouping object
fgroup_by               No methods, for data frames                                  Fast grouped data frame
funique                 default, data.frame                                          Fast unique values/rows
qF                      No methods, for vectors                                      Quick factor generation
qG                      No methods, for vectors                                      Quick grouping of vectors and a 'factor-light' class
fdroplevels             factor, data.frame, list                                     Fast removal of unused factor levels
finteraction            No methods, for data frames and vectors                      Fast interactions
groupid                 No methods, for vectors                                      Run-length type group-id
seqid                   No methods, for vectors                                      Run-length type integer sequence-id
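
For the table entries that accept data frames, a short usage sketch may help. It assumes collapse is installed; the data frame df and its columns id, grp and val are hypothetical and exist only for illustration.

```r
library(collapse)  # assumes the collapse package is installed

# Hypothetical toy data; the factor has an unused level "c"
df <- data.frame(
  id  = c(2L, 1L, 2L, 1L, 2L),
  grp = factor(c("b", "a", "b", "a", "b"), levels = c("a", "b", "c")),
  val = c(5, 3, 4, 1, 2)
)

# Sort rows by id, then by val within id
sorted <- roworder(df, id, val)

# Grouped data frame with a 'GRP' object attached
gdf <- fgroup_by(df, id)
grp_obj <- GRP(gdf)            # retrieves the grouping object

# Unique rows judged on a subset of the columns
uniq <- funique(df, cols = c("id", "grp"))

# Drop the unused level "c" from all factor columns
dropped <- fdroplevels(df)

# Interaction of two grouping vectors; unused levels are dropped
inter <- finteraction(df$id, df$grp)
```

Here GRP(gdf) relies on the grouped_df method listed in the table, so the grouping computed by fgroup_by is reused rather than computed again.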

See Also

Collapse Overview, Data Frame Manipulation, Fast Statistical Functions