collapse (version 1.1.0)

A2-fast-grouping: Fast Grouping / collapse Grouping Objects

Description

GRP performs fast, ordered or unordered grouping of vectors and data.frames (or lists of vectors) using data.table's fast grouping and ordering C routine (forder). The output is a list-like object of class 'GRP' which can be printed, plotted, and used as an efficient input to all of collapse's fast functions and operators, as well as collap, BY and TRA.

Usage

GRP(X, ...)

# S3 method for default
GRP(X, by = NULL, sort = TRUE, order = 1L, na.last = TRUE, return.groups = TRUE, return.order = FALSE, ...)

# S3 method for factor
GRP(X, ...)
# S3 method for qG
GRP(X, ...)
# S3 method for pseries
GRP(X, effect = 1L, ...)
# S3 method for pdata.frame
GRP(X, effect = 1L, ...)
# S3 method for grouped_df
GRP(X, ...)

is.GRP(x)
group_names.GRP(x, force.char = TRUE)
as.factor.GRP(x)

# S3 method for GRP
print(x, n = 6, ...)

# S3 method for GRP
plot(x, breaks = "auto", type = "s", horizontal = FALSE, ...)

Arguments

X

a vector, list of columns or data.frame (default method), or a classed object (conversion/extractor methods).

x

a GRP object.

by

if X is a data.frame or list, by can indicate columns to use for the grouping (by default all columns are used). Columns must be passed using a vector of column names, indices, or using a one-sided formula i.e. ~ col1 + col2.

sort

logical. sort the groups (argument passed to data.table:::forderv). TRUE is like grouping with keyby in data.table, FALSE is like grouping with by.

order

integer. sort the groups in ascending (1L, default) or descending (-1L) order (argument passed to data.table:::forderv).

na.last

logical. if missing values are encountered in grouping vector/columns, assign them to the last group (argument passed to data.table:::forderv).

return.groups

logical. include the unique groups in the created 'GRP' object.

return.order

logical. include the output from data.table:::forderv in the created 'GRP' object.

force.char

logical. Always output group names as character vector, even if a single numeric vector was passed to GRP.default.

effect

plm methods: Select which panel identifier should be used as grouping variable. 1L means the first variable in the plm::index, 2L the second, etc. More than one variable can be supplied.

n

integer. Number of groups to print out.

breaks

integer. Number of breaks in the histogram of group-sizes.

type

linetype for plot.

horizontal

logical. TRUE arranges plots next to each other, instead of above each other.

...

arguments to be passed to or from other methods.

Value

A list-like object of class 'GRP' containing information about the number of groups, the observations (rows) belonging to each group, the size of each group, the unique group names / definitions, whether the groups are ordered or not, and (optionally) the ordering vector used to perform the ordering. The object is structured as follows:

List-index  Element-name  Content type              Content description
[[1]]       N.groups      integer(1)                Number of Groups
[[2]]       group.id      integer(NROW(X))          An integer group-identifier
[[3]]       group.sizes   integer(N.groups)         Vector of group sizes
[[4]]       groups        unique(X) or NULL         Unique groups (same format as input, sorted if sort = TRUE), or NULL if return.groups = FALSE
[[5]]       group.vars    character                 The names of the grouping variables
[[6]]       ordered       logical(2)                [1]- TRUE if sort = TRUE, [2]- TRUE if X already sorted
[[7]]       order         integer(NROW(X)) or NULL  Ordering vector from data.table:::forderv or NULL if return.order = FALSE (the default)

Details

GRP is a central function in the collapse package because it provides the key inputs for easy and efficient groupwise programming at the C/C++ level: (1) the number of groups, (2) an integer group-id indicating which values / rows belong to which group, and (3) the size of each group. Provided with this information, collapse's Fast Statistical Functions pre-allocate intermediate and result vectors of the right sizes and (in most cases) perform grouped statistical computations in a single pass through the data.
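The role of these three pieces of information can be illustrated in plain R. This is only a minimal sketch of the idea, not collapse's C/C++ implementation; grouped_sum is a hypothetical helper written for this illustration and is not part of the package.

```r
# Hypothetical helper (not part of collapse): grouped sum in a single pass,
# using only the number of groups and the integer group-id.
grouped_sum <- function(x, n_groups, group_id) {
  out <- numeric(n_groups)      # (1) pre-allocate the result from the number of groups
  for (i in seq_along(x)) {     # one pass through the data
    g <- group_id[i]            # (2) group of observation i
    out[g] <- out[g] + x[i]
  }
  out
}

x     <- c(10, 20, 30, 40, 50)
id    <- c(1L, 2L, 1L, 3L, 2L)  # integer group-id
ng    <- 3L                     # number of groups
sizes <- tabulate(id, ng)       # (3) group sizes

sums  <- grouped_sum(x, ng, id)
means <- sums / sizes           # group sizes turn sums into means without a second pass
```

Because the result vector is sized up front and each observation is visited once, no splitting or copying of the data by group is needed.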

The sorting and ordering functionality of GRP only affects (2): groups receive different integer ids depending on whether the groups are sorted (sort = TRUE) and in which order (order = 1 ascending or order = -1 descending). This in turn changes the order of values / rows in the output of collapse functions (the row / value corresponding to group 1 always comes out on top). The default setting with sort = TRUE and order = 1 results in groups being sorted in ascending order. This is equivalent to performing grouped operations in data.table using keyby, whereas sort = FALSE is equivalent to data.table grouping with by.
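How the integer ids change with sorting can be mimicked in base R with match. This is an analogy only, not the forderv routine GRP uses; the sort = FALSE case assumes that unsorted groups are numbered in order of first appearance, as with data.table's by.

```r
x <- c("b", "c", "a", "b", "a")

# Analogue of sort = TRUE, order = 1L: ids follow ascending group order (a, b, c)
id_asc   <- match(x, sort(unique(x)))
# Analogue of sort = TRUE, order = -1L: ids follow descending group order (c, b, a)
id_desc  <- match(x, sort(unique(x), decreasing = TRUE))
# Analogue of sort = FALSE (assumed): ids follow order of first appearance (b, c, a)
id_first <- match(x, unique(x))

# The same group sizes come out in a different order depending on the ids
sizes_asc   <- tabulate(id_asc, 3L)
sizes_first <- tabulate(id_first, 3L)
```

The grouping itself is identical in all three cases; only the numbering of the groups, and hence the row order of any grouped output, differs.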

GRP is an S3 generic function with one default method supporting vector and list input, and several conversion methods. The most important of these is the conversion of factors to 'GRP' objects and vice versa. To obtain a 'GRP' object from a factor, one simply gets the number of groups (1) by calling ng <- length(levels(f)) and then computes the count of each level (3) using tabulate(f, ng). The integer group-id (2) is already given by the factor itself after removing the levels and class attributes. The levels are put in a list and moved to position (4) in the 'GRP' object, which is reserved for the unique groups. Going from a factor to a 'GRP' object thus only requires a tabulation of the levels. Creating a factor from a 'GRP' object using as.factor.GRP does not involve any grouping computations, but it may involve interactions if multiple grouping columns were used (these are interacted to produce unique factor levels), or as.character conversions if the grouping column(s) were numeric. Both are potentially expensive.
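The factor conversion steps described above can be traced in base R. The sketch below builds a plain list that mirrors the first four elements of a 'GRP' object; it does not set the 'GRP' class and is not the package's internal code. The name grp_like is made up for this illustration.

```r
f <- factor(c("b", "a", "c", "a", "b", "a"))

ng    <- length(levels(f))     # (1) number of groups
sizes <- tabulate(f, ng)       # (3) count of each level

id <- f
attributes(id) <- NULL         # (2) drop levels and class, leaving the integer codes

# Plain list mirroring positions [[1]] to [[4]] of a 'GRP' object (illustration only)
grp_like <- list(N.groups    = ng,
                 group.id    = id,
                 group.sizes = sizes,
                 groups      = list(levels(f)))

# Going back: re-attach the levels and the class, no recomputation of the grouping
f2 <- structure(grp_like$group.id,
                levels = grp_like$groups[[1]],
                class  = "factor")
```

With a single character-valued grouping variable, as here, the round trip needs neither interactions nor as.character conversions.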

Note: For faster factor generation, and for a factor-light class 'qG' that avoids coercing factor levels to character, see also qF and qG.

See Also

qF, qG, Collapse Overview

Examples

# NOT RUN {
## default method
GRP(mtcars$cyl)
GRP(mtcars, ~ cyl + vs + am)      # or GRP(mtcars, c("cyl","vs","am")) or GRP(mtcars, c(2,8:9))
g <- GRP(mtcars, ~ cyl + vs + am) # saving the object
plot(g)                           # plotting it
group_names.GRP(g)                # get the group names
fsum(mtcars, g)                   # compute the sum of mtcars, grouped by variables cyl, vs and am.

## convert factor to GRP object
GRP(iris$Species)

## get GRP object from a dplyr grouped tibble
library(dplyr)
mtcars %>% group_by(cyl,vs,am) %>% GRP

# }
