collapse
Grouping Objects
GRP performs fast, ordered and unordered, groupings of vectors and data.frames (or lists of vectors) using data.table's fast grouping and ordering C routine (forder). The output is a list-like object of class 'GRP' which can be printed, plotted and used as an efficient input to all of collapse's fast functions and operators, as well as collap, BY and TRA.
GRP(X, ...)
# S3 method for default
GRP(X, by = NULL, sort = TRUE, order = 1L, na.last = TRUE,
return.groups = TRUE, return.order = FALSE, ...)
# S3 method for factor
GRP(X, ...)
# S3 method for qG
GRP(X, ...)
# S3 method for pseries
GRP(X, effect = 1L, ...)
# S3 method for pdata.frame
GRP(X, effect = 1L, ...)
# S3 method for grouped_df
GRP(X, ...)
is.GRP(x)
group_names.GRP(x, force.char = TRUE)
as.factor.GRP(x)
# S3 method for GRP
print(x, n = 6, ...)
# S3 method for GRP
plot(x, breaks = "auto", type = "s", horizontal = FALSE, ...)
X: a vector, list of columns or data.frame (default method), or a classed object (conversion/extractor methods).
x: a GRP object.
by: if X is a data.frame or list, by can indicate the columns to use for the grouping (by default all columns are used). Columns must be passed as a vector of column names, a vector of indices, or a one-sided formula, i.e. ~ col1 + col2.
sort: logical. Sort the groups (argument passed to data.table:::forderv; TRUE is like using keyby with data.table, vs. by).
order: integer. Sort the groups in ascending (1L, default) or descending (-1L) order (argument passed to data.table:::forderv).
na.last: logical. If missing values are encountered in the grouping vector/columns, assign them to the last group (argument passed to data.table:::forderv).
return.groups: logical. Include the unique groups in the created 'GRP' object.
return.order: logical. Include the output from data.table:::forderv in the created 'GRP' object.
force.char: logical. Always output group names as a character vector, even if a single numeric vector was passed to GRP.default.
effect: plm methods: select which panel identifier should be used as the grouping variable. 1L means the first variable in the plm::index, 2L the second, etc. More than one variable can be supplied.
n: integer. Number of groups to print out.
breaks: integer. Number of breaks in the histogram of group sizes.
type: linetype for plot.
horizontal: logical. TRUE arranges plots next to each other, instead of above each other.
...: arguments to be passed to or from other methods.
A list-like object of class 'GRP' containing information about the number of groups, the observations (rows) belonging to each group, the size of each group, the unique group names / definitions, whether the groups are ordered or not and (optionally) the ordering vector used to perform the ordering. The object is structured as follows:
List-index | Element-name | Content type | Content description
[[1]] | N.groups | integer(1) | Number of groups
[[2]] | group.id | integer(NROW(X)) | An integer group identifier
[[3]] | group.sizes | integer(N.groups) | Vector of group sizes
[[4]] | groups | unique(X) or NULL | Unique groups (same format as input, sorted if sort = TRUE), or NULL if return.groups = FALSE
[[5]] | group.vars | character | The names of the grouping variables
[[6]] | ordered | logical(2) | [1] TRUE if sort = TRUE, [2] TRUE if X already sorted
[[7]] | order | integer(NROW(X)) or NULL | Ordering vector from data.table:::forderv, or NULL if return.order = FALSE (the default)
GRP is a central function in the collapse package because it provides the key inputs for easy and efficient groupwise programming at the C/C++ level: (1) the number of groups, (2) an integer group id indicating which values / rows belong to which group, and (3) the size of each group. Given this information, collapse's Fast Statistical Functions pre-allocate intermediate and result vectors of the right sizes and (in most cases) perform grouped statistical computations in a single pass through the data.
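The role of these three inputs can be illustrated with a minimal base R sketch. It is not collapse's C/C++ implementation; the vectors `x` and `key` and all variable names are hypothetical and chosen only for illustration. It shows how a known number of groups allows the result to be pre-allocated, and how the integer group id allows a grouped sum in one pass over the data.

```r
# Base R sketch (no collapse functions involved)
x   <- c(10, 20, 30, 40, 50)        # hypothetical values to aggregate
key <- c("b", "a", "b", "c", "a")   # hypothetical grouping vector

groups      <- sort(unique(key))              # unique groups, sorted ascending
n_groups    <- length(groups)                 # (1) number of groups
group_id    <- match(key, groups)             # (2) integer group id per observation
group_sizes <- tabulate(group_id, n_groups)   # (3) size of each group

# Knowing (1), the result vector can be allocated up front;
# knowing (2), a single pass through x is enough to fill it.
sums <- numeric(n_groups)
for (i in seq_along(x)) {
  sums[group_id[i]] <- sums[group_id[i]] + x[i]
}
names(sums) <- groups
sums
```

An R-level loop like this is slow; the point is only that no sorting or repeated subsetting of `x` is needed once the grouping information exists.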
The sorting and ordering functionality of GRP only affects (2): groups receive different integer ids depending on whether the groups are sorted (sort = TRUE) and in which order (order = 1 ascending or order = -1 descending). This in turn changes the order of values/rows in the output of collapse functions (the row/value corresponding to group 1 always comes out on top). The default setting with sort = TRUE and order = 1 results in groups being sorted in ascending order. This is equivalent to performing grouped operations in data.table using keyby, whereas sort = FALSE is equivalent to data.table grouping with by.
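How the assignment of integer ids determines the output order can be sketched in base R. This is an analogy only, not a call to GRP: the vectors and the helper `agg` are hypothetical, and numbering unsorted groups by first appearance is an assumption based on the comparison with data.table's by.

```r
# Base R analogy (no collapse functions involved)
x   <- c(10, 20, 30, 40, 50)        # hypothetical values
key <- c("b", "a", "b", "c", "a")   # hypothetical grouping vector

id_asc      <- match(key, sort(unique(key)))                     # analogue of sort = TRUE, order = 1L
id_desc     <- match(key, sort(unique(key), decreasing = TRUE))  # analogue of sort = TRUE, order = -1L
id_unsorted <- match(key, unique(key))                           # assumed analogue of sort = FALSE

# Hypothetical helper: grouped sum whose i-th element belongs to group id i
agg <- function(id) unname(vapply(split(x, id), sum, numeric(1)))

agg(id_asc)       # groups in order a, b, c
agg(id_desc)      # groups in order c, b, a
agg(id_unsorted)  # groups in order b, a, c
```

The group sums themselves are identical in all three cases; only the position of each group in the result changes.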
GRP is an S3 generic function with one default method supporting vector and list input, and several conversion methods. The most important of these is the conversion of factors to 'GRP' objects and vice versa. To obtain a 'GRP' object from a factor, one gets the number of groups (1) by calling ng <- length(levels(f)) and then computes the count of each level (3) using tabulate(f, ng). The integer group id (2) is already given by the factor itself after removing the levels and class attributes. The levels are put in a list and moved to position (4) in the 'GRP' object, which is reserved for the unique groups. Going from a factor to a 'GRP' object thus only requires a tabulation of the levels, whereas creating a factor from a 'GRP' object using as.factor.GRP does not involve any computations, but may involve interactions if multiple grouping columns were used (which are then interacted to produce unique factor levels) or as.character conversions if the grouping column(s) were numeric (which are potentially expensive).
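The factor conversion described above can be sketched in base R. The result here is a plain list named `grp_like` (a hypothetical name) that mirrors the first four elements of the documented structure; it is not a real 'GRP' object and does not carry the class or the remaining elements.

```r
f  <- iris$Species
ng <- length(levels(f))       # (1) number of groups
sizes <- tabulate(f, ng)      # (3) count of each level
id <- as.integer(f)           # (2) integer codes, i.e. the factor without levels and class

# Hypothetical plain list mirroring positions [[1]] to [[4]] of a 'GRP' object
grp_like <- list(
  N.groups    = ng,
  group.id    = id,
  group.sizes = sizes,
  groups      = list(levels(f))   # levels stored as a list in position 4
)

# Going back to a factor needs no computation on the data:
# the attributes are simply re-attached to the integer ids.
f2 <- structure(grp_like$group.id,
                levels = grp_like$groups[[1]],
                class  = "factor")
identical(f2, f)
```

This round trip is cheap because the levels were already character; with numeric or multiple grouping columns, the levels would first have to be built, which is the potentially expensive step mentioned above.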
Note: For faster factor generation and a factor-light class 'qG', which avoids the coercion of factor levels to character, also see qF and qG.
library(collapse)
## default method
GRP(mtcars$cyl)
GRP(mtcars, ~ cyl + vs + am)       # or GRP(mtcars, c("cyl","vs","am")) or GRP(mtcars, c(2,8:9))
g <- GRP(mtcars, ~ cyl + vs + am)  # save the object
plot(g)                            # plot it
group_names.GRP(g)                 # retrieve the group names
fsum(mtcars, g)                    # sum of mtcars, grouped by cyl, vs and am
## convert factor to GRP object
GRP(iris$Species)
## get GRP object from a dplyr grouped tibble
library(dplyr)
mtcars %>% group_by(cyl, vs, am) %>% GRP