collapse provides an ensemble of functions to perform common data transformations efficiently and in a user-friendly way:
dapply applies functions to rows or columns of matrices and data frames, preserving the data format.
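A minimal sketch of dapply, assuming collapse is installed. The toy matrix and the object names are made up for illustration; mtcars is the built-in R dataset.

```r
library(collapse)

# Toy 2 x 3 matrix; the dimnames are illustrative only.
m <- matrix(as.numeric(1:6), nrow = 2,
            dimnames = list(c("a", "b"), c("x", "y", "z")))

# Column-wise (the default, MARGIN = 2): center each column.
col_centered <- dapply(m, function(v) v - mean(v))

# Row-wise (MARGIN = 1): express each row as shares of its row total.
row_shares <- dapply(m, function(v) v / sum(v), MARGIN = 1)

# A data frame goes in, a data frame comes out.
df_doubled <- dapply(mtcars[1:3], function(v) v * 2)
```

In each case the result keeps the class and dimensions of the input, which is what "preserving the data format" refers to.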
BY is an S3 generic for Split-Apply-Combine computing and can perform aggregation as well as grouped transformations (for aggregation, see also collap and the Fast Statistical Functions).
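The two uses of BY can be sketched as follows, assuming collapse is installed. The vectors are invented, and the data is already sorted by group to keep the comparison simple.

```r
library(collapse)

x <- c(1, 2, 3, 4, 5, 6)
g <- c("a", "a", "a", "b", "b", "b")   # data sorted by group

# Aggregation: FUN returns one value per group.
grp_sum <- BY(x, g, sum)

# Grouped transformation: FUN returns a vector as long as its input,
# so the result has the same length as x.
grp_demeaned <- BY(x, g, function(v) v - mean(v), use.g.names = FALSE)
```

Whether BY aggregates or transforms depends only on the length of what FUN returns for each group.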
A set of arithmetic operators facilitates row-wise (%rr%, %r+%, %r-%, %r*%, %r/%) and column-wise (%cr%, %c+%, %c-%, %c*%, %c/%) replacing and sweeping operations involving a vector and a matrix or data frame / list. Since v1.7, the operators %+=%, %-=%, %*=% and %/=% do column- and element-wise math by reference, and the function setop can also perform sweeping out rows by reference.
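As a rough illustration of the row- and column-wise operators, assuming collapse is installed, the sketch below compares them with base R equivalents. The matrix and vectors are made up; the by-reference operators are not shown.

```r
library(collapse)

m     <- matrix(as.numeric(1:6), nrow = 2)  # 2 rows, 3 columns
v_row <- c(10, 20, 30)                      # one value per column
v_col <- c(1, 2)                            # one value per row

swept_rows <- m %r-% v_row   # subtract v_row from every row
swept_cols <- m %c*% v_col   # multiply every column by v_col
filled     <- m %rr% v_row   # replace every row with v_row

# Base R equivalents, for comparison.
base_rows <- sweep(m, 2, v_row, "-")
base_cols <- m * v_col
```

The row-wise operators treat the vector as a row and apply it to each row of the matrix; the column-wise operators treat it as a column.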
TRA is a more advanced S3 generic to efficiently perform (groupwise) replacing and sweeping out of statistics. Supported operations are:
Integer-id | String-id | Description
1 | "replace_fill" | replace and overwrite missing values
2 | "replace" | replace but preserve missing values
3 | "-" | subtract
4 | "-+" | subtract group-statistics but add group-frequency weighted average of group statistics
5 | "/" | divide
6 | "%" | compute percentages
7 | "+" | add
8 | "*" | multiply
9 | "%%" | modulus
All of collapse's Fast Statistical Functions have a built-in TRA argument for faster access (i.e. you can compute (groupwise) statistics and use them to transform your data with a single function call).
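A minimal sketch of TRA and of the TRA argument, assuming collapse is installed. The data is invented, and fmean stands in for any of the Fast Statistical Functions.

```r
library(collapse)

x <- c(1, 2, 3, 4, 5, 6)
g <- c(1L, 1L, 1L, 2L, 2L, 2L)   # two groups of three

grp_mean <- fmean(x, g)          # one mean per group

centered <- TRA(x, grp_mean, "-", g)        # subtract the group mean
replaced <- TRA(x, grp_mean, "replace", g)  # overwrite with the group mean
percent  <- TRA(x, grp_mean, "%", g)        # percentage of the group mean
by_id    <- TRA(x, grp_mean, 3L, g)         # integer id 3 is the same as "-"

# The same centering in a single call, via the TRA argument of fmean.
one_call <- fmean(x, g, TRA = "-")
```

The single-call form avoids materializing the group statistics as a separate object before sweeping them out.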
fscale/STD is an S3 generic to perform (groupwise and / or weighted) scaling / standardizing of data and is orders of magnitude faster than base::scale.
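A short sketch of overall and groupwise standardizing with fscale, assuming collapse is installed. The numbers are made up so that each group has mean and standard deviation that are easy to check by hand.

```r
library(collapse)

x <- c(2, 4, 6, 10, 20, 30)
g <- c(1L, 1L, 1L, 2L, 2L, 2L)

z_all <- fscale(x)      # standardize using the overall mean and sd
z_grp <- fscale(x, g)   # standardize within each group

# base::scale returns a one-column matrix, so flatten it for comparison.
z_base <- as.numeric(scale(x))
```

Within each group the standardized values have mean 0 and standard deviation 1.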
fwithin/W is an S3 generic to efficiently perform (groupwise and / or weighted) within-transformations / demeaning / centering of data. Similarly, fbetween/B computes (groupwise and / or weighted) between-transformations / averages (also a lot faster than stats::ave).
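The relationship between the within and between transformations can be sketched as follows, assuming collapse is installed and using invented data.

```r
library(collapse)

x <- c(10, 12, 14, 1, 2, 6)
g <- factor(c("a", "a", "a", "b", "b", "b"))

within_x  <- fwithin(x, g)    # x minus its group mean
between_x <- fbetween(x, g)   # the group mean, repeated for each observation

# Base R counterpart of the between transformation.
ave_x <- ave(x, g)
```

Adding the two results back together recovers the original data, since each observation is split into its group mean and its deviation from that mean.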
fhdwithin/HDW, shorthand for 'higher-dimensional within transform', is an S3 generic to efficiently center data on multiple groups and partial-out linear models (possibly involving many levels of fixed effects). In other words, fhdwithin/HDW efficiently computes residuals from (potentially complex) linear models. Similarly, fhdbetween/HDB, shorthand for 'higher-dimensional between transformation', computes the corresponding means or fitted values.
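The link to linear-model residuals and fitted values can be illustrated with a small simulated example, assuming collapse is installed. The data-generating process, the seed and all object names are hypothetical; the sketch uses one factor and one numeric covariate, and lm serves as the reference.

```r
library(collapse)

set.seed(42)
n <- 60
f <- factor(rep(c("a", "b", "c"), each = 20))   # one fixed effect
z <- rnorm(n)                                   # one numeric covariate
y <- 1 + 0.5 * z + as.integer(f) + rnorm(n)

res <- fhdwithin(y, list(f, z))    # y with f and z partialled out
fit <- fhdbetween(y, list(f, z))   # the corresponding fitted values

# Reference model estimated with base R.
ref <- lm(y ~ f + z)
```

Under these assumptions res should agree with residuals(ref) and fit with fitted(ref), up to numerical tolerance.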
flag/L/F, fdiff/D/Dlog and fgrowth/G are S3 generics to compute sequences of lags / leads and suitably lagged and iterated (quasi-, log-) differences and growth rates on time series and panel data. fcumsum flexibly computes cumulative sums. More in Time Series and Panel Series.
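A brief sketch of these functions on a toy series, assuming collapse is installed. The series and the group vector are invented; without a time variable the data is taken to be ordered.

```r
library(collapse)

x <- c(100, 110, 121, 133.1)
g <- c(1L, 1L, 2L, 2L)          # hypothetical panel id

lag1     <- flag(x)             # lag by one period
lead1    <- flag(x, -1)         # negative n gives a lead
lag_seq  <- flag(x, 0:2)        # several lags at once, returned as a matrix
lag_grp  <- flag(x, 1, g)       # lag within each group

diff1    <- fdiff(x)            # first difference
logdiff1 <- fdiff(x, log = TRUE)  # log-difference
growth1  <- fgrowth(x)          # growth rate in percent
cumul    <- fcumsum(x)          # cumulative sum
```

Passing a vector to n is what the text means by "sequences of lags / leads": each requested lag or lead becomes one column of the result.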
STD, W, B, HDW, HDB, L, D, Dlog and G are parsimonious wrappers around the f- functions above, representing the corresponding transformation 'operators'. They have additional capabilities when applied to data frames (i.e. variable selection, formula input, auto-renaming and id-variable preservation), and are easier to employ in regression formulas, but are otherwise identical in functionality.
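The data-frame conveniences and the use in regression formulas can be sketched with W, assuming collapse is installed. The example uses the built-in mtcars data; the choice of variables is arbitrary.

```r
library(collapse)

# Formula input: demean mpg and hp within cyl. The id variable cyl is kept
# and the transformed columns are renamed automatically.
w_df <- W(mtcars, mpg + hp ~ cyl)

# The same transformation on a single vector with the f- function.
w_vec <- fwithin(mtcars$mpg, mtcars$cyl)

# Operators inside a regression formula: a within (fixed-effects) regression.
fe <- lm(W(mpg, cyl) ~ W(hp, cyl), data = mtcars)

# Equivalent slope from a dummy-variable regression in base R.
dummy <- lm(mpg ~ hp + factor(cyl), data = mtcars)
```

The slope on hp is the same in both regressions, which is the usual equivalence between demeaning by group and including group dummies.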
Function / S3 Generic | Methods | Description
dapply | No methods, works with matrices and data frames | Apply functions to rows or columns
BY | default, matrix, data.frame, grouped_df | Split-Apply-Combine computing
%(r/c)(r/+/-/*//)% | No methods, works with matrices and data frames / lists | Row- and column-arithmetic
TRA | default, matrix, data.frame, grouped_df | Replace and sweep out statistics
fscale/STD | default, matrix, data.frame, pseries, pdata.frame, grouped_df | Scale / standardize data
fwithin/W | default, matrix, data.frame, pseries, pdata.frame, grouped_df | Demean / center data
fbetween/B | default, matrix, data.frame, pseries, pdata.frame, grouped_df | Compute means / average data
fhdwithin/HDW | default, matrix, data.frame, pseries, pdata.frame | High-dimensional centering and lm residuals
fhdbetween/HDB | default, matrix, data.frame, pseries, pdata.frame | High-dimensional averages and lm fitted values
See also: Collapse Overview, Fast Statistical Functions, Time Series and Panel Series