collapse provides an ensemble of functions to perform common data transformations efficiently and in a user-friendly way:
dapply applies functions to rows or columns of matrices and data frames, preserving the data format.
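A minimal sketch of dapply, assuming collapse is installed and using R's built-in mtcars data (the variable names are illustrative). It applies a function column-wise (the default) to a data frame and row-wise to a matrix. In both cases the output keeps the input's format.

```r
library(collapse)

# Column-wise (MARGIN = 2, the default): log-transform every column of mtcars.
# The result is again a data frame with the same names and row names.
lmt <- dapply(mtcars, log)

# Row-wise (MARGIN = 1) on a matrix: centre each row on its own mean.
# Unlike base::apply, the result is not transposed.
m  <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)
mc <- dapply(m, function(r) r - mean(r), MARGIN = 1)
mc  # both rows become -2 0 2
```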
BY is an S3 generic for Split-Apply-Combine computing and can perform aggregation as well as grouped transformations (for aggregation, please also see collap and the Fast Statistical Functions).
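A minimal sketch of both uses of BY, assuming collapse is installed and grouping mtcars by cylinder count. When the function returns one value per group, BY aggregates. When it returns a vector as long as the group, BY performs a grouped transformation and, by default, returns the result in the original row order.

```r
library(collapse)

g <- mtcars$cyl

# Aggregation: one median per cylinder group (named by group)
med <- BY(mtcars$mpg, g, median)

# Grouped transformation: deviation of each car's mpg from its group median,
# returned with the same length and order as mtcars$mpg
dev <- BY(mtcars$mpg, g, function(x) x - median(x))
```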
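A set of arithmetic operators facilitates row-wise (%rr%, %r+%, %r-%, %r*%, %r/%) and column-wise (%cr%, %c+%, %c-%, %c*%, %c/%) replacing and sweeping operations involving a vector and a matrix or data frame / list. For the row-wise operators the vector supplies one value per column; for the column-wise operators it supplies one value per row.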
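A minimal sketch of the row- and column-wise operators, assuming collapse is installed. The base R function sweep is used only as a reference for what each operator computes; the matrix, vectors and result names are illustrative.

```r
library(collapse)

m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)  # 2 x 3 matrix
v_row <- c(10, 20, 30)  # one value per column: used with the %r..% operators
v_col <- c(100, 200)    # one value per row: used with the %c..% operators

add_rows <- m %r+% v_row  # add v_row to every row, like sweep(m, 2, v_row, "+")
div_rows <- m %r/% v_row  # divide every row element-wise by v_row
sub_cols <- m %c-% v_col  # subtract v_col from every column, like sweep(m, 1, v_col, "-")
rep_rows <- m %rr% v_row  # replace every row by v_row

# Data frames work too: divide every column of mtcars by the car's weight
per_wt <- mtcars %c/% mtcars$wt
```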
TRA is a more advanced S3 generic to efficiently perform (groupwise) replacing and sweeping out of statistics.
Supported operations are:
Integer-id | String-id | Description
--- | --- | ---
1 | "replace_fill" | replace and overwrite missing values
2 | "replace" | replace but preserve missing values
3 | "-" | subtract
4 | "-+" | subtract group statistics but add back the group-frequency weighted average of the group statistics
5 | "/" | divide
6 | "%" | compute percentages
7 | "+" | add
8 | "*" | multiply
9 | "%%" | modulus
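A minimal sketch of TRA with three of the operations above, assuming collapse is installed and grouping mtcars$mpg by cylinder count. For group means, the group-frequency weighted average of the group means equals the overall mean, so "-+" amounts to subtracting the group mean and adding back the overall mean.

```r
library(collapse)

x <- mtcars$mpg
g <- mtcars$cyl

mu <- fmean(x, g)                 # one mean per cylinder group

d  <- TRA(x, mu, "-", g)          # "-": subtract each observation's group mean
dp <- TRA(x, mu, "-+", g)         # "-+": as "-", then add back the overall mean
sh <- TRA(x, fsum(x, g), "%", g)  # "%": each value as a percentage of its group total
```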
All of collapse's Fast Statistical Functions have a built-in TRA argument for faster access, i.e. you can compute (groupwise) statistics and use them to transform your data with a single function call.
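A minimal sketch of the built-in TRA argument, assuming collapse is installed and using fmean with its default na.rm = TRUE. The small vector shows the difference between "replace_fill" and "replace"; the last lines check that a single fmean call gives the same result as computing the statistic and calling TRA separately.

```r
library(collapse)

x <- c(1, NA, 3, 4)
g <- c(1, 1, 2, 2)

fmean(x, g)                              # group means (NA skipped): 1 and 3.5
rf <- fmean(x, g, TRA = "replace_fill")  # c(1, 1, 3.5, 3.5): the NA is overwritten
rp <- fmean(x, g, TRA = "replace")       # c(1, NA, 3.5, 3.5): the NA is kept

# One call versus two: compute group means and subtract them
one <- fmean(mtcars$mpg, mtcars$cyl, TRA = "-")
two <- TRA(mtcars$mpg, fmean(mtcars$mpg, mtcars$cyl), "-", mtcars$cyl)
```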
fscale/STD is an S3 generic to perform (groupwise and / or weighted) scaling / standardizing of data and is orders of magnitude faster than base R's scale.
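A minimal sketch of fscale and its operator STD, assuming collapse is installed. Without groups the result matches base R's scale (mean 0, standard deviation 1). With a grouping vector, each group is standardized separately.

```r
library(collapse)

x <- mtcars$mpg
g <- mtcars$cyl

z  <- fscale(x)     # overall standardization
zg <- fscale(x, g)  # standardize within each cylinder group
z2 <- STD(x, g)     # operator form, same result as fscale(x, g)
```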
fwithin/W is an S3 generic to efficiently perform (groupwise and / or weighted) within-transformations / demeaning / centering of data. Similarly, fbetween/B computes (groupwise and / or weighted) between-transformations / averages (also much faster than base R's ave).
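A minimal sketch of the within/between decomposition, assuming collapse is installed. The within part (deviations from group means) and the between part (group means repeated for each observation) add back up to the original data. The mean = "overall.mean" option shown last adds the overall mean back after centering.

```r
library(collapse)

x <- mtcars$mpg
g <- mtcars$cyl

xw <- fwithin(x, g)   # x minus its group mean
xb <- fbetween(x, g)  # group mean for every observation, like ave(x, g)

# Operator forms give the same results
xw_op <- W(x, g)
xb_op <- B(x, g)

# Center within groups, then add back the overall mean
xo <- fwithin(x, g, mean = "overall.mean")
```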
fHDwithin/HDW, shorthand for 'higher-dimensional within transformation', is an S3 generic to efficiently center data on multiple groups and partial out linear models (possibly involving many levels of fixed effects). In other words, fHDwithin/HDW efficiently computes residuals from (potentially complex) linear models. Similarly, fHDbetween/HDB, shorthand for 'higher-dimensional between transformation', computes the corresponding means or fitted values.
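A minimal sketch of the residual / fitted-value interpretation, assuming collapse is installed. To keep the comparison with lm exact and simple, it uses a single factor only; multi-factor and covariate inputs are not shown here. The residuals and fitted values add up to the original variable.

```r
library(collapse)

y <- mtcars$mpg
f <- factor(mtcars$cyl)

r   <- fHDwithin(y, f)   # residuals from regressing y on the factor f
fv  <- fHDbetween(y, f)  # corresponding fitted values (group means)
fit <- lm(y ~ f)         # reference fit with base R
```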
flag/L/F, fdiff/D/Dlog and fgrowth/G are S3 generics to compute sequences of lags / leads and suitably lagged and iterated (quasi-, log-) differences and growth rates on time series and panel data. See Time Series and Panel Series for more details.
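A minimal sketch of lags, leads, differences and growth rates on a short series, assuming collapse is installed. In the panel example no time variable is passed, so the data are assumed to be sorted by time within each id.

```r
library(collapse)

x <- c(1, 2, 3, 4, 5)

lag1  <- flag(x)      # NA 1 2 3 4
lead1 <- flag(x, -1)  # negative n gives leads: 2 3 4 5 NA
lags  <- flag(x, 1:2) # sequence of lags: a 5 x 2 matrix (lag 1, lag 2)
d1    <- fdiff(x)     # first difference: NA 1 1 1 1
dl    <- Dlog(x)      # log-difference: NA, log(2) - log(1), ...
gr    <- fgrowth(x)   # growth rate in percent: NA 100 50 33.33 25

# Panel data: lags are computed within each id
id <- c(1, 1, 1, 2, 2)
y  <- c(1, 2, 3, 10, 20)
pl <- flag(y, 1, id)  # NA 1 2 NA 10
```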
STD, W, B, HDW, HDB, L, D, Dlog and G are parsimonious wrappers around the f- functions above, representing the corresponding transformation 'operators'. They have additional capabilities when applied to data frames (namely variable selection, formula input, auto-renaming and id-variable preservation) and are easier to employ in regression formulas, but are otherwise identical in functionality.
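A minimal sketch of the operator conveniences, assuming collapse is installed with its default settings (which prefix transformed columns with the stub "W."). The formula ~ cyl centers all other numeric columns of mtcars within cylinder groups and keeps cyl as an id variable. Inside lm, W gives the within (fixed-effects) estimate, which by the Frisch-Waugh-Lovell theorem equals the slope from a regression with cylinder dummies.

```r
library(collapse)

# Formula input, auto-renaming and id-variable preservation
wd <- W(mtcars, ~ cyl)
head(names(wd))

# Use in a regression formula: within estimate of the effect of wt on mpg
b_within <- coef(lm(W(mpg, cyl) ~ W(wt, cyl), data = mtcars))[2]

# Reference: the same slope from a model with cylinder dummies
b_lsdv <- coef(lm(mpg ~ wt + factor(cyl), data = mtcars))["wt"]
```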
Function / S3 Generic | Methods | Description
--- | --- | ---
dapply | No methods, works with matrices and data frames | Apply functions to rows or columns
BY | default, matrix, data.frame, grouped_df | Split-Apply-Combine computing
%(r/c)(r/+/-/*//)% | No methods, works with matrices and data frames / lists | Row- and column-arithmetic
TRA | default, matrix, data.frame, grouped_df | Replace and sweep out statistics
fscale/STD | default, matrix, data.frame, pseries, pdata.frame, grouped_df | Scale / standardize data
fwithin/W | default, matrix, data.frame, pseries, pdata.frame, grouped_df | Demean / center data
fbetween/B | default, matrix, data.frame, pseries, pdata.frame, grouped_df | Compute means / average data
fHDwithin/HDW | default, matrix, data.frame, pseries, pdata.frame | High-dimensional centering and lm residuals
fHDbetween/HDB | default, matrix, data.frame, pseries, pdata.frame | High-dimensional averages and lm fitted values
flag/L/F | default, matrix, data.frame, pseries, pdata.frame, grouped_df | (Sequences of) lags / leads
fdiff/D/Dlog | default, matrix, data.frame, pseries, pdata.frame, grouped_df | (Sequences of lagged/leaded and iterated, quasi-, log-) differences
See also: Collapse Overview, Fast Statistical Functions, Time Series and Panel Series