Topics and Functions

| Topic | Main Features / Keywords | Functions |
| --- | --- | --- |
| Fast Statistical Functions | Fast (grouped and weighted) statistical functions for vectors, matrices, data frames and grouped data frames (class 'grouped_df', dplyr compatible). | fsum, fprod, fmean, fmedian, fmode, fvar, fsd, fmin, fmax, fnth, ffirst, flast, fnobs, fndistinct |
| Fast Grouping and Ordering | Fast (ordered) groupings from vectors, data frames and lists. 'GRP' objects are extremely efficient inputs for programming with collapse's fast functions. fgroup_by can attach them to a data frame, for fast dplyr-style grouped computations. Fast splitting of vectors based on 'GRP' objects, fast radix-sort based ordering and hash-based grouping (the workhorses behind GRP), fast unique values/rows, factor generation, vector grouping, interactions, generalized run-length type grouping and grouping of time-sequences. | GRP, as_factor_GRP, GRPnames, is_GRP, gsplit, fgroup_by, fgroup_vars, fungroup, radixorder(v), group, funique, qF, qG, is_qG, fdroplevels, finteraction, groupid, seqid |
| Fast Data Manipulation | Fast and flexible select, subset, summarise, mutate/transform, sort/reorder, rename and relabel data. In addition, a set of (standard evaluation) functions for fast selecting, replacing or adding data frame columns, including shortcuts to select and replace variables by data type. | fselect(<-), fsubset/ss, fsummarise, fmutate, across, (f/set)transform(v)(<-), fcompute(v), roworder(v), colorder(v), (f/set)rename, (set)relabel, get_vars(<-), add_vars(<-), num_vars(<-), cat_vars(<-), char_vars(<-), fact_vars(<-), logi_vars(<-), date_vars(<-) |
| Quick Data Conversion | Quick conversions: data.frame <> data.table <> tibble; matrix <> list, data.frame, data.table (row- or column-wise), tibble; array > matrix, data.frame, data.table, tibble; list > data.frame, data.table, tibble; vector > factor, matrix, data.frame, data.table, tibble. Also converts factors / all factor columns. | qDF, qDT, qTBL, qM, qF, mrtl, mctl, as_numeric_factor, as_character_factor |
| Advanced Data Aggregation | Fast and easy (weighted and parallelized) aggregation of multi-type data, with (multiple) functions applied to numeric and categorical columns. Also supports fully customized aggregation tasks mapping functions to columns + renaming. | collap(v/g) |
| Data Transformations | Fast row- and column-arithmetic and (object preserving) apply functionality for vectors, matrices and data frames. Fast (grouped) replacing and sweeping of statistics and (grouped and weighted) scaling / standardizing, (higher-dimensional) within- and between-transformations (i.e. centering and averaging), linear prediction and partialling out. Additional methods for grouped_df (dplyr) and pseries, pdata.frame (plm). | %(r/c)r%, %(r/c)(+/-/*//)%, dapply, BY, TRA, fscale/STD, fbetween/B, fwithin/W, fhdbetween/HDB, fhdwithin/HDW |
| Linear Models | Fast (weighted) linear model fitting with 6 different solvers and a fast F-test to test exclusion restrictions on linear models with (large) factors. | flm, fFtest |
| Time Series and Panel Series | Fast (sequences of) lags / leads and (lagged / leaded and iterated) differences, quasi-differences, (quasi-) log-differences and (compounded) growth rates on (unordered, irregular) time series and panel data. Flexible cumulative summations. Panel data to (ts-)array conversions. Multivariate panel auto-, partial- and cross-correlation functions. Additional methods for grouped_df (dplyr) and pseries, pdata.frame (plm). | flag/L/F, fdiff/D/Dlog, fgrowth/G, fcumsum, psmat, psacf, pspacf, psccf |
| List Processing | (Recursive) list search and identification, search and extract list-elements / list-subsetting, splitting, list-transpose, apply functions to lists of data frames / data objects, and (fast) generalized recursive row-binding / unlisting in 2-dimensions / to data frame. | is_unlistable, ldepth, has_elem, get_elem, atomic_elem(<-), list_elem(<-), reg_elem, irreg_elem, rsplit, t_list, rapply2d, unlist2d |
| Summary Statistics | Fast (grouped and weighted) summary statistics for cross-sectional and complex multilevel / panel data. Efficient detailed description of data frame. Fast check of variation in data (within groups / dimensions). (Weighted) pairwise correlations and covariances (with observation count, p-value and pretty printing), pairwise observation count. Some additional methods for grouped_df (dplyr), pseries and pdata.frame (plm). | qsu, descr, varying, pwcor, pwcov, pwnobs |
| Recode and Replace Values | Recode multiple values (exact or regex matching) and replace NaN/Inf/-Inf and outliers (according to 1- or 2-sided threshold or standard-deviations) in vectors, matrices or data frames. Insert a value at arbitrary positions into vectors, matrices or data frames. | recode_num, recode_char, replace_NA, replace_Inf, replace_outliers, pad |
| (Memory) Efficient Programming | Efficient comparisons of a vector/matrix with a value, and replacing values/rows in vector/matrix/DF (all avoiding the generation of logical vectors or subsets), faster generation of initialized vectors, and fast mathematical operations on vectors/matrices/DFs with no copies at all. Fast missing value detection, (random) insertion and removal, fast data lengths and C storage types, faster nlevels for factors, fast nrow, ncol, dim (for data frames) and seq_along rows or columns. Fast Cholesky inverse of symmetric PD matrix. | anyv, allv, allNA, whichv, whichNA, %==%, %!=%, copyv, setv, alloc, setop, %+=%, %-=%, %*=%, %/=%, missing_cases, na_insert, na_rm, na_omit, vlengths, vtypes, fnlevels, fnrow, fncol, fdim, seq_row, seq_col, cinv |
| Small (Helper) Functions | Multiple-assignment, non-standard concatenation, set and extract variable labels, extract variable classes, display variable names and labels together, add / remove prefix or postfix to / from column names, not-in operator, matching with error message for non-matched, check exact or near / numeric equality of multiple objects or of all elements in a list, return object with dimnames, row- or colnames efficiently set, or with all attributes removed, C-level functions to set and duplicate / copy attributes, identify categorical and date(-time) objects. | massign, %=%, .c, vlabels(<-), setLabels, vclasses, namlab, add_stub, rm_stub, %!in%, ckmatch, all_identical, all_obj_equal, setDimnames, setRownames, setColnames, unattrib, setAttrib, copyAttrib, copyMostAttrib, is_categorical, is_date |
| Data and Global Macros | Groningen Growth and Development Centre 10-Sector Database, World Bank World Development dataset, and some global macros containing links to the topical documentation pages (including this page), all exported objects (excluding exported S3 methods), all generic functions, the 2 datasets, all fast functions, all fast statistical (scalar-valued) functions, and all transformation operators (these are not infix functions but function shortcuts resembling operators in a statistical sense, such as the lag/lead operators L / F, both wrapping flag; see .OPERATOR_FUN). | GGDC10S, wlddev, .COLLAPSE_TOPICS, .COLLAPSE_ALL, .COLLAPSE_GENERIC, .COLLAPSE_DATA, .FAST_FUN, .FAST_STAT_FUN, .OPERATOR_FUN |
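
As a rough illustration of how a few of the listed functions fit together, the following minimal sketch computes grouped and weighted statistics, reuses a 'GRP' object, runs a dplyr-style grouped summary, and takes a lag. It assumes the collapse package is installed. The vectors `x`, `g` and `w` are made-up illustration data, `mtcars` is R's built-in dataset, and the result names (`gm`, `wgm`, `gs`, `res`, `lagged`) are arbitrary.

```r
library(collapse)

# Made-up illustration data: values, group labels and weights
x <- c(1, 2, 3, 4, 5, 6)
g <- c("a", "a", "b", "b", "c", "c")
w <- c(1, 3, 1, 1, 2, 2)

# Grouped mean: one value per group, named by group
gm <- fmean(x, g)

# Grouped and weighted mean (third argument is the weight vector)
wgm <- fmean(x, g, w)

# Build a 'GRP' object once and pass it to a fast function
grp <- GRP(g)
gs <- fsum(x, grp)

# dplyr-style grouped summary on a data frame
res <- fsummarise(fgroup_by(mtcars, cyl),
                  mpg_mean = fmean(mpg),
                  hp_max   = fmax(hp))

# First lag of a vector (the first element becomes NA)
lagged <- flag(x, 1)
```

Creating the grouping once with `GRP` and passing it to several fast functions avoids regrouping the data on every call, which is the usage pattern the "Fast Grouping and Ordering" row describes.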