A1-fast-statistical-functions: Fast (Grouped, Weighted) Statistical Functions for Matrix-Like Objects

Description

With fsum, fprod, fmean, fmedian, fmode, fvar, fsd, fmin, fmax, ffirst, flast, fNobs and fNdistinct, collapse presents a coherent set of extremely fast and flexible statistical functions (S3 generics) to perform column-wise, grouped and weighted computations on atomic vectors, matrices and data.frames, with special support for dplyr grouped tibbles and data.tables.

(Note: The vector-valued functions and operators fscale/STD, fbetween/B, fHDbetween/HDB, fwithin/W, fHDwithin/HDW, flag/L/F, fdiff/D and fgrowth/G are documented under Data Transformations and Time-Series and Panel-Series. These functions also support plm::pseries and plm::pdata.frame objects.)

Value

`x` aggregated. For data.frames, column attributes and overall attributes are preserved.

Usage

## All functions (FUN) follow a common syntax in 4 methods:
FUN(x, ...)
## Default S3 method:
FUN(x, g = NULL, [w = NULL,] TRA = NULL, [na.rm = TRUE,]
    use.g.names = TRUE, ...)
## S3 method for class 'matrix'
FUN(x, g = NULL, [w = NULL,] TRA = NULL, [na.rm = TRUE,]
    use.g.names = TRUE, drop = TRUE, ...)
## S3 method for class 'data.frame'
FUN(x, g = NULL, [w = NULL,] TRA = NULL, [na.rm = TRUE,]
    use.g.names = TRUE, drop = TRUE, ...)
## S3 method for class 'grouped_df'
FUN(x, [w = NULL,] TRA = NULL, [na.rm = TRUE,]
    use.g.names = FALSE, keep.group_vars = TRUE, [keep.w = TRUE,] ...)
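
As a rough illustration of the common syntax, the sketch below calls one function (`fsum`) through the default, matrix and data.frame methods and shows the effect of `drop` and `use.g.names`. It assumes the collapse package is installed and skips everything otherwise; the object names (`total_v`, `by_cyl`, ...) are made up for this example.

```r
# Sketch only: runs the collapse calls if the package is available.
if (requireNamespace("collapse", quietly = TRUE)) {
  v <- mtcars$mpg          # atomic vector  -> default method
  m <- as.matrix(mtcars)   # numeric matrix -> matrix method

  total_v  <- collapse::fsum(v)                     # a single number
  total_m  <- collapse::fsum(m)                     # drop = TRUE: vector, one element per column
  total_df <- collapse::fsum(mtcars, drop = FALSE)  # one-row data.frame instead of a vector

  # Grouped column sums: by default the group names become row names.
  by_cyl      <- collapse::fsum(mtcars, mtcars$cyl)
  by_cyl_bare <- collapse::fsum(mtcars, mtcars$cyl, use.g.names = FALSE)
}
```

With `g = NULL` and `TRA = NULL`, `drop = TRUE` is what turns the matrix and data.frame results into plain vectors.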

Arguments

`x`		a vector, matrix, data.frame or grouped tibble (`dplyr::grouped_df`).
`g`		a factor, `GRP` object, atomic vector (internally converted to factor) or a list of vectors / factors (internally converted to a `GRP` object) used to group `x`.
`w`		a numeric vector of (non-negative) weights, may contain missing values. Supported by `fsum`, `fprod`, `fmean`, `fvar`, `fsd` and `fmode`.
`TRA`		an integer or quoted operator indicating the transformation to perform: 1 - "replace_fill" | 2 - "replace" | 3 - "-" | 4 - "-+" | 5 - "/" | 6 - "%" | 7 - "+" | 8 - "*" | 9 - "%%" | 10 - "-%%". See `TRA`.
`na.rm`		logical. Skip missing values in `x`. Defaults to `TRUE` in all functions and implemented at very little computational cost. Not available for `fNobs`.
`use.g.names`		make group-names and add to the result as names (vector method) or row-names (matrix and data.frame method). No row-names are generated for data.tables and grouped tibbles.
`drop`		matrix and data.frame methods: Drop dimensions and return an atomic vector if `g = NULL` and `TRA = NULL`.
`keep.group_vars`		grouped_df method: Logical. `FALSE` removes grouping variables after computation.
`keep.w`		grouped_df method: Logical. `TRUE` also aggregates weights and saves them in a column, `FALSE` removes weighting variable after computation (if contained in `grouped_df`).
`...`		arguments to be passed to or from other methods, and extra arguments to some functions, i.e. the algorithm used to compute variances etc.
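
To make the `TRA` argument more concrete, the following sketch computes base-R reference values for three of the listed operations ("replace", "-" and "%") and, if collapse is installed, compares them with the corresponding calls. The base-R code only shows what the results should look like on data without missing values; it is not how collapse computes them, and the variable names are illustrative.

```r
x <- mtcars$mpg   # values to transform
g <- mtcars$cyl   # grouping vector

# ave() returns the group statistic expanded to the length of x.
grp_mean <- ave(x, g, FUN = mean)
grp_sum  <- ave(x, g, FUN = sum)

replaced <- grp_mean            # TRA = "replace": each value becomes its group mean
centered <- x - grp_mean        # TRA = "-": subtract the group mean
percent  <- x / grp_sum * 100   # TRA = "%": percentage of the group sum

# Optional cross-check against collapse (skipped if the package is missing).
if (requireNamespace("collapse", quietly = TRUE)) {
  stopifnot(
    isTRUE(all.equal(as.numeric(collapse::fmean(x, g, TRA = "replace")), replaced)),
    isTRUE(all.equal(as.numeric(collapse::fmean(x, g, TRA = "-")), centered)),
    isTRUE(all.equal(as.numeric(collapse::fsum(x, g, TRA = "%")), percent))
  )
}
```

The result of a `TRA` call has the same length as `x`, whereas the same call without `TRA` returns one value per group.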

Details

Please see the documentation of individual functions.

Examples


# NOT RUN {
library(collapse)                 # provides the functions and the wlddev data used below

## default vector method
mpg <- mtcars$mpg
fsum(mpg)                         # Simple sum
fsum(mpg, TRA = "%")              # Simple transformation: obtain percentages of mpg
fsum(mpg, mtcars$cyl)             # Grouped sum
fmean(mpg, mtcars$cyl)            # Grouped mean
fmean(mpg, w = mtcars$hp)         # Weighted mean, weighted by hp
fmean(mpg, mtcars$cyl, mtcars$hp) # Grouped mean, weighted by hp
fsum(mpg, mtcars$cyl, TRA = "%")  # Percentages by group
fmean(mpg, mtcars$cyl, mtcars$hp, # Replace vector elements with their weighted group-mean
      TRA = "replace")

## data.frame method
fsum(mtcars)
fsum(mtcars, TRA = "%")
fsum(mtcars, mtcars[c(2,8:9)])           # Grouped column sum
g <- GRP(mtcars, ~ cyl + vs + am)        # Here precomputing the groups!
fsum(mtcars, g)                          # Faster !!
fmean(mtcars, g, mtcars$hp)
fmean(mtcars, g, mtcars$hp, "-")         # demeaning by weighted group means... see also ?W

fmode(wlddev, drop = FALSE)              # Compute statistical modes of variables in this data
fmode(wlddev, wlddev$income)             # grouped statistical modes ..

## matrix method
m <- qM(mtcars)
fsum(m)
fsum(m, g) # ...

## method for grouped tibbles - for use with dplyr
library(dplyr)
mtcars %>% group_by(cyl,vs,am) %>% select(mpg,carb) %>% fsum
mtcars %>% group_by(cyl,vs,am) %>% fsum(TRA = "%")
mtcars %>% group_by(cyl,vs,am) %>% fmean(hp)         # weighted grouped mean, save sum of weights
mtcars %>% group_by(cyl,vs,am) %>% fmean(hp, keep.group_vars = FALSE)
# }
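
For reference, the grouped weighted mean from the vector examples above, `fmean(mpg, mtcars$cyl, mtcars$hp)`, can be reproduced in base R as the sum of weight times value divided by the sum of weights within each group. This is a minimal sketch for checking results, with illustrative variable names; the comparison with collapse only runs if the package is installed.

```r
x <- mtcars$mpg   # values
g <- mtcars$cyl   # groups
w <- mtcars$hp    # weights

# Weighted mean per group: sum(w * x) / sum(w) within each level of g.
wmean_by_g <- tapply(seq_along(x), g, function(i) sum(w[i] * x[i]) / sum(w[i]))

# The same quantity via stats::weighted.mean(), group by group.
wmean_check <- sapply(split(seq_along(x), g), function(i) weighted.mean(x[i], w[i]))

# Optional cross-check against collapse (skipped if the package is missing).
if (requireNamespace("collapse", quietly = TRUE)) {
  stopifnot(isTRUE(all.equal(as.numeric(collapse::fmean(x, g, w)),
                             as.numeric(wmean_by_g))))
}
```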
