collapse (version 1.1.0)

fmedian: Fast (Grouped) Median Value for Matrix-Like Objects

Description

fmedian is a generic function that computes the (column-wise) median value of all values in x, (optionally) grouped by g. The TRA argument can further be used to transform x using its (grouped) median value.

Usage

fmedian(x, ...)

# S3 method for default
fmedian(x, g = NULL, TRA = NULL, na.rm = TRUE, use.g.names = TRUE, ...)

# S3 method for matrix
fmedian(x, g = NULL, TRA = NULL, na.rm = TRUE, use.g.names = TRUE, drop = TRUE, ...)

# S3 method for data.frame
fmedian(x, g = NULL, TRA = NULL, na.rm = TRUE, use.g.names = TRUE, drop = TRUE, ...)

# S3 method for grouped_df
fmedian(x, TRA = NULL, na.rm = TRUE, use.g.names = FALSE, keep.group_vars = TRUE, ...)

Arguments

x

a numeric vector, matrix, data.frame or grouped tibble (dplyr::grouped_df).

g

a factor, GRP object, atomic vector (internally converted to factor) or a list of vectors / factors (internally converted to a GRP object) used to group x.

TRA

an integer or quoted operator indicating the transformation to perform: 1 - "replace_fill" | 2 - "replace" | 3 - "-" | 4 - "-+" | 5 - "/" | 6 - "%" | 7 - "+" | 8 - "*" | 9 - "%%" | 10 - "-%%". See TRA.

na.rm

logical. Skip missing values in x. Defaults to TRUE and is implemented at very little computational cost. If na.rm = FALSE, NA is returned for any (grouped) computation in which a missing value is encountered.

use.g.names

logical. Make group names and add them to the result as names (vector method) or row names (matrix and data.frame methods). No row names are generated for data.tables and grouped tibbles.

drop

matrix and data.frame method: drop dimensions and return an atomic vector if g = NULL and TRA = NULL.

keep.group_vars

grouped_df method: Logical. FALSE removes grouping variables after computation.

...

arguments to be passed to or from other methods.
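As a rough illustration of what a few of the TRA operations listed above amount to, the following base R sketch emulates them with ave() and median(). It does not call fmedian, and the variable names are illustrative only; the exact semantics of every operation are documented under TRA.

```r
# Base R emulation (an assumption-laden sketch, not collapse's implementation)
x <- mtcars$mpg
g <- mtcars$cyl

# Group median expanded to the length of x
med <- ave(x, g, FUN = median)

centered <- x - med        # comparable to TRA = "-"
scaled   <- x / med        # comparable to TRA = "/"
percent  <- x / med * 100  # comparable to TRA = "%"

head(data.frame(x, g, med, centered, scaled, percent))
```

Each result has the same length as x, which is what distinguishes a transformation from the plain grouped median.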

Value

The median value of x, grouped by g, or (if TRA is used) x transformed by its median value, grouped by g.
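For orientation, a grouped median with and without missing-value removal can be reproduced in base R with tapply(). This is only a small reference sketch on made-up data, not a call to fmedian; the named result is analogous to what use.g.names = TRUE describes.

```r
# Toy data (hypothetical values chosen for illustration)
x <- c(10, NA, 30, 2, 4, 6)
g <- c("a", "a", "a", "b", "b", "b")

# Missing values skipped: group "a" uses 10 and 30 only
med_rm <- tapply(x, g, median, na.rm = TRUE)

# Missing values kept: any group containing NA yields NA
med_keep <- tapply(x, g, median, na.rm = FALSE)

med_rm
med_keep
```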

Details

Median value estimation is done using std::nth_element in C++, which is an efficient partial sorting algorithm. A downside of this is that vectors need to be copied first and then partially sorted; thus fmedian currently requires additional memory equal to the size of the object (x).
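The idea of finding a median by partial sorting can be sketched in base R, where sort(x, partial = k) guarantees only that position k holds the value it would hold in a full sort. The function name median_partial is hypothetical, and averaging the two middle values for even lengths is an assumption borrowed from base R's median(), not something stated here about fmedian.

```r
# Median via partial sorting (a sketch of the technique, not fmedian's C++ code)
median_partial <- function(x) {
  x <- x[!is.na(x)]            # works on a copy with missing values removed
  n <- length(x)
  if (n == 0L) return(NA_real_)
  half <- (n + 1L) %/% 2L
  if (n %% 2L == 1L) {
    # Odd length: only the middle position needs to be in place
    sort(x, partial = half)[half]
  } else {
    # Even length: place both middle positions, then average them (assumption)
    mean(sort(x, partial = half + 0:1)[half + 0:1])
  }
}

median_partial(mtcars$mpg)
```

Because only one or two positions need to be correct, the remaining elements are left in arbitrary order, which is what makes partial sorting cheaper than a full sort.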

Grouped computations are currently performed by mapping the data to a sparse array directed by g and then partially sorting each row (group) of that array. For reasons I don't fully understand, this requires less memory than the full deep copy that is made when there are no groups.
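A loose base R analogue of this grouped approach is sketched below: values are placed into a groups-by-positions matrix padded with NA, and each row is then partially sorted. The function name grouped_median_sketch and the padded-matrix layout are assumptions made for illustration; the actual C++ data structure may differ, and the sketch assumes x and g contain no missing values.

```r
# Grouped median via a padded array and row-wise partial sorting (illustrative only)
grouped_median_sketch <- function(x, g) {
  g <- as.factor(g)
  ng <- nlevels(g)
  gi <- as.integer(g)
  sizes <- tabulate(gi, ng)
  # Position of each element within its own group
  pos <- ave(seq_along(x), gi, FUN = seq_along)
  # One row per group, padded with NA up to the largest group size
  arr <- matrix(NA_real_, nrow = ng, ncol = max(sizes))
  arr[cbind(gi, pos)] <- x
  out <- vapply(seq_len(ng), function(i) {
    n <- sizes[i]
    if (n == 0L) return(NA_real_)
    row <- arr[i, seq_len(n)]
    half <- (n + 1L) %/% 2L
    if (n %% 2L == 1L) {
      sort(row, partial = half)[half]
    } else {
      mean(sort(row, partial = half + 0:1)[half + 0:1])
    }
  }, numeric(1))
  names(out) <- levels(g)
  out
}

grouped_median_sketch(mtcars$mpg, mtcars$cyl)
```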

When applied to data frames with groups or drop = FALSE, fmedian preserves all column attributes (such as variable labels) but does not distinguish between classed and unclassed objects. The attributes of the data.frame itself are also preserved.

See Also

fmean, fmode, Fast Statistical Functions, Collapse Overview

Examples

# NOT RUN {
## default vector method
mpg <- mtcars$mpg
fmedian(mpg)                         # Simple median value
fmedian(mpg, TRA = "-")              # Simple transformation: Subtract median value
fmedian(mpg, mtcars$cyl)             # Grouped median value
fmedian(mpg, mtcars[c(2,8:9)])       # More groups...
g <- GRP(mtcars, ~ cyl + vs + am)    # Precomputing groups gives more speed !!
fmedian(mpg, g)
fmedian(mpg, g, TRA = "-")           # Groupwise subtract median value

## data.frame method
fmedian(mtcars)
fmedian(mtcars, TRA = "-")
fmedian(mtcars, g)
fmedian(mtcars, g, use.g.names = FALSE) # No row-names generated

## matrix method
m <- qM(mtcars)
fmedian(m)
fmedian(m, TRA = "-")
fmedian(m, g) # etc...

## method for grouped tibbles - for use with dplyr
library(dplyr)
mtcars %>% group_by(cyl,vs,am) %>% fmedian
mtcars %>% group_by(cyl,vs,am) %>% fmedian("-")
mtcars %>% group_by(cyl,vs,am) %>% select(mpg) %>% fmedian
# }