collapse (version 1.1.0)

dapply: Data Apply

Description

dapply efficiently applies functions to columns or rows of matrices and data frames. By default it returns an object of the same type with the same attributes; it can also convert to the other type. Simple parallelism is also available.

Usage

dapply(X, FUN, ..., MARGIN = 2, parallel = FALSE, mc.cores = 1L,
       return = c("same","matrix","data.frame"), drop = TRUE)

Arguments

X

a matrix or data frame.

FUN

a function, can be scalar- or vector-valued.

...

further arguments to FUN.

MARGIN

integer. The margin which FUN will be applied over. Default 2 indicates columns while 1 indicates rows. See also Details.

parallel

logical. TRUE implements simple parallel execution by internally calling parallel::mclapply instead of base::lapply.

mc.cores

integer. Argument to parallel::mclapply indicating the number of cores to use for parallel execution. Can use parallel::detectCores() to select all available cores. See also ?parallel::mclapply.

return

an integer or string indicating the type of object to return. The default, 1 or "same", returns the same object type as the input (passing a matrix returns a matrix and passing a data frame returns a data frame). 2 or "matrix" always returns a matrix, and 3 or "data.frame" always returns a data frame.

drop

logical. If the result has only one row or one column, drop = TRUE will drop dimensions and return a (named) atomic vector.
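
As a rough illustration of what the parallel and mc.cores arguments delegate to, the following base R sketch calls parallel::mclapply directly on the columns of a data frame. It does not use dapply itself. The core count of 2 is an arbitrary choice for the example, and because forking is not available on Windows, the sketch falls back to a single core there.

```r
library(parallel)

# Forking is unavailable on Windows, so fall back to a single core there.
# The value 2 is an arbitrary choice for this illustration.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

cols <- as.list(mtcars)                             # plain list of columns
seq_res <- lapply(cols, sum)                        # sequential version
par_res <- mclapply(cols, sum, mc.cores = n_cores)  # forked workers
```

Both calls return a named list with one element per column; for a cheap function like sum the forking overhead will usually outweigh any gain.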

Value

X where FUN was applied to every row or column.

Details

dapply is an efficient command to apply functions to rows or columns of data without losing information (attributes) about the data or changing the classes or format of the data. It is principally an efficient wrapper around base::lapply and works as follows:

  • Save the attributes of X.

  • If MARGIN = 2 (columns), convert matrices to plain lists of columns using mctl and remove all attributes from data frames.

  • If MARGIN = 1 (rows), convert matrices to plain lists of rows using mrtl. For data frames, remove all attributes, efficiently convert to a matrix using do.call(cbind, X), and then convert to a list of rows using mrtl.

  • Call base::lapply or parallel::mclapply on these plain lists (which is faster than calling lapply on an object with attributes).

  • Depending on the requested output type, use base::matrix, base::unlist or do.call(cbind, ...) to convert the result back to a matrix or list of columns.

  • Modify the relevant attributes accordingly and efficiently attach them to the object again (no further checks).
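
The steps above can be sketched in base R for the simplest case. This is a simplified illustration, not the package's actual implementation: dapply_sketch is a hypothetical name, it handles only data frames with MARGIN = 2, and it assumes FUN returns a vector of the same length as each column so that the saved attributes remain valid.

```r
# Hypothetical helper: illustrates the save / strip / apply / restore idea only.
dapply_sketch <- function(X, FUN, ...) {
  stopifnot(is.data.frame(X))
  n  <- nrow(X)
  ax <- attributes(X)          # 1. save names, row.names, class, ...
  attributes(X) <- NULL        # 2. strip to a plain list of columns
  res <- lapply(X, FUN, ...)   # 3. apply FUN to each column
  # Assumption of this sketch: FUN preserves the column length, so the
  # saved row.names still match the result.
  stopifnot(all(lengths(res) == n))
  attributes(res) <- ax        # 4. reattach the saved attributes unchanged
  res
}

out <- dapply_sketch(mtcars, log)
```

Here out is again a data frame with the original column and row names. Functions that change the length of the columns (such as sum or quantile) would require adjusting the saved attributes before reattaching them, which this sketch deliberately leaves out.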

The performance gain from working with plain lists makes dapply not much slower than calling lapply itself on a data frame. Because of the conversions involved, row operations require some additional memory, but are still faster than base::apply.
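
To make the row-wise conversions concrete, here is a rough base R sketch for a data frame. It is an illustration under stated assumptions rather than the package's code: rows_as_list is a hypothetical stand-in for the role the text assigns to mrtl, and binding the columns with cbind is this sketch's way of building the intermediate matrix.

```r
# Hypothetical stand-in for the matrix-rows-to-list step described above.
rows_as_list <- function(m) lapply(seq_len(nrow(m)), function(i) m[i, ])

X  <- mtcars
rn <- rownames(X)
attributes(X) <- NULL        # plain list of columns
m    <- do.call(cbind, X)    # intermediate matrix: one extra copy of the data
rows <- rows_as_list(m)      # list of row vectors: a second copy

# Apply the function to each row and restore the row names.
row_sums <- unlist(lapply(rows, sum), use.names = FALSE)
names(row_sums) <- rn
```

The two intermediate objects, m and rows, are where the extra memory for row operations comes from in this sketch.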

See Also

BY, collap, Fast Statistical Functions, Data Transformations, Collapse Overview

Examples

# NOT RUN {
dapply(mtcars, log)                      # Take natural log of each variable
dapply(mtcars, log, return = "matrix")   # Return as matrix
m <- as.matrix(mtcars)
dapply(m, log)                           # Same thing
dapply(m, log, return = "data.frame")    # Return data frame from matrix
dapply(mtcars, sum); dapply(m, sum)      # Computing sum of each column, return as vector
dapply(mtcars, sum, drop = FALSE)        # This returns a data.frame of 1 row
dapply(mtcars, sum, MARGIN = 1)          # Compute sum of each row, return as vector
dapply(m, sum, MARGIN = 1)               # Same thing for matrices, faster than apply(m, 1, sum)
dapply(m, sum, MARGIN = 1, drop = FALSE) # Gives matrix with one column
dapply(m, quantile, MARGIN = 1)          # Compute row-quantiles
dapply(m, quantile)                      # Column-quantiles
dapply(mtcars, quantile, MARGIN = 1)     # Same for data frames, output is also a data.frame
dapply(mtcars, quantile)

# Let's now take a more complex classed object, like a dplyr grouped tibble
library(dplyr)
gmtcars <- group_by(mtcars, cyl, vs, am)
dapply(gmtcars, log)                     # Still gives a grouped tibble back
dapply(gmtcars, log, MARGIN = 1)
dapply(gmtcars, quantile, MARGIN = 1)    # Also works for quantiles
dapply(gmtcars, log, return = "matrix")  # Output as matrix
# }
