by: Apply a Function to a `db.data.frame` Split by column(s)

Description

by is equivalent to "group by" in SQL language. It groups the data according the value(s) of one or multiple columns, and then apply an aggregate function onto each group of the data.

Usage

# S4 method for db.obj
by(data, INDICES, FUN, ..., simplify = TRUE)

Arguments

data

A db.obj object. It represents a table/view in the database if it is an db.data.frame object, or a series of operations applied on an existing db.data.frame object if it is a db.Rquery object.

INDICES

A list of db.Rquery objects. Each of the list element selects one or multiple columns of data. When the value is NULL, no grouping of data is done, and the aggregate function FUN will be applied onto all the data.

FUN

A function, which will be applied onto each group of the data. The result of FUN can be of '>db.obj type or any other data types that R supports.

…

Extra arguments passed to FUN, currently not implemented.

simplify

Not implemented yet.

Value

The type of the returned value depends on the return type of FUN.

If the return type of FUN is a '>db.obj object, then this function returns a db.Rquery object, which is actually the SQL query that does the "GROUP BY". It computes the group-by values. The result can be viewed using lk or lookat.

If the return type of FUN is not a '>db.obj object, then this function returns a list, which contains a number of sub-lists. Each sub-list contains two items: (1) index, an array of strings, a set of distinct values of the INDICES converted to string; and (2) result, the result produced by FUN applying onto the group of data that has the set of distinct values. The total number of sub-lists is equal to the total number of groups of data partitioned by INDICES.

Examples

Run this code

# NOT RUN {
## help("by,db.obj-method") # display this doc
<!-- %% @test .port Database port number -->
<!-- %% @test .dbname Database name -->
## set up the database connection
## Assume that .port is port number and .dbname is the database name
cid <- db.connect(port = .port, dbname = .dbname, verbose = FALSE)

## create a table from the example data.frame "abalone"
x <- as.db.data.frame(abalone, conn.id = cid, verbose = FALSE)

## mean values for each column
lk(by(x, x$sex, mean))

## No need to compute the mean of id and sex
lk(by(x[,-c(1,2)], x$sex, mean))
lk(by(x[,-c(1,2)], x[,2], mean)) # the same
lk(by(x[,-c(1,2)], x[,"sex"], mean)) # the same

## The return type of FUN is not db.obj
dat <- x

## Fit linear model to each group of data
by(dat, dat$sex, function(x) madlib.lm(rings ~ . - id - sex, data = x))

db.disconnect(cid, verbose = FALSE)
# }

Run the code above in your browser using DataLab

Description

Usage

Arguments

Value

See Also

Examples