Learn R Programming

RMySQL (version 0.10.2)

dbApply: Apply R/S-Plus functions to remote groups of DBMS rows (experimental)

Description

Applies R/S-Plus functions to groups of remote DBMS rows without bringing an entire result set all at once. The result set is expected to be sorted by the grouping field.

The MySQL implementation allows us to register R functions that get invoked when certain fetching events occur. These include the ``begin'' event (no records have been yet fetched), ``begin.group'' (the record just fetched belongs to a new group), ``new record'' (every fetched record generates this event), ``group.end'' (the record just fetched was the last row of the current group), ``end'' (the very last record from the result set). Awk and perl programmers will find this paradigm very familiar (although SAP's ABAP language is closer to what we're doing).

Usage

dbApply(res, ...)

## S3 method for class 'MySQLResult': dbApply(res, INDEX, FUN = stop("must specify FUN"), begin = NULL, group.begin = NULL, new.record = NULL, end = NULL, batchSize = 100, maxBatch = 1e+06, ..., simplify = TRUE)

Arguments

res
a result set (see dbSendQuery).
...
any additional arguments to be passed to FUN.
INDEX
a character or integer specifying the field name or field number that defines the various groups.
FUN
a function to be invoked upon identifying the last row from every group. This function will be passed a data frame holding the records of the current group, a character string with the group label, plus any other arguments passed to dbApply a
begin
a function of no arguments to be invoked just prior to retrieve the first row from the result set.
group.begin
a function of one argument (the group label) to be invoked upon identifying a row from a new group
new.record
a function to be invoked as each individual record is fetched. The first argument to this function is a one-row data.frame holding the new record.
end
a function of no arguments to be invoked just after retrieving the last row from the result set.
batchSize
the default number of rows to bring from the remote result set. If needed, this is automatically extended to hold groups bigger than batchSize.
maxBatch
the absolute maximum of rows per group that may be extracted from the result set.
simplify
Not yet implemented

Value

  • A list with as many elements as there were groups in the result set.

Details

This function is meant to handle somewhat gracefully(?) large amounts of data from the DBMS by bringing into R manageable chunks (about batchSize records at a time, but not more than maxBatch); the idea is that the data from individual groups can be handled by R, but not all the groups at the same time.

Examples

Run this code
if (mysqlHasDefault()) {
con <- dbConnect(RMySQL::MySQL(), dbname = "test")

dbWriteTable(con, "mtcars", mtcars, overwrite = TRUE)
res <- dbSendQuery(con, "SELECT * FROM mtcars ORDER BY cyl")
dbApply(res, "cyl", function(x, grp) quantile(x$mpg, names=FALSE))

dbClearResult(res)
dbRemoveTable(con, "mtcars")
dbDisconnect(con)
}

Run the code above in your browser using DataLab