Learn R Programming

ETLUtils (version 1.5)

read.dbi.ffdf: Read data from a DBI connection into an ffdf.

Description

Read data from a DBI connection into an ffdf. This can for example be used to import large datasets from Oracle, SQLite, MySQL, PostgreSQL, Hive or other SQL databases into R.

Usage

read.dbi.ffdf(
  query = NULL,
  dbConnect.args = list(drv = NULL, dbname = NULL, username = "", password = ""),
  dbSendQuery.args = list(),
  dbFetch.args = list(),
  x = NULL,
  nrows = -1,
  first.rows = NULL,
  next.rows = NULL,
  levels = NULL,
  appendLevels = TRUE,
  asffdf_args = list(),
  BATCHBYTES = getOption("ffbatchbytes"),
  VERBOSE = FALSE,
  colClasses = NULL,
  transFUN = NULL,
  ...
)

Arguments

query

the SQL query to execute on the DBI connection

dbConnect.args

a list of arguments to pass to DBI's dbConnect (like drv, dbname, username, password). See the examples.

dbSendQuery.args

a list containing database-specific parameters which will be passed to to pass to dbSendQuery. Defaults to an empty list.

dbFetch.args

a list containing optional database-specific parameters which will be passed to to pass to dbFetch. Defaults to an empty list.

x

NULL or an optional ffdf object to which the read records are appended. See documentation in read.table.ffdf for more details and the example below.

nrows

Number of rows to read from the query resultset. Default value of -1 reads in all rows.

first.rows

chunk size (rows) to read for first chunk from the query resultset

next.rows

chunk size (rows) to read sequentially for subsequent chunks from the query resultset. Currently, this must be specified.

levels

optional specification of factor levels. A list with as names the names the columns of the data.frame fetched in the first.rows, containing levels of the factors.

appendLevels

logical. A vector of permissions to expand levels for factor columns. See documentation in read.table.ffdf for more details.

asffdf_args

further arguments passed to as.ffdf (ignored if 'x' gives an ffdf object )

BATCHBYTES

integer: bytes allowed for the size of the data.frame storing the result of reading one chunk. See documentation in read.table.ffdf for more details.

VERBOSE

logical: TRUE to verbose timings for each processed chunk (default FALSE).

colClasses

See documentation in read.table.ffdf

transFUN

function applied to the data frame after each chunk is retreived by dbFetch

...

optional parameters passed on to transFUN

Value

An ffdf object unless the query returns zero records in which case the function will return the data.frame returned by dbFetch and possibly transFUN.

Details

Opens up the DBI connection using DBI::dbConnect, sends the query using DBI::dbSendQuery and DBI::dbFetch-es the results in batches of next.rows rows. Heavily borrowed from read.table.ffdf

See Also

read.table.ffdf, read.odbc.ffdf

Examples

Run this code
# NOT RUN {
require(ff)

##
## Example query using data in sqlite
##
require(RSQLite)
dbfile <- system.file("smalldb.sqlite3", package="ETLUtils")
drv <- dbDriver("SQLite")
query <- "select * from testdata limit 10000"
x <- read.dbi.ffdf(query = query, dbConnect.args = list(drv = drv, dbname = dbfile), 
first.rows = 100, next.rows = 1000, VERBOSE=TRUE)
class(x)
x[1:10, ]

## show it is the same as getting the data directly using RSQLite 
## apart from characters which are factors in ffdf objects
directly <- dbGetQuery(dbConnect(drv = drv, dbname = dbfile), query)
directly <- as.data.frame(as.list(directly), stringsAsFactors=TRUE)
all.equal(x[,], directly)

## show how to use the transFUN argument to transform the data before saving into the ffdf
## and shows the use of the levels argument
query <- "select * from testdata limit 10"
x <- read.dbi.ffdf(query = query, dbConnect.args = list(drv = drv, dbname = dbfile), 
first.rows = 100, next.rows = 1000, VERBOSE=TRUE, levels = list(a = rev(LETTERS)),
transFUN = function(x, subtractdays){
	x$b <- as.Date(x$b)
	x$b.subtractdaysago <- x$b - subtractdays
	x
}, subtractdays=7)
class(x)
x[1:10, ]
## remark that the levels of column a are reversed due to specifying the levels argument correctly
levels(x$a)

## show how to append data to an existing ffdf object 
transformexample <- function(x, subtractdays){
	x$b <- as.Date(x$b)
	x$b.subtractdaysago <- x$b - subtractdays
	x
}
dim(x)
x[,]
combined <- read.dbi.ffdf(query = query, 
 dbConnect.args = list(drv = drv, dbname = dbfile), 
 first.rows = 100, next.rows = 1000, x = x, VERBOSE=TRUE, 
 transFUN = transformexample, subtractdays=1000)
dim(combined)
combined[,]

##
## Example query using ROracle. Do try this at home with some larger data :)
##
# }
# NOT RUN {
require(ROracle)
query <- "select OWNER, TABLE_NAME, TABLESPACE_NAME, NUM_ROWS, LAST_ANALYZED from all_all_tables" 
x <- read.dbi.ffdf(query=query,
dbConnect.args = list(drv = dbDriver("Oracle"), 
user = "YourUser", password = "YourPassword", dbname = "Mydatabase"),
first.rows = 100, next.rows = 50000, nrows = -1, VERBOSE=TRUE)
# }

Run the code above in your browser using DataLab