ETLUtils (version 1.5)

read.jdbc.ffdf: Read data from a JDBC connection into an ffdf.

Description

Read data from a JDBC connection into an ffdf. This can for example be used to import large datasets from Oracle, SQLite, MySQL, PostgreSQL, Hive or other SQL databases into R.

Usage

read.jdbc.ffdf(
  query = NULL,
  dbConnect.args = list(drv = NULL, dbname = NULL, username = "", password = ""),
  dbSendQuery.args = list(),
  dbFetch.args = list(),
  x = NULL,
  nrows = -1,
  first.rows = NULL,
  next.rows = NULL,
  levels = NULL,
  appendLevels = TRUE,
  asffdf_args = list(),
  BATCHBYTES = getOption("ffbatchbytes"),
  VERBOSE = FALSE,
  colClasses = NULL,
  transFUN = NULL,
  ...
)

Arguments

query

the SQL query to execute on the JDBC connection

dbConnect.args

a list of arguments passed on to RJDBC::dbConnect (like drv, dbname, username, password). See the examples.

dbSendQuery.args

a list containing database-specific parameters which will be passed on to RJDBC::dbSendQuery. Defaults to an empty list.

dbFetch.args

a list containing optional database-specific parameters which will be passed on to RJDBC::dbFetch. Defaults to an empty list.

x

NULL or an optional ffdf object to which the read records are appended. See documentation in read.table.ffdf for more details and the example below.

nrows

Number of rows to read from the query resultset. Default value of -1 reads in all rows.

first.rows

chunk size (rows) to read for first chunk from the query resultset

next.rows

chunk size (rows) to read sequentially for subsequent chunks from the query resultset. Currently, this must be specified.

levels

optional specification of factor levels. A list whose names are the names of the factor columns in the data.frame fetched in first.rows, each element containing the levels of that factor.

appendLevels

logical. A vector of permissions to expand levels for factor columns. See documentation in read.table.ffdf for more details.

asffdf_args

further arguments passed to as.ffdf (ignored if x is an ffdf object)

BATCHBYTES

integer: bytes allowed for the size of the data.frame storing the result of reading one chunk. See documentation in read.table.ffdf for more details.

VERBOSE

logical: TRUE to print verbose timings for each processed chunk (default FALSE).

colClasses

See documentation in read.table.ffdf

transFUN

function applied to the data.frame after each chunk is retrieved by RJDBC::dbFetch

...

optional parameters passed on to transFUN
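To illustrate the transFUN mechanism: it is simply a function taking a data.frame (one fetched chunk) and returning a data.frame, with any extra arguments supplied through .... The sketch below uses a hypothetical cleaning function applied to a stand-in chunk; read.jdbc.ffdf would apply it to every chunk it fetches.

```r
## Hypothetical transFUN: uppercase a character column and add a derived
## column. Extra arguments (here, suffix) would be passed via ... in
## read.jdbc.ffdf(..., transFUN = clean_chunk, suffix = "_ok").
clean_chunk <- function(x, suffix = "") {
  x$name <- toupper(x$name)           # normalise text
  x$label <- paste0(x$name, suffix)   # derived column
  x
}

## A data.frame standing in for one fetched chunk
chunk <- data.frame(id = 1:3, name = c("a", "b", "c"),
                    stringsAsFactors = FALSE)
clean_chunk(chunk, suffix = "_ok")
```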

Value

An ffdf object, unless the query returns zero records, in which case the function returns the data.frame produced by RJDBC::dbFetch (and transFUN, if supplied).

Details

Opens the JDBC connection using RJDBC::dbConnect, sends the query using RJDBC::dbSendQuery and RJDBC::dbFetch-es the results in batches of next.rows rows. Heavily borrowed from read.table.ffdf.
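The chunked retrieval described above can be sketched as follows. This is a simplified simulation over a local data.frame (the real function fetches each chunk with RJDBC::dbFetch and appends to an ffdf rather than a list); it shows how the first chunk uses first.rows and subsequent chunks use next.rows.

```r
## Simplified sketch of the batching loop: the first chunk has first.rows
## rows, later chunks have next.rows rows, until the result set is exhausted.
simulate_read <- function(dat, first.rows = 2, next.rows = 3) {
  pos <- 0
  chunks <- list()
  n <- first.rows
  while (pos < nrow(dat)) {
    take  <- min(n, nrow(dat) - pos)
    chunk <- dat[pos + seq_len(take), , drop = FALSE]
    pos <- pos + take
    chunks[[length(chunks) + 1]] <- chunk
    n <- next.rows                  # subsequent chunks use next.rows
  }
  chunks
}

chunks <- simulate_read(data.frame(i = 1:10), first.rows = 2, next.rows = 3)
sapply(chunks, nrow)   # chunk sizes: 2 3 3 2
```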

See Also

read.table.ffdf, read.dbi.ffdf

Examples

# NOT RUN {
require(ff)

##
## Example query using data in sqlite
##
require(RJDBC)
dbfile <- system.file("smalldb.sqlite3", package="ETLUtils")
drv <- JDBC(driverClass = "org.sqlite.JDBC", classPath = "/usr/local/lib/sqlite-jdbc-3.7.2.jar")
query <- "select * from testdata limit 10000"
x <- read.jdbc.ffdf(query = query, 
 dbConnect.args = list(drv = drv, url = sprintf("jdbc:sqlite:%s", dbfile)), 
 first.rows = 100, next.rows = 1000, VERBOSE=TRUE)
class(x)
x[1:10, ]
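## Appending further query results to the existing ffdf via the x argument
## (hypothetical follow-up query; reuses the connection arguments above):
query2 <- "select * from testdata limit 10000 offset 10000"
x <- read.jdbc.ffdf(query = query2,
 dbConnect.args = list(drv = drv, url = sprintf("jdbc:sqlite:%s", dbfile)),
 x = x, first.rows = 100, next.rows = 1000, VERBOSE = TRUE)
dim(x)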
# }
