read.odbc.ffdf: Read data from a ODBC connection into an ffdf.

Description

Read data from a ODBC connection into an ffdf. This can for example be used to import large datasets from Oracle, SQLite, MySQL, PostgreSQL, Hive or other SQL databases into R.

Usage

read.odbc.ffdf(
  query = NULL,
  odbcConnect.args = list(dsn = NULL, uid = "", pwd = ""),
  odbcDriverConnect.args = list(connection = ""),
  odbcQuery.args = list(),
  sqlGetResults.args = list(),
  x = NULL,
  nrows = -1,
  first.rows = NULL,
  next.rows = NULL,
  levels = NULL,
  appendLevels = TRUE,
  asffdf_args = list(),
  BATCHBYTES = getOption("ffbatchbytes"),
  VERBOSE = FALSE,
  colClasses = NULL,
  transFUN = NULL,
  ...
)

Arguments

query

the SQL query to execute on the ODBC connection

odbcConnect.args

a list of arguments to pass to ODBC's odbcConnect (like dsn, uid, pwd). See the examples.

odbcDriverConnect.args

a list of arguments to pass to ODBC's odbcDriverConnect (like connection). If you want to connect using odbcDriverConnect instead of odbcConnect.

odbcQuery.args

a list of arguments to pass to ODBC's odbcQuery, like rows_at_time. Defaults to an empty list.

sqlGetResults.args

a list containing optional parameters which will be passed to sqlGetResults. Defaults to an empty list. The max parameter will be overwritten with first.rows and next.rows when importing in batches.

NULL or an optional ffdf object to which the read records are appended. See documentation in read.table.ffdf for more details and the example below.

nrows

Number of rows to read from the query resultset. Default value of -1 reads in all rows.

first.rows

chunk size (rows) to read for first chunk from the query resultset

next.rows

chunk size (rows) to read sequentially for subsequent chunks from the query resultset. Currently, this must be specified.

levels

optional specification of factor levels. A list with as names the names the columns of the data.frame fetched in the first.rows, containing levels of the factors.

appendLevels

logical. A vector of permissions to expand levels for factor columns. See documentation in read.table.ffdf for more details.

asffdf_args

further arguments passed to as.ffdf (ignored if 'x' gives an ffdf object)

BATCHBYTES

integer: bytes allowed for the size of the data.frame storing the result of reading one chunk. See documentation in read.table.ffdf for more details.

VERBOSE

logical: TRUE to verbose timings for each processed chunk (default FALSE).

colClasses

See documentation in read.table.ffdf

transFUN

function applied to the data frame after each chunk is retreived by sqlGetResults

...

optional parameters passed on to transFUN

Value

An ffdf object unless the query returns zero records in which case the function will return the data.frame returned by sqlGetResults and possibly transFUN.

Details

Opens up the ODBC connection using RODBC::odbcConnect or RODBC::odbcDriverConnect, sends the query using RODBC::odbcQuery and retrieves the results in batches of next.rows rows using RODBC::sqlGetResults. Heavily borrowed from read.table.ffdf

Examples

Run this code

# NOT RUN {
##
## Using the sqlite database (smalldb.sqlite3) in the /inst folder of the package
## set up the sqlite ODBC driver (www.stats.ox.ac.uk/pub/bdr/RODBC-manual.pd) 
## and call it 'smalltestsqlitedb' 
##
# }
# NOT RUN {
require(RODBC)
x <- read.odbc.ffdf(
query = "select * from testdata limit 10000",
odbcConnect.args = list(
 dsn="smalltestsqlitedb", uid = "", pwd = "", 
 believeNRows = FALSE, rows_at_time = 1), 
nrows = -1, 
first.rows = 100, next.rows = 1000, VERBOSE = TRUE)
# }