RPresto

RPresto is a DBI-based adapter for Presto, the open-source distributed SQL query engine for running interactive analytic queries.

Installation

RPresto is available on both CRAN and GitHub. To install the CRAN version, use

install.packages('RPresto')

You can install the GitHub development version via

devtools::install_github('prestodb/RPresto')

Examples

The standard DBI approach works with RPresto:

library('DBI')

con <- dbConnect(
  RPresto::Presto(),
  host='http://localhost',
  port=7777,
  user=Sys.getenv('USER'),
  schema='<schema>',
  catalog='<catalog>'
)

res <- dbSendQuery(con, 'SELECT 1')
# dbFetch without arguments only returns the current chunk, so we need to
# loop until the query completes.
while (!dbHasCompleted(res)) {
  chunk <- dbFetch(res)
  print(chunk)
}

res <- dbSendQuery(con, 'SELECT CAST(NULL AS VARCHAR)')
# Because chunk sizes are unpredictable with Presto, we do not support
# fetching a custom number of rows
# testthat::expect_error(dbFetch(res, 5))

# To get all rows using dbFetch, pass in a -1 argument
print(dbFetch(res, -1))

# An alternative is to use dbGetQuery directly

# `source` for iris.sql()
source(system.file('tests', 'testthat', 'utilities.R', package='RPresto'))

iris <- dbGetQuery(con, paste("SELECT * FROM", iris.sql()))

dbDisconnect(con)

We also include dplyr integration.

library(dplyr)

db <- src_presto(
  host='http://localhost',
  port=7777,
  user=Sys.getenv('USER'),
  schema='<schema>',
  catalog='<catalog>'
)

# Assuming you have a table like iris in the database
iris <- tbl(db, 'iris')

iris %>%
  group_by(species) %>%
  summarise(mean_sepal_length = mean(as(sepal_length, 0.0))) %>%
  arrange(species) %>%
  collect()

How RPresto works

Presto exposes its interface via a REST-based API [1]. We use the httr package to make the API calls and jsonlite to reshape the data into a data.frame. Note that, as of now, only read operations are supported.
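
To illustrate the reshaping step, here is a minimal sketch of turning a row-oriented JSON payload into a data.frame with jsonlite. It is not RPresto's actual code (the package uses its internal `.json.tabular.to.data.frame` helper for this). The response body is made up, and the `columns` and `data` field names are an assumption based on the unofficial API description. NULL values and type conversion are not handled.

```r
library(jsonlite)

# Hypothetical response body shaped like one page of a Presto query result.
# The "columns" and "data" field names are assumptions, not confirmed here.
body <- '{
  "columns": [{"name": "species", "type": "varchar"},
              {"name": "n", "type": "bigint"}],
  "data": [["setosa", 50], ["virginica", 50]]
}'

# Keep the nested list structure so each row stays a list of values.
parsed <- fromJSON(body, simplifyVector = FALSE)

# Column names come from the "columns" metadata.
column_names <- vapply(parsed$columns, function(col) col$name, character(1))

# Turn each row (a list of values) into a one-row data.frame, then stack them.
rows <- lapply(parsed$data, function(row) {
  as.data.frame(setNames(row, column_names), stringsAsFactors = FALSE)
})
result <- do.call(rbind, rows)

print(result)
```

Building one small data.frame per row is easy to follow but slow for large results. A column-wise conversion would scale better.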

RPresto has been tested on Presto 0.100.

License

RPresto is BSD-licensed. We also provide an additional patent grant.

[1] See https://gist.github.com/electrum/7710544 for an unofficial description of the API.

Monthly Downloads

1,347

Version

1.2.1

License

BSD_3_clause + file LICENSE

Maintainer

Onur Ismail Filiz

Last Published

April 6th, 2016

Functions in RPresto (1.2.1)

dbGetInfo,PrestoDriver-method
Metadata about database objects

PrestoResult-class
An S4 class to represent a Presto Result

.json.tabular.to.data.frame
Convert a data.frame formatted in the list of lists style as returned by Presto to an actual data.frame

copy_to.src_presto

dbDataType,PrestoDriver-method
Return the corresponding Presto data type for the given R object

PrestoCursor-class
Internal implementation detail class needed for its side effects. When dbFetch is called, we need to both return the data and update the uri to the next value.

PrestoDriver-class
An S4 class to represent a Presto Driver (and methods). It is used purely for dispatch, and dbUnloadDriver is unnecessary.

PrestoConnection-class
S4 implementation of DBIConnection for Presto.

src_presto
dplyr integration to connect to a Presto database.

RPresto
RPresto

Presto
Connect to a Presto database