This retrieves rows in chunks of page_size. It is most suitable for results of smaller queries (<100 MB, say). For larger queries, it is better to export the results to a CSV file stored on Google Cloud Storage and use the bq command line tool to download it locally.
bq_table_download(
  x,
  n_max = Inf,
  page_size = NULL,
  start_index = 0L,
  max_connections = 6L,
  quiet = NA,
  bigint = c("integer", "integer64", "numeric", "character"),
  max_results = deprecated()
)
Because data retrieval may generate list-columns, and the data.frame print method can have problems with list-columns, this method returns a tibble. If you need a data.frame, coerce the results with as.data.frame().
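For example (a minimal sketch; the table reference and row limit are illustrative only):

df <- bq_table_download("publicdata.samples.natality", n_max = 10)
# Coerce the returned tibble to a plain data.frame if downstream code requires one
df <- as.data.frame(df)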
x: A bq_table.
n_max: Maximum number of results to retrieve. Use Inf to retrieve all rows.
page_size: The number of rows requested per chunk. It is recommended to leave this unspecified until you have evidence that the page_size selected automatically by bq_table_download() is problematic. When page_size = NULL, bigrquery determines a conservative, natural chunk size empirically. If you do specify page_size, it is important that each chunk fits on one page, i.e. that the requested row limit is low enough to prevent the API from paginating based on response size.
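If you do have such evidence, an explicit value can be passed; a sketch, in which both row counts are arbitrary illustrative values:

# Hypothetical override: fetch 100,000 rows in chunks of 10,000
df <- bq_table_download(
  "publicdata.samples.natality",
  n_max = 100000,
  page_size = 10000
)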
start_index: Starting row index (zero-based).
max_connections: Maximum number of simultaneous connections to BigQuery servers.
quiet: If FALSE, displays a progress bar; if TRUE, is silent; if NA, displays a progress bar only for long-running jobs.
bigint: The R type that BigQuery's 64-bit integer types should be mapped to. The default is "integer", which returns R's integer type but results in NA for values above/below +/- 2147483647. "integer64" returns a bit64::integer64, which allows the full range of 64-bit integers.

max_results: Deprecated; please use n_max instead.
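For example, to preserve the full 64-bit range via bigint (a sketch; requires the bit64 package to be installed):

# Map BigQuery INT64 columns to bit64::integer64 rather than R's 32-bit integer
df <- bq_table_download(
  "publicdata.samples.natality",
  n_max = 1000,
  bigint = "integer64"
)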
bigrquery will retrieve nested and repeated columns into list-columns as follows:

Repeated values (arrays) will become a list-column of vectors.

Records will become list-columns of named lists.

Repeated records will become list-columns of data frames.
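Such list-columns can be unpacked with standard tools; a sketch, assuming a downloaded tibble df with a repeated (array) field that arrived as a hypothetical list-column named tags (tidyr is not a bigrquery dependency):

library(tidyr)

# Expand the hypothetical list-column `tags` to one row per element
df_long <- unnest(df, tags)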
In my timings, this code takes around 1 minute per 100 MB of data. If you need to download considerably more than this, I recommend:
1. Export a .csv file to Cloud Storage using bq_table_save() (sketched below).

2. Use the gsutil command line utility to download it.

3. Read the CSV file into R with readr::read_csv() or data.table::fread().
Unfortunately you cannot export nested or repeated formats to CSV, and the formats that BigQuery does support for nested/repeated values (Avro and ndjson) are not well supported in R.
if (bq_testable()) {
  df <- bq_table_download("publicdata.samples.natality", n_max = 35000)
}