bigQueryR (version 0.5.0)

bqr_extract_data: Extract data asynchronously

Description

Use this instead of bqr_query for big datasets: the table is extracted to a Google Cloud Storage bucket rather than downloaded directly. Requires a bucket, which you can create at https://console.cloud.google.com/storage/browser
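
If you prefer to create the bucket from R rather than in the web console, a minimal sketch using googleCloudStorageR (the bucket name and project ID are placeholders):

library(googleCloudStorageR)

## assumes you are authenticated with Google Cloud Storage scope
gcs_create_bucket("your_cloud_storage_bucket_name",
                  projectId = "your_project")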

Usage

bqr_extract_data(projectId = bqr_get_global_project(),
  datasetId = bqr_get_global_dataset(), tableId, cloudStorageBucket,
  filename = paste0("big-query-extract-", gsub(" |:|-", "", Sys.time()),
  "-*.csv"), compression = c("NONE", "GZIP"),
  destinationFormat = c("CSV", "NEWLINE_DELIMITED_JSON", "AVRO"),
  fieldDelimiter = ",", printHeader = TRUE)

Arguments

projectId

The BigQuery project ID.

datasetId

A datasetId within projectId.

tableId

The ID of the table you wish to extract.

cloudStorageBucket

URI of the bucket to extract into.

filename

The filename to extract to. Include a wildcard (*) if the extract is expected to be larger than 1GB, so the output can be split across several files (see the example call below).

compression

Compression applied to the exported file: "NONE" (the default) or "GZIP".

destinationFormat

Format of the exported file: "CSV" (the default), "NEWLINE_DELIMITED_JSON" or "AVRO".

fieldDelimiter

The delimiter between fields in the exported file. Applies to CSV exports only; defaults to ",".

printHeader

Whether to include a header row. Applies to CSV exports only.
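
For reference, a sketch of a call that sets the optional arguments explicitly; the project, dataset, table and bucket names are placeholders, and the result is a GZIP-compressed, newline-delimited JSON extract split across several files via the wildcard:

job_extract <- bqr_extract_data(projectId = "your_project",
                                datasetId = "your_dataset",
                                tableId = "bigResultTable",
                                cloudStorageBucket = "your_cloud_storage_bucket_name",
                                filename = "big-extract-*.json.gz",
                                compression = "GZIP",
                                destinationFormat = "NEWLINE_DELIMITED_JSON")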

Value

A Job object that can be polled via bqr_get_job.
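
For example, the returned Job can be passed to bqr_wait_for_job to poll until the extract completes (a sketch; the names are placeholders):

job <- bqr_extract_data("your_project", "your_dataset",
                        "bigResultTable", "your_cloud_storage_bucket_name")

## polls until job$status$state == "DONE"
job <- bqr_wait_for_job(job)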

See Also

https://cloud.google.com/bigquery/exporting-data-from-bigquery

Other BigQuery asynch query functions: bqr_download_extract, bqr_get_job, bqr_grant_extract_access, bqr_query_asynch, bqr_wait_for_job

Examples

library(bigQueryR)

## Auth with a project that has at least BigQuery and Google Cloud Storage scope
bqr_auth()

## make a big query
job <- bqr_query_asynch("your_project", 
                        "your_dataset",
                        "SELECT * FROM blah LIMIT 9999999", 
                        destinationTableId = "bigResultTable")
                        
## poll the job to check its status
## it's done when job$status$state == "DONE"
bqr_get_job("your_project", job)

## once done, the query results are in "bigResultTable"
## extract that table to GoogleCloudStorage:
# Create a bucket at Google Cloud Storage at 
# https://console.cloud.google.com/storage/browser

job_extract <- bqr_extract_data("your_project",
                                "your_dataset",
                                "bigResultTable",
                                "your_cloud_storage_bucket_name")
                                
## poll the extract job to check its status
## it's done when job$status$state == "DONE"
bqr_get_job("your_project", job_extract$jobReference$jobId)

## You should also see the extract in the Google Cloud Storage bucket:
googleCloudStorageR::gcs_list_objects("your_cloud_storage_bucket_name")

## to download via a URL without logging in to the Google Cloud Storage interface:
## Use an email that is Google account enabled
## Requires scopes:
##  https://www.googleapis.com/auth/devstorage.full_control
##  https://www.googleapis.com/auth/cloud-platform

download_url <- bqr_grant_extract_access(job_extract, "your@email.com")

## download_url may contain multiple URLs if the data is > 1GB
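
## Alternatively, a sketch of downloading the extract files directly,
## assuming bqr_download_extract saves them to your working directory:
bqr_download_extract(job_extract)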

