library(googleCloudStorageR)
library(bigQueryR)
gcs_global_bucket("your-project")
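## hedged note (an assumption, not part of the original example): both packages
## need to be authenticated before use, e.g. via their auth helpers
# gcs_auth()  # or supply a service-account JSON key
# bqr_auth()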
## custom upload function to ignore quotes and column headers
f <- function(input, output) {
  write.table(input, sep = ",", col.names = FALSE, row.names = FALSE,
              quote = FALSE, file = output, qmethod = "double")
}
## upload files to Google Cloud Storage
gcs_upload(mtcars, name = "mtcars_test1.csv", object_function = f)
gcs_upload(mtcars, name = "mtcars_test2.csv", object_function = f)
## create the schema of the files you just uploaded
user_schema <- schema_fields(mtcars)
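## optional check (not in the original example): inspect the auto-generated
## schema, a list of BigQuery field name/type definitions
str(user_schema)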
## load files from Google Cloud Storage into BigQuery
bqr_upload_data(projectId = "your-project",
                datasetId = "test",
                tableId = "from_gcs_mtcars",
                upload_data = c("gs://your-project/mtcars_test1.csv",
                                "gs://your-project/mtcars_test2.csv"),
                schema = user_schema)
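## a hedged aside (assumption, not part of the original example): loads from
## gs:// URIs run as asynchronous BigQuery load jobs, so you can assign the
## result of the call above to an object and poll it, e.g.
# job <- bqr_upload_data(...)  # same arguments as above
# job <- bqr_wait_for_job(job, wait = 5)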
## for big files, it's helpful to create your schema on a small sample
## a quick way to do this on the command line is:
# "head bigfile.csv > head_bigfile.csv"
## upload nested lists as JSON
the_list <- list(list(col1 = "yes", col2 = "no",
                      col3 = list(nest1 = 1, nest2 = 3), col4 = "oh"),
                 list(col1 = "yes2", col2 = "no2",
                      col3 = list(nest1 = 5, nest2 = 7), col4 = "oh2"),
                 list(col1 = "yes3", col2 = "no3",
                      col3 = list(nest1 = 7, nest2 = 55), col4 = "oh3"))
bqr_upload_data(projectId = "your-project",
                datasetId = "test",
                tableId = "nested_list_json",
                upload_data = the_list,
                autodetect = TRUE)
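## a hedged check (assumption, not part of the original example): query the
## nested table back with standard SQL to verify the JSON upload
bqr_query(projectId = "your-project",
          datasetId = "test",
          query = "SELECT col1, col3.nest1 FROM nested_list_json",
          useLegacySql = FALSE)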