Question

I'm at my wit's end here, so any pointers are much appreciated.

I'm querying the Google Analytics API, converting the response to the appropriate JSON format, and loading it into BigQuery with multipart requests via UrlFetchApp. This hits the UrlFetchApp 100MB-per-day quota very quickly, so I'm looking at ways to compress the JSON to gzip and load that into BigQuery instead. (I considered Google Cloud Storage, but saving the data to GCS first would also require UrlFetchApp, which is why this is a Google Apps Script issue.)

I've converted the data to a blob, zipped it with Utilities.zip(), and sent the bytes, but after much debugging it turns out the resulting format is .zip, not gzip.
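Here's roughly what that step looks like (simplified; jsonString stands in for the actual newline-delimited JSON payload):

    // Current approach: this yields a ZIP archive, not a gzip stream.
    var jsonBlob = Utilities.newBlob(jsonString, 'application/octet-stream', 'data.json');
    var zipBlob = Utilities.zip([jsonBlob], 'data.zip');
    var zipBytes = zipBlob.getBytes(); // sent as the resumable upload body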

Here is the JSON string created in my Apps Script (NEWLINE_DELIMITED_JSON):

{"ga_accountname":"photome","ga_querycode":"493h3v63078","ga_startdate":"2013-10-23 00:00:00","ga_enddate":"2013-10-23 00:00:00","ga_segmentname":"#_all_visits","ga_segmentexp":"ga:hostname=~dd.com","ga_landingPagePath":"/","ga_pagePath":"/","ga_secondPagePath":"(not set)","ga_source":"(direct)","ga_city":"Boden","ga_keyword":"(not set)","ga_country":"Sweden","ga_pageviews":"1","ga_bounces":"0","ga_visits":"1"}

I've got the rest of the API requests worked out (uploadType=resumable, the job configuration sends okay, and the zipped blob bytes upload okay), but BigQuery says "Input contained no data". Here are my UrlFetchApp parameters.

        // Sending job configuration first
        var url = 'https://www.googleapis.com/upload/bigquery/v2/projects/' + bqProjectId +'/jobs?uploadType=resumable';
        var options = {
          'contentType': 'application/json; charset=UTF-8',
          'contentLength': newJobSize,
          'headers': {
            'Accept-Encoding': 'gzip, deflate',
            'Accept': 'application/json',
            'X-Upload-Content-Length': zipSize,
            'X-Upload-Content-Type': 'application/octet-stream'
          },
          'method' : 'post',
          'payload' : jobData,
          'oAuthServiceName' : 'bigQuery',
          'oAuthUseToken'  : 'always'
        };

        // Sending job data
        var url = jobReq.getHeaders().Location;

        var options = {
          'contentType': 'application/octet-stream',
          'contentLength': zipSize,
          'contentRange': '0-'+zipSize,
          'method' : 'put',
          'payload' : zipBytes,
          'oAuthServiceName' : 'bigQuery',
          'oAuthUseToken'  : 'always'
        };

What options have I got? I'm fairly new to APIs, but can I get UrlFetchApp to compress the payload to gzip for me?


Answers

There isn't any way to work with gzip in Google Apps Script right now: the Utilities.zip() method produces a regular ZIP archive, not gzip.

Rather than using UrlFetchApp to form multipart uploads, why not use the BigQuery advanced service that is available in Google Apps Script?

    var projectId = "Bigquery-Project-Id";
    var datasetId = "Dataset-Id";   // destination dataset
    var tableId = "Table-Id";       // destination table

    // Describe a load job that appends newline-delimited JSON to the table.
    var job = {
      configuration: {
        load: {
          destinationTable: {
            projectId: projectId,
            datasetId: datasetId,
            tableId: tableId
          },
          sourceFormat: "NEWLINE_DELIMITED_JSON",
          writeDisposition: "WRITE_APPEND"
        }
      }
    };

    // The data must be passed as a blob of newline-delimited JSON.
    var data = Utilities.newBlob(jobData, "application/octet-stream");
    job = BigQuery.Jobs.insert(job, projectId, data);

To enable it, you will need to turn BigQuery access on in two places.

First, go to the Resources drop-down menu in the Apps Script editor and select Advanced Google services... . Find BigQuery in the list and toggle its On/Off switch.

Before you close the advanced services window, click the Google Developers Console link at the bottom. This opens the Developers Console for your Apps Script project. Find BigQuery in the console's API list and enable it there as well.

That's it - from there you can pass data to the BigQuery API through the BigQuery advanced service rather than UrlFetchApp.
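If you want to confirm that the load actually succeeded, you can poll the returned job. A minimal sketch, assuming the field names of the BigQuery v2 REST job resource:

    // Poll the load job until BigQuery reports it is done.
    var jobId = job.jobReference.jobId;
    while (job.status.state !== 'DONE') {
      Utilities.sleep(2000); // wait a bit between polls
      job = BigQuery.Jobs.get(projectId, jobId);
    }
    if (job.status.errorResult) {
      Logger.log('Load failed: ' + job.status.errorResult.message);
    }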

2020 status

For those viewing the question in 2020: gzip support was added a while ago and is available via the Utilities service's gzip() method (plus an override that accepts a custom blob name), with ungzip() for decompression.
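With it, the compression step from the question can be done natively. A short sketch, where jsonString stands in for the newline-delimited JSON payload:

    // Wrap the NDJSON string in a blob, then gzip it natively.
    var blob = Utilities.newBlob(jsonString, 'application/octet-stream', 'data.json');
    var gzipped = Utilities.gzip(blob);               // default name "archive.gz"
    var named = Utilities.gzip(blob, 'data.json.gz'); // override with a custom name
    // Utilities.ungzip(gzipped) reverses the compression if needed.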

GCF option

The other alternative to the BigQuery advanced service is to move from UrlFetchApp and the Apps Script project to Cloud Functions. There you can choose your preferred language and use whatever libraries or packages you need for compression (for example, Node.js ships with the zlib module out of the box). See the sketch below.
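For illustration, here is what that compression step could look like in an HTTP-triggered Cloud Function; the compressNdjson name and the req.body.ndjson field are assumptions for the sketch, not part of the original answer:

    // Node.js Cloud Function: gzip an NDJSON payload with the built-in zlib module.
    const zlib = require('zlib');

    exports.compressNdjson = (req, res) => {
      // 'ndjson' is a hypothetical request field carrying the raw payload.
      const gzipped = zlib.gzipSync(Buffer.from(req.body.ndjson, 'utf8'));
      // From here, hand 'gzipped' to the BigQuery client library's load job.
      res.status(200).send('compressed to ' + gzipped.length + ' bytes');
    };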

References

  1. Utilities.gzip() method reference
  2. Cloud Functions reference