Question

I am trying to run a query in BigQuery from PHP (using the Google PHP SDK) that returns a large dataset (anywhere from 100,000 to 10,000,000 rows).

$bigqueryService = new Google_BigqueryService($client);

$query = new Google_QueryRequest();
$query->setQuery(...);

$jobs = $bigqueryService->jobs;
$response = $jobs->query($project_id, $query); 
// query() is a synchronous function that returns a full dataset

The next step is to allow the user to download the result as a CSV file.

The code above fails when the dataset becomes too large (it hits the memory limit). What are my options for performing this operation with lower memory usage?

(One option I considered is saving the results to another BigQuery table and then fetching them in chunks with LIMIT and OFFSET, but I suspect a better solution is available.)

Thanks for the help


Solution 3

The suggestion to export is a good one, I just wanted to mention there is another way.

The query API you are calling (jobs.query()) does not return the full dataset; it returns just one page of data, which is the first 2 MB of the results. You can set the maxResults flag (see the jobs.query reference) to limit the page to a certain number of rows.

If you get back fewer rows than are in the full result set, you will get a pageToken field in the response. You can then fetch the remainder with the jobs.getQueryResults() API by providing the job ID (also in the query response) and the page token. This will keep returning new rows and a new page token until you reach the end of the result set.
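
For example, here is a minimal sketch of that loop using the legacy Google_BigqueryService classes from the question (the method names follow the generated client and may vary between SDK versions):

$query = new Google_QueryRequest();
$query->setQuery($sql);
$query->setMaxResults(1000); // cap the rows per page to bound memory use

$response = $jobs->query($project_id, $query);
$jobId = $response->getJobReference()->getJobId();

do {
    foreach ((array) $response->getRows() as $row) {
        // Stream each row out (e.g. with fputcsv()) instead of buffering it
    }
    $pageToken = $response->getPageToken();
    if ($pageToken) {
        // Fetch the next page of the same job's results
        $response = $jobs->getQueryResults($project_id, $jobId, array(
            'pageToken'  => $pageToken,
            'maxResults' => 1000,
        ));
    }
} while ($pageToken);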

The example here shows code (in Java and in Python) to run a query and fetch the results page by page.

There is also an option in the API to convert directly to CSV by specifying alt='csv' in the URL query string, but I'm not sure how to do this in PHP.

OTHER TIPS

You can export your data directly from BigQuery:

https://developers.google.com/bigquery/exporting-data-from-bigquery

You can use PHP to make an API call that performs the export (you don't need the bq tool).

You need to set the job's configuration.extract.destinationFormat; see the reference.
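
As a rough sketch of such an extract job with the same legacy classes (the class names follow the generated client, and the dataset, table, and Cloud Storage bucket names are hypothetical):

$sourceTable = new Google_TableReference();
$sourceTable->setProjectId($project_id);
$sourceTable->setDatasetId('my_dataset'); // hypothetical dataset
$sourceTable->setTableId('my_results');   // table holding the query results

$extract = new Google_JobConfigurationExtract();
$extract->setSourceTable($sourceTable);
$extract->setDestinationUris(array('gs://my-bucket/results-*.csv')); // hypothetical bucket
$extract->setDestinationFormat('CSV');

$config = new Google_JobConfiguration();
$config->setExtract($extract);

$job = new Google_Job();
$job->setConfiguration($config);

// Insert the job, then poll jobs.get() until its status is DONE
$insertedJob = $jobs->insert($project_id, $job);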

Just to elaborate on Pentium10's answer:

You can export files of up to 1 GB each in JSON format. You can then read the file line by line, which minimizes the memory used by your application, and decode each line with json_decode().
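
Since BigQuery's JSON export is newline-delimited (one record per line), a minimal sketch of that streaming read looks like this (the file names are hypothetical):

$in  = fopen('exported-results.json', 'r');
$out = fopen('results.csv', 'w');

while (($line = fgets($in)) !== false) {
    $row = json_decode($line, true); // decode one record at a time
    if (is_array($row)) {
        fputcsv($out, $row);
    }
}

fclose($in);
fclose($out);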

I am not sure if you are still using PHP, but here is an answer using the current google/cloud-bigquery client:

require 'vendor/autoload.php';

use Google\Cloud\BigQuery\BigQueryClient;

$bigQuery = new BigQueryClient([
    'projectId' => $project_id,
]);

$options = [
    'maxResults' => 1000, // rows fetched per page
    'startIndex' => 0
];

$jobConfig = $bigQuery->query($query);
$queryResults = $bigQuery->runQuery($jobConfig, $options);

foreach ($queryResults as $row) {
    // Handle one row at a time (e.g. write it to the CSV output)
}
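
Note that iterating over the returned QueryResults object fetches further pages from the API as needed, so only about maxResults rows are held in memory at a time; you can write each row straight to your CSV output as you go.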
Licensed under: CC-BY-SA with attribution