Question

Is there a way to load a JSON file from local file system to BigQuery using Google BigQuery Client API?

All the options I found are:

1- Streaming the records one by one.

2- Loading JSON data from GCS.

3- Using raw POST requests to load the JSON (i.e. not through Google Client API).


Solution

I'm assuming from the python tag that you want to do this from python. There is a load example here that loads data from a local file (it uses CSV, but it is easy to adapt it to JSON... there is another json example in the same directory).

The basic flow is:

import time

from googleapiclient.http import MediaFileUpload

# Load configuration with the destination specified.
load_config = {
  'destinationTable': {
    'projectId': PROJECT_ID,
    'datasetId': DATASET_ID,
    'tableId': TABLE_ID
  }
}

load_config['schema'] = {
  'fields': [
    {'name':'string_f', 'type':'STRING'},
    {'name':'boolean_f', 'type':'BOOLEAN'},
    {'name':'integer_f', 'type':'INTEGER'},
    {'name':'float_f', 'type':'FLOAT'},
    {'name':'timestamp_f', 'type':'TIMESTAMP'}
  ]
}
load_config['sourceFormat'] = 'NEWLINE_DELIMITED_JSON'
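For reference, `NEWLINE_DELIMITED_JSON` means the source file contains one JSON object per line, with keys matching the schema field names. A minimal sketch of producing such a `foo.json` (the row values here are made-up sample data, not from the original answer):

```python
import json

# One JSON object per line; keys match the schema field names above.
rows = [
    {'string_f': 'hello', 'boolean_f': True, 'integer_f': 42,
     'float_f': 3.14, 'timestamp_f': '2014-08-19 12:00:00'},
    {'string_f': 'world', 'boolean_f': False, 'integer_f': 7,
     'float_f': 2.71, 'timestamp_f': '2014-08-19 12:00:01'},
]

with open('foo.json', 'w') as f:
    for row in rows:
        f.write(json.dumps(row) + '\n')
```

Note there is no enclosing JSON array and no commas between rows; a file containing a single top-level `[...]` array will be rejected by the load job.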

# This tells it to perform a resumable upload of a local file
# called 'foo.json' 
upload = MediaFileUpload('foo.json',
                         mimetype='application/octet-stream',
                         # This enables resumable uploads.
                         resumable=True)

start = time.time()
job_id = 'job_%d' % start
# Create the job. (`jobs` is the jobs collection from the API client,
# i.e. service.jobs().)
result = jobs.insert(
  projectId=PROJECT_ID,
  body={
    'jobReference': {
      'jobId': job_id
    },
    'configuration': {
      'load': load_config
    }
  },
  media_body=upload).execute()

# Then you'd also want to wait for the result and check the status. (Check out
# the example at the link for more info.)
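The waiting step can be sketched as follows. `wait_for_job` is a hypothetical helper, not part of the client library; it polls `jobs.get()` (the same `jobs` collection used for the insert) until the load job reaches the `DONE` state, then checks `errorResult`:

```python
import time

def wait_for_job(jobs, project_id, job_id, poll_interval=2.0):
    """Poll jobs.get() until the job finishes; raise if it failed."""
    while True:
        job = jobs.get(projectId=project_id, jobId=job_id).execute()
        state = job['status']['state']
        if state == 'DONE':
            # A DONE job may still have failed; errorResult says so.
            error = job['status'].get('errorResult')
            if error:
                raise RuntimeError('Load job failed: %s' % error)
            return job
        time.sleep(poll_interval)
```

Usage would be `wait_for_job(jobs, PROJECT_ID, job_id)` right after the `jobs.insert(...).execute()` call above.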
Licensed under: CC-BY-SA with attribution