Question

From the import documentation of BigQuery,

Note: Null values are not allowed

So I assume null is not allowed in a json-formatted data for BigQuery import. However, null value is actually very common in regular ETL task (due to missing data). What should be a good solution to import such json source files? Note my data contains nested structures so I do not prefer a conversion to CSV and use ,, to represent a null value.

One way I think I can do is to replace all null values with default values of different data types respectively, e.g.,

  • string: null -> empty string
  • integer: null -> -1
  • float: null -> -1.0
  • ...

But I don't like it. I am looking for better options.

BTW, I tried to do bq load with a json file containing null values. I get the below error:

Failure details:
- Expected '"' found 'n'
- Expected '"' found 'n'
- Expected '"' found 'n'
- Expected '"' found 'n'
- Expected '"' found 'n
...

I think this is the indication of null usage, is it correct?

EDIT: If I remove all the null fields, it seems to work. I guess this is the way to handle the null data. You cannot have null for a data field, but you can just not include it. So I need to have a filtering code to remove all the null field in my raw json.

Was it helpful?

Solution

You can import NULL values using JSON format source files - omit the key:value pair for values that are NULL.

Example - Let's say you have a schema like this:

{
"name": "kind",
"type": "string"
},
{
"name": "fullName",
"type": "string",
},
{
"name": "age",
"type": "integer",
"mode": "nullable"
}

A record with no NULL values might look like this:

{"kind": "person",
 "fullName": "Some Person",
 "age": 22
}

However, when "age" is NULL, try this (note, no "age" key):

{"kind": "person",
 "fullName": "Some Person",
}

Please let us know if you have issues with this. I'll make a note to improve the documentation around using NULL values with JSON import formats.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top