AVG is not taking null values into consideration
12-12-2019
Question
I have loaded the following test data:
name, age,gender
"John", 33,m
"Sam", 33,m
"Julie",33,f
"Jimbo",, m
with schema: name:STRING,age:INTEGER,gender:STRING
and I have confirmed that the Jimbo row shows a null for column "age" in the BigQuery Browser Tool > mydataset > Details > Preview section.
When I run this query:
SELECT AVG(age) FROM [peterprivatedata.testpeople]
I get 24.75, which is incorrect. I expected 33, because the documentation for AVG says "Rows with a NULL value are not included in the calculation."
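The two numbers differ only in how the Jimbo row is counted: (33 + 33 + 33) / 3 = 33 when the row is skipped, and (33 + 33 + 33 + 0) / 4 = 24.75 when its age is counted as 0. The sketch below reproduces both results with Python's built-in sqlite3 module. SQLite is only a stand-in for BigQuery here (an assumption for illustration), and the table name testpeople simply mirrors the question.

```python
import sqlite3

# In-memory SQLite database standing in for the BigQuery table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testpeople (name TEXT, age INTEGER, gender TEXT)")

# Jimbo's age is stored as a real NULL, which is what the question expects.
conn.executemany(
    "INSERT INTO testpeople VALUES (?, ?, ?)",
    [("John", 33, "m"), ("Sam", 33, "m"), ("Julie", 33, "f"), ("Jimbo", None, "m")],
)

# AVG skips NULLs: (33 + 33 + 33) / 3
avg_skipping_null = conn.execute("SELECT AVG(age) FROM testpeople").fetchone()[0]

# Counting the NULL as 0 instead: (33 * 3 + 0) / 4
avg_null_as_zero = conn.execute(
    "SELECT AVG(COALESCE(age, 0)) FROM testpeople"
).fetchone()[0]

print(avg_skipping_null)  # 33.0
print(avg_null_as_zero)   # 24.75
```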
Am I doing something wrong, or is this a known bug? (I don't know whether there's a public issue list to check.) What's the simplest workaround?
Solution
This is a known bug: we coerce null numeric values to 0 on import. We're currently working on a fix. These values do, however, show up as not defined (which, for various reasons, is different from null), so you can check them with IS_EXPLICITLY_DEFINED. For example:
SELECT sum(if(is_explicitly_defined(numeric_field), numeric_field, 0)) /
sum(if(is_explicitly_defined(numeric_field), 1, 0))
AS my_avg FROM your_table
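The query works because the two SUM(IF(...)) terms compute the total and the count over only the explicitly defined values, which is what AVG should have done in the first place. A plain-Python sketch of the same arithmetic, assuming each imported value is modeled as a hypothetical (value, is_defined) pair in which a null arrived as 0 with is_defined set to False:

```python
# Hypothetical model of the imported column: a null arrives as 0 but is
# flagged as not explicitly defined.
ages = [(33, True), (33, True), (33, True), (0, False)]

# Mirrors SUM(IF(IS_EXPLICITLY_DEFINED(x), x, 0))
total = sum(value if defined else 0 for value, defined in ages)
# Mirrors SUM(IF(IS_EXPLICITLY_DEFINED(x), 1, 0))
count = sum(1 if defined else 0 for _, defined in ages)

my_avg = total / count
# A plain average over every stored value, including the coerced 0.
naive_avg = sum(value for value, _ in ages) / len(ages)

print(my_avg)     # 33.0
print(naive_avg)  # 24.75
```

If every value in the column is undefined, the denominator is 0 and this Python sketch raises ZeroDivisionError, so guard that case if it can occur in your data.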
Alternatively, you could add a separate column that records whether the value is null (numeric_field_is_null below). The query would then look like:
SELECT sum(if(numeric_field_is_null, 0, numeric_field)) /
sum(if(numeric_field_is_null, 0, 1))
AS my_avg FROM your_table
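The flag-column approach can be run end to end with SQLite via Python's sqlite3 module, again only as a stand-in for BigQuery. CASE WHEN is used in place of BigQuery's IF, and the column names age and age_is_null are hypothetical placeholders for numeric_field and numeric_field_is_null. Jimbo's age is stored as 0 to mimic the import bug.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# age_is_null is the extra flag column (1 = the original value was null).
conn.execute(
    "CREATE TABLE testpeople (name TEXT, age INTEGER, age_is_null INTEGER)"
)
# Jimbo's age is stored as 0, mimicking the import bug, but flagged as null.
conn.executemany(
    "INSERT INTO testpeople VALUES (?, ?, ?)",
    [("John", 33, 0), ("Sam", 33, 0), ("Julie", 33, 0), ("Jimbo", 0, 1)],
)

# Same shape as the query above. The * 1.0 forces floating-point division,
# because SQLite divides two integers using integer division.
query = """
SELECT SUM(CASE WHEN age_is_null THEN 0 ELSE age END) * 1.0 /
       SUM(CASE WHEN age_is_null THEN 0 ELSE 1 END) AS my_avg
FROM testpeople
"""
my_avg = conn.execute(query).fetchone()[0]
print(my_avg)  # 33.0
```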