Question

I have loaded the following test data:

name,   age,gender
"John", 33,m
"Sam",  33,m
"Julie",33,f
"Jimbo",, m

with schema: name:STRING,age:INTEGER,gender:STRING and I have confirmed that the Jimbo row shows a null for column "age" in the BigQuery Browser Tool > mydataset > Details > Preview section.

When I run this query:

SELECT AVG(age) FROM [peterprivatedata.testpeople]

I get 24.75, which is incorrect. I expected 33 because the documentation for AVG says "Rows with a NULL value are not included in the calculation."

Am I doing something wrong or is this a known bug? (I don't know if there's a public issues list to check). What's the simplest workaround to this?

Solution

This is a known bug where we coerce null numeric values to 0 on import. We're currently working on a fix. These values do, however, show up as not defined (which for various reasons is different from null), so you can check for that with IS_EXPLICITLY_DEFINED. For example:

    SELECT sum(if(is_explicitly_defined(numeric_field), numeric_field, 0)) /
           sum(if(is_explicitly_defined(numeric_field), 1, 0))
        AS my_avg FROM your_table

Alternately, you could use another column to represent is_null. Then the query would look like:

    SELECT sum(if(numeric_field_is_null, 0, numeric_field)) /
           sum(if(numeric_field_is_null, 0, 1))
        AS my_avg FROM your_table
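
To see why the question got 24.75 and why dividing a guarded sum by a guarded count fixes it, here is a rough Python sketch of the arithmetic. It is only an illustration, not BigQuery code: the `is_defined` flag is a hypothetical stand-in for what IS_EXPLICITLY_DEFINED(age) (or the inverse of an is_null column) would report, and the rows mirror the test data from the question with Jimbo's missing age already coerced to 0.

```python
# Rows as they look after import: Jimbo's missing age has been coerced to 0.
# "is_defined" is a hypothetical flag standing in for IS_EXPLICITLY_DEFINED(age).
rows = [
    {"name": "John", "age": 33, "is_defined": True},
    {"name": "Sam", "age": 33, "is_defined": True},
    {"name": "Julie", "age": 33, "is_defined": True},
    {"name": "Jimbo", "age": 0, "is_defined": False},
]


def naive_avg(rows):
    """Average over every row, which counts the coerced 0 as a real value."""
    return sum(r["age"] for r in rows) / len(rows)


def guarded_avg(rows):
    """Mirror of the SQL workaround: guarded sum divided by guarded count."""
    total = sum(r["age"] if r["is_defined"] else 0 for r in rows)
    count = sum(1 if r["is_defined"] else 0 for r in rows)
    return total / count


print(naive_avg(rows))    # (33 + 33 + 33 + 0) / 4 = 24.75
print(guarded_avg(rows))  # (33 + 33 + 33) / 3 = 33.0
```

Note that in this sketch the guarded version divides by zero when no row has a defined value, so that case needs separate handling.
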
Licensed under: CC-BY-SA with attribution