Question

I have an example table (year) as below in PostgreSQL 9.5:

Name | 2010 | 2011 | 2012
-------------------------
A    |   10 |      |   40
B    |   10 |   20 |   30 

Now, if I write a simple query like the one below to take the average of the columns (2010, 2011, 2012), I get the correct result for B, but the result for A is NULL because of the NULL in the 2011 column:

select ("2010"+"2011"+"2012")/3 as avg from year

Is there any way to write a query that takes the average of only the non-NULL values in a row?

Solution

The only correct and scalable solution to that problem is to normalize your model.

However, you can normalize the data "on the fly" and then use standard SQL aggregation on the result.

select name, avg(t.val)
from the_table
  cross join unnest(array["2010","2011","2012"]) as t(val)
group by name;

Online example: https://rextester.com/TGTC30399

But I would strongly recommend fixing your data model by properly normalizing it.
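
As a rough sketch of what a normalized model could look like: the table name year_value and the column names yr and val below are made up for illustration and are not part of the original schema. A missing measurement is simply a missing row, and avg() ignores NULLs anyway.

```sql
-- Hypothetical normalized layout: one row per (name, year) pair.
CREATE TABLE year_value (
    name text    NOT NULL,
    yr   integer NOT NULL,
    val  integer,            -- may be NULL, or the row can be left out entirely
    PRIMARY KEY (name, yr)
);

-- Same data as the example table; A has no row for 2011.
INSERT INTO year_value (name, yr, val) VALUES
    ('A', 2010, 10), ('A', 2012, 40),
    ('B', 2010, 10), ('B', 2011, 20), ('B', 2012, 30);

-- Plain aggregation; adding 2013, 2014, ... needs no schema or query change.
SELECT name, avg(val) AS avg
FROM   year_value
GROUP  BY name;
```

With this layout A averages 10 and 40, and B averages 10, 20 and 30.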

OTHER TIPS

In most cases a normalized relational design would be the proper solution, as has been commented.

While you are stuck with your design - and there are cases where it makes sense (like to minimize storage for big tables) - listing all column names may be tedious and error-prone. Here is an alternative:

SELECT name, avg(value::int)  -- cast to the type actually used
FROM   tbl t, jsonb_each_text(to_jsonb(t) - 'name')  -- exclude non-value columns
GROUP  BY 1;

Instead of listing all columns to include, remove the one (or few) column(s) to exclude. This also keeps working if you add more value columns later ("2013", "2014", ...).

Drawback: this casts the values to JSON and back, which adds a tiny cost.
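
A possible variation on the query above, in case excluding columns by name becomes awkward: filter the JSON keys by pattern instead. This is only a sketch and assumes every value column is named as a four-digit year; the alias j is arbitrary.

```sql
SELECT t.name, avg(j.value::int) AS avg   -- cast to the type actually used
FROM   tbl t
CROSS  JOIN LATERAL jsonb_each_text(to_jsonb(t)) AS j(key, value)
WHERE  j.key ~ '^\d{4}$'                  -- keep only keys like "2010", "2011", ...
GROUP  BY t.name;
```

NULL column values come back from jsonb_each_text() as SQL NULL, so avg() skips them just as in the original query.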

As an aside, don't use numbers as column names: they are not legal identifiers and require double-quoting everywhere. Use something like c2010, c2011, ...
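
If renaming is an option, something along these lines should do it (assuming the table is called tbl as in the queries here; any views or application code that reference the old names would need to be updated as well):

```sql
ALTER TABLE tbl RENAME COLUMN "2010" TO c2010;
ALTER TABLE tbl RENAME COLUMN "2011" TO c2011;
ALTER TABLE tbl RENAME COLUMN "2012" TO c2012;
```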

Postgres has the little-known function num_nonnulls() for exactly this case (available since Postgres 9.6, so not in 9.5). The manual:

returns the number of non-null arguments

SELECT (COALESCE("2010", 0)
      + COALESCE("2011", 0)
      + COALESCE("2012", 0))
      / num_nonnulls("2010","2011","2012") AS avg
FROM tbl t;

Make sure to double-quote your illegal names, and use COALESCE for each term; otherwise the sum is NULL if any of them is NULL.
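
Two details may be worth guarding against, depending on your data: a row where all value columns are NULL makes the divisor 0, and integer columns give integer division. A hedged sketch that handles both, plus a fallback for 9.5 where num_nonnulls() is not available (the column types are assumed to be integer):

```sql
-- Postgres 9.6+: NULLIF avoids division by zero, the cast avoids integer division.
SELECT name
     , (COALESCE("2010", 0)
      + COALESCE("2011", 0)
      + COALESCE("2012", 0))::numeric
      / NULLIF(num_nonnulls("2010", "2011", "2012"), 0) AS avg
FROM   tbl t;

-- Postgres 9.5: count the non-null columns by casting boolean to integer.
SELECT name
     , (COALESCE("2010", 0)
      + COALESCE("2011", 0)
      + COALESCE("2012", 0))::numeric
      / NULLIF(("2010" IS NOT NULL)::int
             + ("2011" IS NOT NULL)::int
             + ("2012" IS NOT NULL)::int, 0) AS avg
FROM   tbl t;
```

In both variants a row with no values at all yields NULL instead of raising an error.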

For operations on many columns see my older answer. (But this is faster.)

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange