Question

Following on from Combining array_agg and unnest, given this data:

key |  a | b | c
------------------
 1  |  0 | 1 | {1,2}
 1  |  1 | 2 | {3}
 1  | -1 | 3 | {2}
 1  |  2 | 4 | {}

Executing this query:

SELECT 
  d.key,
  min(d.a) AS a,
  sum(d.b) AS b,
  array_agg(DISTINCT x.c) AS c 
FROM data AS d
  CROSS JOIN LATERAL unnest(d.c) AS x(c)
GROUP BY d.key

Gives the unexpected result:

key |  a | b | c
------------------
 1  | -1 | 7 | {1,2,3}

What's going on here, and how to get the correct sum?


Answer performance

On my actual data (16642 rows, 1942 keys, 6 aggregates), I get these cost estimates for each suggested solution.

Was it helpful?

Solution

The unnesting generates 2 rows for (key = 1, a = 0, b = 1), and the cross join removes the row with the empty array.

So your group by operates on the following set:

key | a  | b | c
----+----+---+--
  1 |  0 | 1 | 1
  1 |  0 | 1 | 2
  1 |  1 | 2 | 3
  1 | -1 | 3 | 2

One solution is to combine two group by queries, each one grouping n a different level:

select *
from (
  select d1."key", min(d1.a), sum(d1.b)
  from data d1
  group by d1."key"
) m
  join (
    select "key", array_agg(DISTINCT x.c) AS c 
    from data d2
      left join lateral unnest(d2.c) as x(c) on true
    where x.c is not null
    group by "key"
  ) a on a."key" = m."key";

Another approach is to only include the "first row" for each "unnested" group in the aggregates:

select d."key", 
       min(d.a) filter (where idx = 1 or idx is null), 
       sum(d.b) filter (where idx = 1 or idx is null),
       array_agg(distinct x.c)
from data AS d
  left join lateral unnest(d.c) with ordinality  AS x(c,idx) on true
group by d."key";

with ordinality returns the position of the unnested element in the original array. For the row with the empty array, that will be null.

OTHER TIPS

One problem with your query is that the CROSS JOIN eliminates rows where unnest() produces no rows (happens for the empty array {}).

You could fix that with LEFT JOIN .. ON true, but the other problem is that rows are multiplied where unnest() returns multiple rows (happens for {1,2}).

Thats how you get 7 for the sum: 1 + 1 + 2 + 3.

Joining two separate subqueries might be simplest / fastest:

SELECT *
FROM (
   SELECT key, min(a) AS min_a, sum(b) AS sum_b
   FROM   data
   GROUP  BY 1
   ) x
JOIN  (
   SELECT key, array_agg(DISTINCT e) AS dist_c 
   FROM   data, unnest(c) e
   GROUP  BY 1
   ) y USING (key);

The plain join works because both subqueries have grouped by the same column - and as long as key is NOT NULL.

db<>fiddle here

Related:

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange
scroll top