Preserve order of array elements after join

https://dba.stackexchange.com/questions/119698

29-09-2020
|

Question

I have a query that returns a CTE looking like

+-----------+-------------+
|   node_id | ancestors   |
|-----------+-------------|
|         1 | []          |
|         2 | []          |
|         3 | [1]         |
|         4 | [2]         |
|         5 | [4, 2]      |
+-----------+-------------+

What I want to do is join with the nodes table and replace the ids that are in the ancestors column with another column on the nodes table. Here's my query so far:

WITH RECURSIVE tree AS (
  -- snip --
)
SELECT node.entity_id AS id,
       array_remove(array_agg(parent_nodes.entity_id), NULL) AS ancestors
FROM tree
JOIN entity.nodes AS node ON node.id = tree.node_id
LEFT OUTER JOIN entity.nodes AS parent_nodes ON parent_nodes.id = ANY(tree.ancestors)
GROUP BY node.id;

The problem with that query is that is loses the order of the original ancestors array. Is there a way to perform the join while keeping the original order during the array_agg function?

Solution

The problem with your query is the join condition id = ANY(ancestors). Not only does it not preserve original order, it also eliminates duplicate elements in the array. (An id could match 10 elements in ancestors, it would still be picked once only.) Not sure if the logic of your query would allow duplicate elements, but if it does I am pretty sure you want to preserve all instances - you want to keep "original order" after all.

Assuming current Postgres 9.4+ for lack of information, I suggest a completely different approach:

SELECT n.entity_id, p.ancestors
FROM   tree t
JOIN   nodes n ON n.id = t.node_id
LEFT   JOIN LATERAL (
   SELECT ARRAY (
      SELECT p.entity_id
      FROM   unnest(t.ancestors) WITH ORDINALITY a(id, ord)
      JOIN   entity.nodes p USING (id)
      ORDER  BY ord
      ) AS ancestors
   ) p ON true;

You query only works as intended if nodes.id is defined as primary key and nodes.entity_id is unique as well. Information is missing in the question.

Normally, this simpler query without explicit ORDER BY works as well, but there are no guarantees (Postgres 9.3+)...

SELECT n.entity_id, p.ancestors
FROM   tree t
JOIN   nodes n ON n.id = t.node_id
LEFT   JOIN LATERAL (
   SELECT ARRAY (
      SELECT p.entity_id
      FROM   unnest(t.ancestors) id
      JOIN   entity.nodes p USING (id)
      ) AS ancestors
   ) p ON true;

You can make this safe as well. Detailed explanation:

PostgreSQL unnest() with element number

SQL Fiddle demo for Postgres 9.3.

Opional optimization

You join to entity.nodes twice - to substitute for node_id and ancestors alike. An alternative would be to fold both into one array or one set and join only once. Might be faster, but you have to test.
For these alternatives we need the ORDER BY in any case:

Add node_id to the ancestors array before we unnest ...

SELECT p.arr[1] AS entity_id, p.arr[2:2147483647] AS ancestors
FROM   tree t
LEFT   JOIN LATERAL (
   SELECT ARRAY (
      SELECT p.entity_id
      FROM   unnest(t.node_id || t.ancestors) WITH ORDINALITY a(id, ord)
      JOIN   entity.nodes p USING (id)
      ORDER  BY ord
      ) AS arr
   ) p ON true;

Or add node_id to the unnested elements of ancestors before we join ...

SELECT p.arr[1] AS entity_id, p.arr[2:2147483647] AS ancestors
FROM   tree t
LEFT   JOIN LATERAL (
   SELECT ARRAY (
      SELECT p.entity_id
      FROM  (
         SELECT t.node_id AS id, 0 AS ord
         UNION ALL
         SELECT * FROM unnest(t.ancestors) WITH ORDINALITY
         ) x
      JOIN   entity.nodes p USING (id)
      ORDER  BY ord
      ) AS arr
   ) p ON true;

You did not show our CTE, this might be optimized further ...

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange