Preserve order of array elements after join
-
29-09-2020 - |
Question
I have a query that returns a CTE looking like
+-----------+-------------+
| node_id | ancestors |
|-----------+-------------|
| 1 | [] |
| 2 | [] |
| 3 | [1] |
| 4 | [2] |
| 5 | [4, 2] |
+-----------+-------------+
What I want to do is join with the nodes
table and replace the ids that are in the ancestors
column with another column on the nodes
table. Here's my query so far:
WITH RECURSIVE tree AS (
-- snip --
)
SELECT node.entity_id AS id,
array_remove(array_agg(parent_nodes.entity_id), NULL) AS ancestors
FROM tree
JOIN entity.nodes AS node ON node.id = tree.node_id
LEFT OUTER JOIN entity.nodes AS parent_nodes ON parent_nodes.id = ANY(tree.ancestors)
GROUP BY node.id;
The problem with that query is that is loses the order of the original ancestors
array. Is there a way to perform the join while keeping the original order during the array_agg
function?
Solution
The problem with your query is the join condition id = ANY(ancestors)
. Not only does it not preserve original order, it also eliminates duplicate elements in the array. (An id
could match 10 elements in ancestors
, it would still be picked once only.) Not sure if the logic of your query would allow duplicate elements, but if it does I am pretty sure you want to preserve all instances - you want to keep "original order" after all.
Assuming current Postgres 9.4+ for lack of information, I suggest a completely different approach:
SELECT n.entity_id, p.ancestors
FROM tree t
JOIN nodes n ON n.id = t.node_id
LEFT JOIN LATERAL (
SELECT ARRAY (
SELECT p.entity_id
FROM unnest(t.ancestors) WITH ORDINALITY a(id, ord)
JOIN entity.nodes p USING (id)
ORDER BY ord
) AS ancestors
) p ON true;
You query only works as intended if nodes.id
is defined as primary key and nodes.entity_id
is unique as well. Information is missing in the question.
Normally, this simpler query without explicit ORDER BY
works as well, but there are no guarantees (Postgres 9.3+)...
SELECT n.entity_id, p.ancestors
FROM tree t
JOIN nodes n ON n.id = t.node_id
LEFT JOIN LATERAL (
SELECT ARRAY (
SELECT p.entity_id
FROM unnest(t.ancestors) id
JOIN entity.nodes p USING (id)
) AS ancestors
) p ON true;
You can make this safe as well. Detailed explanation:
SQL Fiddle demo for Postgres 9.3.
Opional optimization
You join to entity.nodes
twice - to substitute for node_id
and ancestors
alike. An alternative would be to fold both into one array or one set and join only once. Might be faster, but you have to test.
For these alternatives we need the ORDER BY
in any case:
Add node_id
to the ancestors
array before we unnest ...
SELECT p.arr[1] AS entity_id, p.arr[2:2147483647] AS ancestors
FROM tree t
LEFT JOIN LATERAL (
SELECT ARRAY (
SELECT p.entity_id
FROM unnest(t.node_id || t.ancestors) WITH ORDINALITY a(id, ord)
JOIN entity.nodes p USING (id)
ORDER BY ord
) AS arr
) p ON true;
Or add node_id
to the unnested elements of ancestors
before we join ...
SELECT p.arr[1] AS entity_id, p.arr[2:2147483647] AS ancestors
FROM tree t
LEFT JOIN LATERAL (
SELECT ARRAY (
SELECT p.entity_id
FROM (
SELECT t.node_id AS id, 0 AS ord
UNION ALL
SELECT * FROM unnest(t.ancestors) WITH ORDINALITY
) x
JOIN entity.nodes p USING (id)
ORDER BY ord
) AS arr
) p ON true;
You did not show our CTE, this might be optimized further ...