Select into specific array positions with array_agg()?
17-12-2020
Question
Is there a way to set values in specific positions inside an array, based on information from other columns? (Postgres 9.3 or later.)
For example, I would like to select an item and its stock information from the following tables:
Table item:
CREATE TABLE item (
id integer NOT NULL
);
INSERT INTO item VALUES
(1), (2), (3), (4);
Table item_stock (containing shop-specific information like stock and prices):
CREATE TABLE item_stock (
item_id integer NOT NULL,
shop_id integer NOT NULL,
stock integer,
cost numeric(19,3)
);
INSERT INTO item_stock VALUES
(1, 1, 2, 10),
(1, 2, 0, 9),
(2, 2, 0, 9),
(3, 1, 3, 22);
I am looking for a query to produce the following result, where the array in the column stock contains stock info for specific shops. In the example, array position 1 is stock for shop_id=1 and array position 2 is stock for shop_id=2, with 0 instead of NULL where no data is found:
id | stock
---+-------
1 | {2, 0}
2 | {0, 0}
3 | {3, 0}
4 | {0, 0}
Solution
Your answer basically gets the job done:
SELECT b.id, array_agg(b.stock) AS stock
FROM (
SELECT i.id, COALESCE(i_s.stock, 0) AS stock
FROM item i
CROSS JOIN unnest('{1,2}'::int[]) n
LEFT JOIN item_stock i_s ON i.id = i_s.item_id AND n.n = i_s.shop_id
ORDER BY i.id, n.n
) b
GROUP BY b.id;
Two notable changes:
- Order is not guaranteed without ORDER BY in the subquery or as an additional clause to array_agg() (typically more expensive). And that's the core element of your question.
- unnest('{1,2}'::int[]) instead of generate_series(1,2), as requested shop IDs will hardly be sequential all the time.
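The alternative of putting ORDER BY into the aggregate call itself can be sketched as follows. This is a minimal sketch against the example tables from the question, not a recommendation: as noted, it is typically more expensive than sorting once in the subquery.

```sql
-- ORDER BY inside array_agg() fixes the element order per aggregate call,
-- so no sorted subquery is needed.
SELECT i.id
     , array_agg(COALESCE(i_s.stock, 0) ORDER BY n.n) AS stock
FROM   item i
CROSS  JOIN unnest('{1,2}'::int[]) AS n(n)  -- requested shop IDs, in array order
LEFT   JOIN item_stock i_s ON i_s.item_id = i.id
                          AND i_s.shop_id = n.n
GROUP  BY i.id
ORDER  BY i.id;
```

With the sample data, this should return the same rows as the desired result shown in the question.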
I also moved the set-returning function from the SELECT list to a separate table expression attached with CROSS JOIN. That is standard SQL form, but it's just a matter of clarity and taste, not a necessity, at least in Postgres 10 or later.
Doing the same with a LATERAL subquery (CROSS JOIN LATERAL) and an ARRAY constructor might be a bit faster, as we don't need the outer GROUP BY and the ARRAY constructor is typically faster, too:
SELECT i.id, s.stock
FROM item i
CROSS JOIN LATERAL (
SELECT ARRAY(
SELECT COALESCE(i_s.stock, 0)
FROM unnest('{1,2}'::int[]) n
LEFT JOIN item_stock i_s ON i_s.shop_id = n.n
AND i_s.item_id = i.id
ORDER BY n.n
) AS stock
) s;
Related:
- Why is array_agg() slower than the non-aggregate ARRAY() constructor?
- How to apply ORDER BY and LIMIT in combination with an aggregate function?
And if you have more than just the two shops, a nested crosstab() (provided by the additional module tablefunc) should provide top performance:
SELECT i.id, COALESCE(stock, '{0,0}') AS stock
FROM item i
LEFT JOIN (
SELECT id, ARRAY[COALESCE(shop1, 0), COALESCE(shop2, 0)] AS stock
FROM crosstab(
$$SELECT item_id, shop_id, stock
FROM item_stock
WHERE shop_id = ANY ('{1,2}'::int[])
ORDER BY 1,2$$
, $$SELECT unnest('{1,2}'::int[])$$
) AS ct (id int, shop1 int, shop2 int)
) i_s USING (id);
Needs to be adapted in more places to cater for different shop IDs.
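To illustrate the places that need adapting, here is a sketch of the same query for three shops with the hypothetical IDs 1, 2 and 7. Shop 7 does not exist in the sample data and only serves as an example of a non-sequential ID. The sketch also assumes the additional module tablefunc can be installed in the current database.

```sql
-- crosstab() comes with the additional module tablefunc (install once per database)
CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT i.id, COALESCE(i_s.stock, '{0,0,0}') AS stock       -- (1) default array: one 0 per shop
FROM   item i
LEFT   JOIN (
   SELECT id
        , ARRAY[COALESCE(shop1, 0)
              , COALESCE(shop2, 0)
              , COALESCE(shop7, 0)] AS stock                -- (2) one array element per shop
   FROM   crosstab(
      $$SELECT item_id, shop_id, stock
        FROM   item_stock
        WHERE  shop_id = ANY ('{1,2,7}'::int[])             -- (3) filter on requested shops
        ORDER  BY 1, 2$$
    , $$SELECT unnest('{1,2,7}'::int[])$$                   -- (4) category list, same order
      ) AS ct (id int, shop1 int, shop2 int, shop7 int)     -- (5) one output column per shop
   ) i_s USING (id);
```

The list of shop IDs appears in five places, and all of them have to agree in number and order.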
Index
Make sure you have at least an index on item_stock (shop_id, item_id), typically provided by a PRIMARY KEY on those columns. For the crosstab query, it also matters that shop_id comes first.
Adding stock as another index expression may allow faster index-only scans. In Postgres 11 or later, consider adding an INCLUDE clause to the PK like so:
PRIMARY KEY (shop_id, item_id) INCLUDE (stock)
But only if you need it a lot, as it makes the index a bit bigger and possibly more susceptible to bloat from updates.
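Both index options can be sketched as DDL. This assumes item_stock has no primary key yet and that rows are unique on (shop_id, item_id), as in the sample data; the index name is made up for illustration. Pick one of the two, not both.

```sql
-- Option 1: plain multicolumn index with stock as trailing column.
-- Works in Postgres 9.3; the index name is hypothetical.
CREATE INDEX item_stock_shop_item_stock_idx
ON     item_stock (shop_id, item_id, stock);

-- Option 2 (Postgres 11 or later): covering primary key.
-- stock is stored in the index but is not part of the key.
ALTER TABLE item_stock
   ADD PRIMARY KEY (shop_id, item_id) INCLUDE (stock);
```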
OTHER TIPS
This is the query I was able to come up with (with some brute force):
SELECT b.id, array_agg(b.stock) FROM (
SELECT a.*, COALESCE(i_s.stock, 0) as stock FROM (
SELECT id, generate_series(1, 2) as n FROM item
) as a
LEFT OUTER JOIN item_stock i_s ON a.id = i_s.item_id AND a.n = i_s.shop_id
) as b GROUP by b.id;