Your question update clarifies that your actual requirements are much simpler than a full-blown "subset sum problem" as suspected by @GhostGambler:
Just fetch rows until the sum is big enough.
I am sorting by box_id
to get deterministic results. You might even drop the ORDER BY
altogether to get any valid result a bit faster, yet.
Slow: Recursive CTE
WITH RECURSIVE i AS (
SELECT *, row_number() OVER (ORDER BY box_id) AS rn
FROM inventory_box
)
, r AS (
SELECT box_id, val, val AS total, 2 AS rn
FROM i
WHERE rn = 1
UNION ALL
SELECT i.box_id, i.val, r.total + i.val, r.rn + 1
FROM r
JOIN i USING (rn)
WHERE r.total < 20
)
SELECT box_id, val, total
FROM r
ORDER BY box_id;
Fast: PL/pgSQL function with FOR loop
Using sum() as window aggregate function (cheapest this way).
CREATE OR REPLACE FUNCTION f_shop_for(_total int)
RETURNS TABLE (box_id text, val int, total int) AS
$func$
BEGIN
total := 0;
FOR box_id, val, total IN
SELECT i.box_id, i.val
, sum(i.val) OVER (ORDER BY i.box_id) AS total
FROM inventory_box i
LOOP
RETURN NEXT;
EXIT WHEN total >= _total;
END LOOP;
END
$func$ LANGUAGE plpgsql STABLE;
SELECT * FROM f_shop_for(35);
I tested both with a big table of 1 million rows. The function only reads the necessary rows from index and table. The CTE is very slow, seems to scan the whole table ...
Aside: sorting by a varchar
column (box_id
) containing numeric data yields dubious results. Maybe this should be a numeric type, really?