Question

CREATE TABLE inventory_box (
 box_id varchar(10),
 value   integer
);

INSERT INTO inventory_box VALUES ('1', 10), ('2', 15), ('3', 20);

I prepared a sql fiddle with the schema.

I would like to select a list of inventory boxes with combined value of above 20

possible result 1. box 1 + box 2 (10 + 15 >= 20) 

Here is what I am doing right now:

SELECT * FROM inventory_box LIMIT 1 OFFSET 0;
-- count on the client side and see if I got enough
-- got 10
SELECT * FROM inventory_box LIMIT 1 OFFSET 1;
-- count on the client side and see if I got enough
-- got 15, add it to the first query which returned 10
-- total is 25, ok, got enough, return answer

I am looking for a solution where the scan will stop as soon as it reaches the target value

Was it helpful?

Solution 2

Your question update clarifies that your actual requirements are much simpler than a full-blown "subset sum problem" as suspected by @GhostGambler:
Just fetch rows until the sum is big enough.

I am sorting by box_id to get deterministic results. You might even drop the ORDER BY altogether to get any valid result a bit faster, yet.

Slow: Recursive CTE

WITH RECURSIVE i AS (
   SELECT *, row_number() OVER (ORDER BY box_id) AS rn
   FROM   inventory_box
   )
, r AS (
   SELECT box_id, val, val AS total, 2 AS rn
   FROM   i
   WHERE  rn = 1

   UNION ALL
   SELECT i.box_id, i.val, r.total + i.val, r.rn + 1
   FROM   r
   JOIN   i USING (rn)
   WHERE  r.total < 20
   )
SELECT box_id, val, total
FROM   r
ORDER  BY box_id;

Fast: PL/pgSQL function with FOR loop

Using sum() as window aggregate function (cheapest this way).

CREATE OR REPLACE FUNCTION f_shop_for(_total int)
  RETURNS TABLE (box_id text, val int, total int) AS
$func$
BEGIN

total := 0;

FOR box_id, val, total IN
    SELECT i.box_id, i.val
          , sum(i.val) OVER (ORDER BY i.box_id) AS total
    FROM inventory_box i
LOOP
    RETURN NEXT;
    EXIT WHEN total >= _total;
END LOOP; 

END
$func$ LANGUAGE plpgsql STABLE;

SELECT * FROM f_shop_for(35);

I tested both with a big table of 1 million rows. The function only reads the necessary rows from index and table. The CTE is very slow, seems to scan the whole table ...

SQL Fiddle for both.

Aside: sorting by a varchar column (box_id) containing numeric data yields dubious results. Maybe this should be a numeric type, really?

OTHER TIPS

One possible approach scans the table in box_id order until the total is above 30, then returns all the previous rows plus the row that tipped the sum over the limit. Note that the scan doesn't stop when the sum is reached, it totals the whole table then goes back over the results to pick the results.

http://sqlfiddle.com/#!15/1c502/4

SELECT
  array_agg(box_id ORDER BY box_id) AS box_ids,
  max(boxsum) AS boxsum
FROM
(
  SELECT
    box_id,
    sum(value) OVER (ORDER BY box_id) AS boxsum,
    sum(value) OVER (ORDER BY box_id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prevboxsum
  FROM 
    inventory_box
) x
WHERE prevboxsum < 30 OR prevboxsum IS NULL;

but really, this is going to be pretty gruesome to do in a general and reliable manner in SQL (or at all).

You can ORDER BY value ASC instead of ORDER BY box_id if you like; this will add boxes from the smallest to the biggest. However, this will catastrophically fail if you then remove all the small boxes from the pool and run it again, and repeat. Soon it'll just be lumping two big boxes together inefficiently.

To solve this for the general case, finding the smallest combination, is a hard optimization problem that probably benefits from imprecise sample- and probabilistic based methods.

To scan the table in order until the sum reaches the target, lock the table then use PL/PgSQL to read rows from a cursor that returns the rows in value order plus an array_agg(box_id) OVER (ORDER BY value) and sum(value) OVER (order by value). When you reach the desired sum, return the current row's array. This won't produce an optimal solution, but it'll produce a solution, and I think it'll do so without a full table scan if there's a suitable index in place.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top