Question

I have a table of orders, and each row has a column called price. Each order also has a column called created_at that says when the order was created.

What would be a good way to find out which order makes the running total of prices pass $1000?

So, imagine that I have three orders that look like this:

Order 1: price: $800 - created_at: 2013/07/11 

Order 2: price: $100 - created_at: 2013/07/13 

Order 3: price: $300 - created_at: 2013/07/14 

I would be interested in finding that Order 3 is the one that made me pass $1000: adding $800 + $100 + $300, it is exactly those $300 that push the total above $1000.

What query could I perform to find that?

Solution

After calculating a running sum with the window aggregate function sum(), just pick the first row according to created_at that exceeds 1000:

SELECT *
FROM (
   SELECT order_id, created_at
        , sum(price) OVER (ORDER BY created_at) AS sum_price
   FROM   orders
   ) sub
WHERE  sum_price >= 1000
ORDER  BY created_at 
LIMIT  1;

This should be faster than @Gordon's version: picking the first row in the same sort order that the window function already uses is a lot cheaper than evaluating an expression (cumsum - price) for every row, which is not sargable.

I use sum_price >= 1000, so reaching 1000 exactly qualifies, too. If only exceeding 1000 should qualify, use > instead of >=.
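To make the running sum concrete, here is a small sketch using the three orders from the question. The table layout (integer order_id and price, a date for created_at) is an assumption, since the question does not show the actual schema.

```sql
-- Assumed schema; the question does not show the real table definition.
CREATE TEMP TABLE orders (order_id int, price int, created_at date);

INSERT INTO orders VALUES
   (1, 800, '2013-07-11')
 , (2, 100, '2013-07-13')
 , (3, 300, '2013-07-14');

-- The inner query on its own: one running total per row.
SELECT order_id, created_at
     , sum(price) OVER (ORDER BY created_at) AS sum_price
FROM   orders
ORDER  BY created_at;
-- order_id | created_at | sum_price
--        1 | 2013-07-11 |       800
--        2 | 2013-07-13 |       900
--        3 | 2013-07-14 |      1200

DROP TABLE orders;  -- clean up so the name is free for other tests
```

Order 3 is the first row with sum_price >= 1000, so that is the row the outer query returns.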

The manual on window functions states:

In addition to these functions, any built-in or user-defined aggregate function can be used as a window function

Note that this query always returns exactly one row, as opposed to @Gordon's query. In a case where multiple rows with identical created_at cross the 1000 barrier, all of them would qualify in Gordon's answer (or it would fail, see below), while only one is picked in mine. It will be an arbitrary one, as long as you don't add more items to ORDER BY as a tiebreaker. Like:

ORDER BY created_at, order_id

There are two instances of ORDER BY in this query, and it just so happens that you could modify either or both to make it work. Do it for both to make the sort order identical; this should be fastest.
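Put together, the query with the tiebreaker in both places might look like the sketch below. It assumes order_id is unique (for example, the primary key), which the question does not state.

```sql
SELECT *
FROM  (
   SELECT order_id, created_at
          -- order_id (assumed unique) breaks ties between equal created_at
        , sum(price) OVER (ORDER BY created_at, order_id) AS sum_price
   FROM   orders
   ) sub
WHERE  sum_price >= 1000
ORDER  BY created_at, order_id  -- same sort order as the window function
LIMIT  1;
```

With a unique sort order, every row gets its own running total, and the row returned is deterministic.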

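Actually, Gordon's version would fail completely (return no row at all) for this test case, because rows with identical created_at are peers and all get the same running sum: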

CREATE TEMP TABLE orders(order_id int, price int, created_at date);

INSERT INTO orders VALUES
  (1, 500, '2013-07-01')
 ,(2, 400, '2013-07-02')
 ,(3, 100, '2013-07-03')
 ,(4, 100, '2013-07-03')
 ,(5, 100, '2013-07-03');
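As a quick check of what goes wrong, the running sum with the default window frame can be inspected directly on this test table. The extra order_id in the outer ORDER BY only makes the display order stable; it is not part of the other answer's query.

```sql
SELECT order_id, price, created_at
     , sum(price) OVER (ORDER BY created_at) AS cumsum  -- default frame includes all peers
FROM   orders
ORDER  BY created_at, order_id;
-- order_id | price | created_at | cumsum
--        1 |   500 | 2013-07-01 |    500
--        2 |   400 | 2013-07-02 |    900
--        3 |   100 | 2013-07-03 |   1200
--        4 |   100 | 2013-07-03 |   1200
--        5 |   100 | 2013-07-03 |   1200
```

For orders 3 to 5, cumsum - price is 1100, so the condition 1000 > cumsum - price is false for every row and the query returns nothing.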

You could fix it by making the sort order in the window function unique, as demonstrated above.

Or you could change the frame definition for the window function to:

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
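A sketch of how that frame clause could be placed in the other answer's query follows. It is an adaptation for illustration, not the original author's code.

```sql
SELECT o.*
FROM  (
   SELECT o.*
          -- ROWS counts physical rows, so peers no longer share one running sum
        , sum(o.price) OVER (ORDER BY o.created_at
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumsum
   FROM   orders o
   ) o
WHERE  1000 > cumsum - price
AND    1000 <= cumsum;
```

On the test case above, the running sums become 500, 900, 1000, 1100, 1200, so exactly one row qualifies. Which of the three orders from 2013-07-03 gets the value 1000 is still arbitrary, because created_at alone does not define a unique order.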

Read the fine print in the manual.

But it's slower either way.

-> SQLfiddle

Other tips

For this, you want a cumulative sum, which Postgres provides as a window function:

select o.*
from (select o.*,
             sum(o.price) over (order by created_at) as cumsum
      from orders o
     ) o
where 1000 > cumsum - price and 1000 <= cumsum;

The where clause just finds the row where adding the price first makes the running total reach or exceed $1000.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow