PostgreSQL - GROUP subsequent rows

https://stackoverflow.com/questions/20446242

30-08-2022

Question

I have a table that contains records ordered by date.

I want to get the start and end dates for each group of consecutive rows (grouped by some criterion, e.g. position).

create table tbl (id int, date timestamp without time zone, 
                  position int);

insert into tbl values 
( 1 , '2013-12-01', 1),
( 2 , '2013-12-02', 2),
( 3 , '2013-12-03', 2),
( 4 , '2013-12-04', 2),
( 5 , '2013-12-05', 3),
( 6 , '2013-12-06', 3),
( 7 , '2013-12-07', 2),
( 8 , '2013-12-08', 2);

Of course, if I simply group by position I get the wrong result, because the same position can occur in several separate groups:

SELECT POSITION, min(date) MIN, max(date) MAX
FROM tbl GROUP BY POSITION

I will get:

POSITION    MIN                             MAX
1           December, 01 2013 00:00:00+0000 December, 01 2013 00:00:00+0000
3           December, 05 2013 00:00:00+0000 December, 06 2013 00:00:00+0000
2           December, 02 2013 00:00:00+0000 December, 08 2013 00:00:00+0000

But I want:

POSITION    MIN                             MAX
1           December, 01 2013 00:00:00+0000 December, 01 2013 00:00:00+0000
2           December, 02 2013 00:00:00+0000 December, 04 2013 00:00:00+0000
3           December, 05 2013 00:00:00+0000 December, 06 2013 00:00:00+0000
2           December, 07 2013 00:00:00+0000 December, 08 2013 00:00:00+0000

I found a solution for MySQL that uses variables, and I could port it, but I believe PostgreSQL can do this in a smarter way using its advanced features, such as window functions.

I'm using PostgreSQL 9.2.

Solution

There is probably a more elegant solution, but try this:

WITH tmp_tbl AS (
    -- Flag the first row of each run: 1 when position differs from the
    -- previous row (or there is no previous row), 0 otherwise.
    SELECT *,
           CASE WHEN lag(position, 1) OVER (ORDER BY id) = position
                THEN 0
                ELSE 1
           END AS is_new_group
    FROM tbl
)
, tmp_tbl2 AS (
    -- A running total of the flags gives every row of a run the same number.
    SELECT position, date,
           sum(is_new_group) OVER (ORDER BY id) AS grouping_col
    FROM tmp_tbl
)
SELECT position, min(date) AS min, max(date) AS max
FROM tmp_tbl2
GROUP BY grouping_col, position
ORDER BY min(date);
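
As an alternative to the two-CTE query above, the runs can also be labelled with the difference of two row numbers. This is only a sketch, not part of the original answer: the names numbered and run_id are made up for illustration, and it assumes the date values are unique so that ordering by date is unambiguous.

```sql
-- Sketch: rows in the same run of equal positions share the same value of
-- (overall row number) - (row number within their position).
-- "numbered" and "run_id" are illustrative names.
WITH numbered AS (
    SELECT position, date,
           row_number() OVER (ORDER BY date)
         - row_number() OVER (PARTITION BY position ORDER BY date) AS run_id
    FROM tbl
)
SELECT position, min(date) AS min, max(date) AS max
FROM numbered
GROUP BY position, run_id
ORDER BY min(date);
```

On the sample data, run_id works out to 0, 1, 1, 1, 4, 4, 3, 3 for ids 1 to 8, so grouping by (position, run_id) should yield the four ranges the question asks for. The value of run_id has no meaning on its own; it only needs to be constant within a run and different between runs of the same position.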

OTHER TIPS

There are already complete answers on Stack Overflow for this, so I won't repeat them in detail. The principle is to group the records by the difference between:

the row number when ordered by date (via a window function), and
the number of days between each date and a fixed reference date.

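This gives a series such as the following, where diff (datediff minus rownum) stays constant within each run of consecutive dates: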

rownum datediff diff
1      1        0 ^
2      2        0 | first group
3      3        0 v
4      5        1 ^
5      6        1 | second group
6      7        1 v
7      9        2 ^
8      10       2 v third group
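
The series above can be reproduced with a short query. This is a minimal sketch under assumptions: the days CTE, its column d, and the reference date 2013-11-30 are invented for illustration, chosen so that datediff matches the table (1, 2, 3, 5, 6, 7, 9, 10).

```sql
-- Hypothetical data: 1-3, 5-7 and 9-10 December 2013, with gaps on the 4th and 8th.
WITH days(d) AS (
    VALUES (date '2013-12-01'), (date '2013-12-02'), (date '2013-12-03'),
           (date '2013-12-05'), (date '2013-12-06'), (date '2013-12-07'),
           (date '2013-12-09'), (date '2013-12-10')
),
numbered AS (
    SELECT d,
           row_number() OVER (ORDER BY d) AS rownum,
           d - date '2013-11-30'          AS datediff  -- whole days since the reference date
    FROM days
)
-- datediff - rownum is constant within a run of consecutive days.
SELECT min(d) AS first_day, max(d) AS last_day
FROM numbered
GROUP BY datediff - rownum
ORDER BY first_day;
```

Note that this variant finds runs of consecutive dates. The question's groups are runs of equal position, so there the grouping key would have to be built from position rather than from date gaps.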

Licensed under: CC-BY-SA with attribution
