Question

I have a select query that will return something like the following table:

start | stop | id
------------------
0     | 100  | 1
1     | 101  | 1
2     | 102  | 1
2     | 102  | 2
5     | 105  | 1
7     | 107  | 2
...
300   | 400  | 1
370   | 470  | 1
450   | 550  | 1

Here stop = start + n, with n = 100 in this case.

I would like to merge the overlaps for each id:

start | stop | id
------------------
0     | 105  | 1
2     | 107  | 2
...
300   | 550  | 1

id 1 does not collapse into a single 0 - 550 range because the row starting at 300 begins after the previous range stops at 105.

There will be hundreds of thousands of records returned by the first query and n can go up to tens of thousands, so the faster it can be processed the better.

I am using PostgreSQL.


Solution

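-- Relies on stop = start + n: within an id, the row with the latest start
-- also has the latest stop, so LAG(stop) is the end of everything seen so far.
WITH    bounds AS
        (
        -- Keep one row per range start (pstop = stop of the preceding row),
        -- plus one closing row per id (start IS NULL, pstop = MAX(stop)).
        SELECT  *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY start NULLS LAST) AS rn
        FROM    (
                SELECT  id, LAG(stop) OVER (PARTITION BY id ORDER BY start) AS pstop, start
                FROM    q
                UNION ALL
                SELECT  id, MAX(stop), NULL
                FROM    q
                GROUP BY
                        id
                ) q2
        WHERE   start > pstop OR pstop IS NULL OR start IS NULL
        )
-- Pair each range start (b2) with the next boundary row (b1);
-- b1.pstop is the stop of the last row before that boundary, i.e. the merged stop.
SELECT  b2.start, b1.pstop AS stop, b2.id
FROM    bounds b1
JOIN    bounds b2
ON      b1.id = b2.id
        AND b1.rn = b2.rn + 1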
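
If the ranges were not all the same length, comparing each start against only the immediately preceding stop could miss an earlier, longer range. A possible variant, offered as a sketch rather than as the answer's method, compares against the running maximum of the earlier stops instead. The names flagged, grouped, prev_max_stop and island are made up for illustration, and the VALUES list simply reuses the rows shown in the question (the rows elided by "..." are left out) in place of the real query q.

```sql
-- Sample rows from the question stand in for the real query "q".
WITH q (start, stop, id) AS (
    VALUES (0, 100, 1), (1, 101, 1), (2, 102, 1), (2, 102, 2),
           (5, 105, 1), (7, 107, 2),
           (300, 400, 1), (370, 470, 1), (450, 550, 1)
),
flagged AS (
    SELECT id, start, stop,
           -- Highest stop among earlier rows of the same id (NULL for the first row).
           MAX(stop) OVER (PARTITION BY id ORDER BY start, stop
                           ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_max_stop
    FROM q
),
grouped AS (
    SELECT id, start, stop,
           -- Running count of rows that open a new merged range = range number.
           SUM(CASE WHEN prev_max_stop IS NULL OR start > prev_max_stop
                    THEN 1 ELSE 0 END)
               OVER (PARTITION BY id ORDER BY start, stop) AS island
    FROM flagged
)
SELECT MIN(start) AS start, MAX(stop) AS stop, id
FROM grouped
GROUP BY id, island
ORDER BY id, start;
```

On these sample rows this should give 0 - 105 and 300 - 550 for id 1 and 2 - 107 for id 2, matching the merged table in the question. Ranges that merely touch (start equal to the previous stop) are merged, as in the original query; change > to >= if they should stay separate.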
Licensed under: CC-BY-SA with attribution