Question
I have a select query that will return something like the following table:
start | stop | id
------------------
0     | 100  | 1
1     | 101  | 1
2     | 102  | 1
2     | 102  | 2
5     | 105  | 1
7     | 107  | 2
...
300   | 400  | 1
370   | 470  | 1
450   | 550  | 1
Where stop = start + n; n = 100 in this case.
I would like to merge the overlaps for each id:
start | stop | id
------------------
0     | 105  | 1
2     | 107  | 2
...
300   | 550  | 1
id 1 does not give 0 - 550 because the start 300 is after stop 105.
There will be hundreds of thousands of records returned by the first query and n can go up to tens of thousands, so the faster it can be processed the better.
Using PostgreSQL btw.
Solution
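-- q stands for the query from the question (assumed to be available as a CTE, view, or subquery).
WITH bounds AS
(
    SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY start) AS rn
    FROM (
        -- Pair each row with the stop of the previous row for the same id.
        -- Because stop = start + n, stop never decreases as start increases.
        SELECT id, LAG(stop) OVER (PARTITION BY id ORDER BY start) AS pstop, start
        FROM q
        UNION ALL
        -- One sentinel row per id closes the last range; its NULL start sorts last in PostgreSQL.
        SELECT id, MAX(stop), NULL
        FROM q
        GROUP BY id
    ) q2
    -- Keep only the rows that open a new range, plus the sentinel.
    WHERE start > pstop OR pstop IS NULL OR start IS NULL
)
-- Each range runs from one boundary's start to the pstop stored on the next boundary.
SELECT b2.start, b1.pstop AS stop, b2.id
FROM bounds b1
JOIN bounds b2
    ON b1.id = b2.id
    AND b1.rn = b2.rn + 1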
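To sanity-check the query's output, the same merging rule can be sketched procedurally in Python on the rows shown in the question. This is only a minimal sketch: the function name `merge_ranges` and the `(start, stop, id)` tuple layout are illustrative assumptions, and it uses a plain loop rather than window functions, so it is a cross-check and not a replacement for the SQL.

```python
from itertools import groupby
from operator import itemgetter


def merge_ranges(rows):
    """Merge overlapping (start, stop, id) rows per id (hypothetical helper)."""
    merged = []
    # Sort by id, then start, mirroring PARTITION BY id ORDER BY start.
    ordered = sorted(rows, key=lambda r: (r[2], r[0]))
    for id_, group in groupby(ordered, key=itemgetter(2)):
        cur_start = cur_stop = None
        for start, stop, _ in group:
            if cur_start is None:
                # First row for this id opens the first range.
                cur_start, cur_stop = start, stop
            elif start > cur_stop:
                # Gap found: close the current range and open a new one.
                merged.append((cur_start, cur_stop, id_))
                cur_start, cur_stop = start, stop
            else:
                # Overlap: extend the current range.
                cur_stop = max(cur_stop, stop)
        # Close the last open range for this id.
        merged.append((cur_start, cur_stop, id_))
    return merged


# Only the rows visible in the question; the elided ("...") rows are left out.
sample = [
    (0, 100, 1), (1, 101, 1), (2, 102, 1), (2, 102, 2), (5, 105, 1),
    (7, 107, 2), (300, 400, 1), (370, 470, 1), (450, 550, 1),
]
result = merge_ranges(sample)
print(result)
```

On these sample rows the sketch yields the ranges 0 to 105 and 300 to 550 for id 1, and 2 to 107 for id 2, which matches the merged table in the question.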
Licensed under: CC-BY-SA with attribution