Overlaps and Gap grouping in SQL Server
30-01-2021
Question
I am trying to find gaps in my data and then group those. The table I am working with looks like this:
runid|year |road|start |end
01010| 9 |2 |0.000 |0.585
01100| 9 |2 |0.585 |4.980
01100| 9 |2 |4.980 |7.777
01100| 9 |2 |7.777 |11.857
01100| 9 |2 |11.857 |13.274
01100| 9 |2 |15.235 |21.021
01100| 9 |2 |21.021 |25.333
01100| 9 |3 |0.000 |7.777
01100| 9 |3 |7.777 |13.274
01100| 9 |3 |13.274 |25.333
...
I want to be able to create a new column that identifies the group like this:
runid|year |road|start |end |rn
01010| 9 |2 |0.000 |0.585 |1
01100| 9 |2 |0.585 |4.980 |1
01100| 9 |2 |4.980 |7.777 |1
01100| 9 |2 |7.777 |11.857 |1
01100| 9 |2 |11.857 |13.274 |1
01100| 9 |2 |15.235 |21.021 |2
01100| 9 |2 |21.021 |25.333 |2
01100| 9 |3 |0.000 |7.777 |1
01100| 9 |3 |7.777 |13.274 |1
01100| 9 |3 |13.274 |25.333 |1
...
As you can see, start and end are in sync for part of the data (each row's start equals the previous row's end), and then there is a gap between 13.274 and 15.235; that is where the new column (rn) should switch to the next value.
Note: The table is a snapshot of a much larger table with numerous runids, years, and roads, each with its own start and end points.
I have done something like this so far:
with cte as (
select distinct runid, [year], road, [start], [end],
LAG([end]) over (partition by runid, [year], road order by [start]) rn
from dbo.runners
)
select *, CASE WHEN rn <> [start] then 1 when rn is null then 2 else 0 end chk
from cte
order by runid, [year], road, [start]
This gives me this:
runid|year |road|start |end |rn |chk
01010| 9 |2 |0.000 |0.585 |NULL |2
01100| 9 |2 |0.585 |4.980 |0.585 |0
01100| 9 |2 |4.980 |7.777 |4.980 |0
01100| 9 |2 |7.777 |11.857 |7.777 |0
01100| 9 |2 |11.857 |13.274 |11.857|0
01100| 9 |2 |15.235 |21.021 |13.274|1
01100| 9 |2 |21.021 |25.333 |21.021|0
01100| 9 |3 |0.000 |7.777 |NULL |2
01100| 9 |3 |7.777 |13.274 |7.777 |0
01100| 9 |3 |13.274 |25.333 |13.274|0
...
I am not sure how to turn this into something like a RANK column that numbers each group within my partitions.
Solution
One possible approach is to flag each break in the start/end sequence using LAG() and then number the groups with a running SUM() over those flags:
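;WITH ChangedCTE AS (
SELECT
*,
-- 0 when this row continues the previous one, 1 when a new group starts
-- (the first row of each partition has no previous row, so it gets 1)
CASE
WHEN [start] = LAG([end]) OVER (PARTITION BY [runid], [year], [road] ORDER BY [start]) THEN 0
ELSE 1
END AS Changed
FROM [dbo].[runners]
)
SELECT
[runid], [year], [start], [end], [road],
-- running total of the flags gives the group number within each partition
SUM([Changed]) OVER (PARTITION BY [runid], [year], [road] ORDER BY [start]) AS Groups
FROM ChangedCTE
ORDER BY [runid], [year], [road], [start]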
Output:
runid year start end road Groups
01010000 9 0.000 0.585 2 1
01010000 9 0.585 4.980 2 1
01010000 9 4.980 7.777 2 1
01010000 9 7.777 11.857 2 1
01010000 9 11.857 13.274 2 1
01010000 9 15.235 21.021 2 2
01010000 9 21.021 22.142 2 2
01010000 9 22.142 25.946 2 2
01010000 9 0.000 7.777 3 1
01010000 9 7.777 11.857 3 1
01010000 9 11.857 13.274 3 1
01010000 9 15.235 21.021 3 2
01010000 9 21.021 22.142 3 2
01010000 9 22.142 25.946 3 2
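To try the approach without access to the original table, here is a minimal, self-contained sketch. It loads the ten sample rows from the question into a table variable (the name @runners and the column types are assumptions, not taken from the original schema) and also exposes the intermediate PrevEnd and Changed columns, so you can see exactly where the running sum increments. Note that LAG() requires SQL Server 2012 or later.

```sql
-- Hypothetical stand-in for dbo.runners; column types are assumed.
DECLARE @runners TABLE (
    [runid] varchar(8),
    [year]  int,
    [road]  int,
    [start] decimal(9, 3),
    [end]   decimal(9, 3)
);

-- Sample rows copied from the question.
INSERT INTO @runners ([runid], [year], [road], [start], [end])
VALUES
    ('01010', 9, 2,  0.000,  0.585),
    ('01100', 9, 2,  0.585,  4.980),
    ('01100', 9, 2,  4.980,  7.777),
    ('01100', 9, 2,  7.777, 11.857),
    ('01100', 9, 2, 11.857, 13.274),
    ('01100', 9, 2, 15.235, 21.021),
    ('01100', 9, 2, 21.021, 25.333),
    ('01100', 9, 3,  0.000,  7.777),
    ('01100', 9, 3,  7.777, 13.274),
    ('01100', 9, 3, 13.274, 25.333);

;WITH ChangedCTE AS (
    SELECT
        *,
        -- End of the previous row in the same runid/year/road partition
        -- (NULL for the first row of each partition).
        LAG([end]) OVER (PARTITION BY [runid], [year], [road] ORDER BY [start]) AS PrevEnd,
        -- 1 marks the start of a new group: either a gap or the first row.
        CASE
            WHEN [start] = LAG([end]) OVER (PARTITION BY [runid], [year], [road] ORDER BY [start]) THEN 0
            ELSE 1
        END AS Changed
    FROM @runners
)
SELECT
    [runid], [year], [road], [start], [end], [PrevEnd], [Changed],
    -- Running total of the flags = group number within the partition.
    SUM([Changed]) OVER (PARTITION BY [runid], [year], [road] ORDER BY [start]) AS Groups
FROM ChangedCTE
ORDER BY [runid], [year], [road], [start];
```

With these rows, Groups comes out as 1 for every row except the two road 2 rows starting at 15.235 and 21.021, which get 2. That matches the rn column requested in the question.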