Question

I am trying to find gaps in my data and then group the rows that fall between those gaps. The table I am working with looks like this:

runid|year |road|start   |end     
01010|  9  |2   |0.000   |0.585    
01100|  9  |2   |0.585   |4.980    
01100|  9  |2   |4.980   |7.777  
01100|  9  |2   |7.777   |11.857   
01100|  9  |2   |11.857  |13.274   
01100|  9  |2   |15.235  |21.021  
01100|  9  |2   |21.021  |25.333  
01100|  9  |3   |0.000   |7.777  
01100|  9  |3   |7.777   |13.274  
01100|  9  |3   |13.274  |25.333 
...

I want to create a new column, rn, that identifies the group, like this:

runid|year |road|start   |end    |rn    
01010|  9  |2   |0.000   |0.585  |1  
01100|  9  |2   |0.585   |4.980  |1  
01100|  9  |2   |4.980   |7.777  |1
01100|  9  |2   |7.777   |11.857 |1  
01100|  9  |2   |11.857  |13.274 |1  
01100|  9  |2   |15.235  |21.021 |2
01100|  9  |2   |21.021  |25.333 |2 
01100|  9  |3   |0.000   |7.777  |1
01100|  9  |3   |7.777   |13.274 |1 
01100|  9  |3   |13.274  |25.333 |1  
...  

As you can see, each row's start matches the previous row's end for part of the data, and then there is a gap between 13.274 and 15.235. That is where the new column (rn) should switch to the next group number.
Note: the table is a snapshot of a much larger table with many runids, years, and roads, each with its own start and end points.
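The grouping rule described above can be pinned down with a short sketch. It is plain Python rather than SQL, and it assumes each row is a (runid, year, road, start, end) tuple and that start and end values can be compared exactly (as with a DECIMAL column). `number_groups` is a hypothetical helper, not part of any library.

```python
from itertools import groupby

# A row as in the question: (runid, year, road, start, end).
Row = tuple[str, int, int, float, float]

# Sample rows copied from the question (runid kept as a string for its leading zero).
SAMPLE: list[Row] = [
    ("01010", 9, 2, 0.000, 0.585),
    ("01100", 9, 2, 0.585, 4.980),
    ("01100", 9, 2, 4.980, 7.777),
    ("01100", 9, 2, 7.777, 11.857),
    ("01100", 9, 2, 11.857, 13.274),
    ("01100", 9, 2, 15.235, 21.021),
    ("01100", 9, 2, 21.021, 25.333),
    ("01100", 9, 3, 0.000, 7.777),
    ("01100", 9, 3, 7.777, 13.274),
    ("01100", 9, 3, 13.274, 25.333),
]


def number_groups(rows: list[Row]) -> list[tuple[Row, int]]:
    """Pair each row with its group number (hypothetical helper).

    Rows are sorted by (runid, year, road, start). Within each
    (runid, year, road) partition the first row opens group 1, and every row
    whose start differs from the previous row's end opens the next group.
    """
    ordered = sorted(rows, key=lambda r: (r[0], r[1], r[2], r[3]))
    numbered: list[tuple[Row, int]] = []
    # groupby only merges adjacent rows, which is why the input is sorted first.
    for _, partition in groupby(ordered, key=lambda r: (r[0], r[1], r[2])):
        group = 0
        prev_end = None
        for row in partition:
            start, end = row[3], row[4]
            # Exact comparison; assumes start/end are stored exactly (e.g. DECIMAL).
            if prev_end is None or start != prev_end:
                group += 1
            numbered.append((row, group))
            prev_end = end
    return numbered


if __name__ == "__main__":
    print([group for _, group in number_groups(SAMPLE)])
```

With the sample rows it prints [1, 1, 1, 1, 1, 2, 2, 1, 1, 1], the rn column shown above.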

I have done something like this so far:

WITH cte AS (
   SELECT DISTINCT runid, year, road, [start], [end],
          LAG([end]) OVER (PARTITION BY runid, year, road ORDER BY [start]) AS rn
   FROM dbo.runners
)
SELECT *, CASE WHEN rn <> [start] THEN 1 WHEN rn IS NULL THEN 2 ELSE 0 END AS chk
FROM cte
ORDER BY runid, year, road, [start];

This gives me this:

runid|year |road|start   |end    |rn    |chk   
01010|  9  |2   |0.000   |0.585  |NULL  |2  
01100|  9  |2   |0.585   |4.980  |0.585 |0  
01100|  9  |2   |4.980   |7.777  |4.980 |0  
01100|  9  |2   |7.777   |11.857 |7.777 |0  
01100|  9  |2   |11.857  |13.274 |11.857|0    
01100|  9  |2   |15.235  |21.021 |13.274|1  
01100|  9  |2   |21.021  |25.333 |21.021|0   
01100|  9  |3   |0.000   |7.777  |NULL  |2  
01100|  9  |3   |7.777   |13.274 |7.777 |0  
01100|  9  |3   |13.274  |25.333 |13.274|0    
... 

I am not sure how to turn this into something like a RANK column that numbers each group within my partitions.

Solution

One possible approach is to flag the breaks in the start/end values with LAG() and then number the groups with a running SUM() of those flags:

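;WITH ChangedCTE AS (
   SELECT
      *,
      -- 0 when this row starts exactly where the previous row (same runid/year/road) ended;
      -- 1 when there is a gap, or for the first row of a partition (LAG returns NULL)
      CASE
         WHEN [start] = LAG([end]) OVER (PARTITION BY [runid], [year], [road] ORDER BY [start]) THEN 0
         ELSE 1
      END AS Changed
   FROM [dbo].[runners]
)
SELECT
   [runid], [year], [start], [end], [road],
   -- running total of the break flags = group number within each partition
   SUM([Changed]) OVER (PARTITION BY [runid], [year], [road] ORDER BY [start]) AS Groups
FROM ChangedCTE
ORDER BY [runid], [year], [road], [start];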
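To check the answer's logic without a SQL Server instance, this sketch runs the same query against the question's sample rows in an in-memory SQLite database through Python's standard sqlite3 module. It assumes SQLite 3.25 or later (the first version with window functions such as LAG()), stores start and end as REAL, drops the T-SQL leading semicolon, and replaces [dbo].[runners] with a plain runners table. `group_rows` is a hypothetical helper.

```python
import sqlite3

# Sample rows from the question; runid is TEXT so the leading zero survives.
SAMPLE = [
    ("01010", 9, 2, 0.000, 0.585),
    ("01100", 9, 2, 0.585, 4.980),
    ("01100", 9, 2, 4.980, 7.777),
    ("01100", 9, 2, 7.777, 11.857),
    ("01100", 9, 2, 11.857, 13.274),
    ("01100", 9, 2, 15.235, 21.021),
    ("01100", 9, 2, 21.021, 25.333),
    ("01100", 9, 3, 0.000, 7.777),
    ("01100", 9, 3, 7.777, 13.274),
    ("01100", 9, 3, 13.274, 25.333),
]

# The answer's query adapted for SQLite (assumption): no leading ';' and a plain
# `runners` table, since SQLite has no dbo schema. SQLite accepts [bracketed]
# identifiers for SQL Server compatibility, so [start]/[end] stay as they are.
QUERY = """
WITH ChangedCTE AS (
   SELECT
      *,
      CASE
         WHEN [start] = LAG([end]) OVER (PARTITION BY [runid], [year], [road] ORDER BY [start]) THEN 0
         ELSE 1
      END AS Changed
   FROM runners
)
SELECT
   [runid], [year], [start], [end], [road],
   SUM([Changed]) OVER (PARTITION BY [runid], [year], [road] ORDER BY [start]) AS Groups
FROM ChangedCTE
ORDER BY [runid], [year], [road], [start]
"""


def group_rows(rows):
    """Load rows into an in-memory SQLite table and run QUERY (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE runners ([runid] TEXT, [year] INTEGER, [road] INTEGER, "
            "[start] REAL, [end] REAL)"
        )
        conn.executemany("INSERT INTO runners VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in group_rows(SAMPLE):
        print(row)
```

The Groups column (the last value in each row) comes back as 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, which matches the rn column the question asks for.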

Output (run against test data that differs slightly from the sample in the question):

runid   |year|start  |end    |road|Groups
01010000|9   |0.000  |0.585  |2   |1
01010000|9   |0.585  |4.980  |2   |1
01010000|9   |4.980  |7.777  |2   |1
01010000|9   |7.777  |11.857 |2   |1
01010000|9   |11.857 |13.274 |2   |1
01010000|9   |15.235 |21.021 |2   |2
01010000|9   |21.021 |22.142 |2   |2
01010000|9   |22.142 |25.946 |2   |2
01010000|9   |0.000  |7.777  |3   |1
01010000|9   |7.777  |11.857 |3   |1
01010000|9   |11.857 |13.274 |3   |1
01010000|9   |15.235 |21.021 |3   |2
01010000|9   |21.021 |22.142 |3   |2
01010000|9   |22.142 |25.946 |3   |2
Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange