SQL 2008 Help: Compare current row with previous based on reference ID, datetime and interval between datetime values

StackOverflow https://stackoverflow.com/questions/20906410

Question

I'm trying to assign an ID to events based on a number of logical operators.

My data consists of a reference field (VC, in the style of region.town.property for example), DateTime (DT, dd/mm/yyyy hh:mm:ss) and value (as decimal) which is recorded in 1 second intervals. The raw data table has around 300,000,000 records in it.

For an example of what the data represents, consider the value field to represent the flow of water into a standard cistern toilet. Generally (at 1 second recorded intervals) the flow will be zero. At some point, someone will flush the toilet, which empties the cistern before it then refills itself. The time taken to refill the cistern will depend on the water pressure and the capacity of the cistern. When I refer to an "event", I'm referring to the time between the cistern starting to refill itself (i.e. the first non-zero value in a series) until the cistern is full (i.e. the last non-zero value in a series). I'm trying to assign an ID to each of these "events".

I set up a test case using an Excel spreadsheet so I could check my logic assumptions, but now I'm struggling to translate those into SQL (using SQL Server 2008 R2).

My first step was to only select the records with non-zero values (using <>0).

My Excel formula is below, which is what I'm trying to base my SQL query on, together with sample data and the desired "ID" field.

    =IF(AND(B3=B2,DAY(C3)=DAY(C2),MONTH(C3)=MONTH(C2),YEAR(C3)=YEAR(C2),TIME(HOUR(C3),MINUTE(C3),SECOND(C3))=TIME(HOUR(C2),MINUTE(C2),SECOND(C2)+1))=TRUE,A2,A2+1)

The logic here is to check the current row and the preceding row to determine if the values belong to the same site (Reference_VC, cell references B3 and B2), and that the time difference between the two DateTime_DT fields (cells C3 and C2) is exactly 1 second. If the criteria are met, then the ID is taken from the preceding row. If the criteria fail, then a new ID series begins by adding 1 to the preceding ID.

    ID  Reference_VC  DateTime_DT          Value_DEC
    1   a.b.c         29/07/2000 00:43:30  0.2236
    1   a.b.c         29/07/2000 00:43:31  0.2045
    2   a.b.c         29/07/2000 00:43:35  0.2674
    2   a.b.c         29/07/2000 00:43:36  0.2806
    3   a.b.c         29/07/2000 00:43:40  0.3716
    4   d.e.f         29/07/2000 00:42:35  0.2001
    4   d.e.f         29/07/2000 00:42:36  0.2231
    4   d.e.f         29/07/2000 00:42:37  0.2604
    4   d.e.f         29/07/2000 00:42:38  0.3729
    4   d.e.f         29/07/2000 00:42:39  0.2358
    5   d.e.f         29/07/2000 00:42:45  0.2599
    5   d.e.f         29/07/2000 00:42:46  0.2099
    6   g.h.i         29/07/2000 01:13:42  0.3129
    7   g.h.i         29/07/2000 01:13:42  0.2313
    8   g.h.i         29/07/2000 01:13:42  0.2966
    9   g.h.i         29/07/2000 01:13:42  0.3611
    10  g.h.i         29/07/2000 01:13:42  0.2293
    11  g.h.i         29/07/2000 01:13:42  0.3889

Any help would be much appreciated!

Thanks,

Mike


Solution

This query produces the result you say you're looking for. I'm using @t in place of your original table (the sample data is declared at the end of this answer):

;With Ordered as (
    select ROW_NUMBER() OVER (PARTITION BY Reference_VC
                              ORDER BY DateTime_DT) as rn,*
    from @t
), Islands as (
    select o.Reference_VC, o.rn as RnStart, o.rn as RnEnd,
           o.DateTime_DT as dtStart, o.DateTime_DT as dtEnd
    from Ordered o
        left join
        Ordered o_not
        on o.Reference_VC = o_not.Reference_VC and
           o.rn = o_not.rn + 1 and
           o.DateTime_DT = DATEADD(second,1,o_not.DateTime_DT)
    where
        o_not.rn is null
    union all
    select
        i.Reference_VC,i.RnStart,o.rn,i.dtStart,o.DateTime_DT
    from Islands i
        inner join
        Ordered o
            on
                i.Reference_VC = o.Reference_VC and
                i.RnEnd = o.rn - 1 and
                i.dtEnd = DATEADD(second,-1,o.DateTime_DT)
), FinalIslands as (
    select Reference_VC,RnStart,dtStart,
           MAX(rnEnd) as rnEnd,MAX(dtEnd) as dtEnd,
           ROW_NUMBER() OVER (Order BY Reference_VC,RnStart) as ID
    from Islands i
    group by Reference_VC,RnStart,dtStart
)
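-- Join back on the row-number range rather than the datetime range.
-- Ordered is the original table plus rn. Rows that share a timestamp
-- (the g.h.i sample rows) would otherwise match every island at that
-- timestamp and be returned several times.
select
    fi.ID,o.Reference_VC,o.DateTime_DT,o.Value_DEC
from
    FinalIslands fi
        inner join
    Ordered o
        on
            fi.Reference_VC = o.Reference_VC and
            fi.RnStart <= o.rn and
            fi.rnEnd >= o.rn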

The tricky part is the middle Common Table Expression (CTE), Islands, which is recursive. The anchor portion (above UNION ALL) finds rows which don't have a row immediately preceding them. The recursive part then tries to extend each one by finding a row which can be added onto the end.
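
To see what the anchor does on its own, here is a minimal sketch that runs just that part against four of the a.b.c sample rows. The table variable @demo is a hypothetical stand-in introduced only for this illustration; the join predicate is copied from the anchor of the Islands CTE.

```sql
-- Hypothetical four-row sample: two runs of two consecutive seconds each.
declare @demo table (Reference_VC varchar(5) not null,
                     DateTime_DT datetime not null)
insert into @demo(Reference_VC,DateTime_DT) values
('a.b.c','2000-07-29T00:43:30'),
('a.b.c','2000-07-29T00:43:31'),
('a.b.c','2000-07-29T00:43:35'),
('a.b.c','2000-07-29T00:43:36')

;With Ordered as (
    select ROW_NUMBER() OVER (PARTITION BY Reference_VC
                              ORDER BY DateTime_DT) as rn,*
    from @demo
)
-- Keep a row only when no row sits exactly one position
-- and exactly one second before it.
select o.Reference_VC, o.rn, o.DateTime_DT
from Ordered o
    left join
    Ordered o_not
    on o.Reference_VC = o_not.Reference_VC and
       o.rn = o_not.rn + 1 and
       o.DateTime_DT = DATEADD(second,1,o_not.DateTime_DT)
where
    o_not.rn is null
order by o.rn
```

With these rows I'd expect two results, rn 1 (00:43:30) and rn 3 (00:43:35), which are the first rows of the two runs.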

This ends up producing rows which represent each full run of readings spaced exactly 1 second apart, but it also includes all of the intermediate, partial runs generated by the recursive part. So the FinalIslands CTE is used to remove the intermediate results and just take the full periods. I also took the opportunity, at this stage, to generate the ID values.
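
The intermediate rows are easy to see if you select straight from Islands instead of going through FinalIslands. A sketch, assuming a hypothetical three-row table variable @demo that holds one unbroken run:

```sql
-- Hypothetical sample: a single run of three consecutive seconds.
declare @demo table (Reference_VC varchar(5) not null,
                     DateTime_DT datetime not null)
insert into @demo(Reference_VC,DateTime_DT) values
('d.e.f','2000-07-29T00:42:35'),
('d.e.f','2000-07-29T00:42:36'),
('d.e.f','2000-07-29T00:42:37')

;With Ordered as (
    select ROW_NUMBER() OVER (PARTITION BY Reference_VC
                              ORDER BY DateTime_DT) as rn,*
    from @demo
), Islands as (
    -- Anchor: rows with no row exactly one second before them.
    select o.Reference_VC, o.rn as RnStart, o.rn as RnEnd,
           o.DateTime_DT as dtStart, o.DateTime_DT as dtEnd
    from Ordered o
        left join
        Ordered o_not
        on o.Reference_VC = o_not.Reference_VC and
           o.rn = o_not.rn + 1 and
           o.DateTime_DT = DATEADD(second,1,o_not.DateTime_DT)
    where
        o_not.rn is null
    union all
    -- Recursive part: extend each island by one row at a time.
    select
        i.Reference_VC,i.RnStart,o.rn,i.dtStart,o.DateTime_DT
    from Islands i
        inner join
        Ordered o
            on
                i.Reference_VC = o.Reference_VC and
                i.RnEnd = o.rn - 1 and
                i.dtEnd = DATEADD(second,-1,o.DateTime_DT)
)
-- No grouping here, so every partial island is still visible.
select Reference_VC, RnStart, RnEnd, dtStart, dtEnd
from Islands
order by RnStart, RnEnd
```

This should return three rows, all with RnStart = 1 and with RnEnd = 1, 2 and 3. Only the last one describes the whole run, which is why FinalIslands keeps MAX(rnEnd) and MAX(dtEnd) per starting row.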

We then take these periods and join back to the original table to produce the final output.
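
One caveat worth checking against the real data: by default SQL Server stops a recursive CTE after 100 levels of recursion, and the Islands CTE recurses once per row in a run, so an event lasting much longer than about 100 seconds would make the query fail with a recursion error. The MAXRECURSION query hint lifts that limit (0 means no limit). The sketch below shows the hint on a stand-in recursive CTE, Seconds, which is hypothetical and only builds a 150-row run of consecutive seconds; for the islands query, the same OPTION clause would go at the very end, after the final select.

```sql
declare @start datetime
set @start = '2000-07-29T00:00:00'

;With Seconds as (
    -- Anchor: the first second of the run.
    select 0 as n, @start as dt
    union all
    -- Recursive part: one more second per level, 149 levels in total.
    select n + 1, DATEADD(second,1,dt)
    from Seconds
    where n < 149
)
select COUNT(*) as RowsInRun, MIN(dt) as dtStart, MAX(dt) as dtEnd
from Seconds
-- Remove the default limit of 100 recursion levels.
option (maxrecursion 0)
```

Without the hint this stand-in should fail with a maximum recursion error; with it, it should return one row with RowsInRun = 150.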


This is the sample data, in a table variable. You'd need to have this first and then the above query, in a single query window and without any GO between them, to try it out:

declare @t table (Reference_VC varchar(5) not null,
                  DateTime_DT datetime not null,
                  Value_DEC decimal(10,9) not null)
insert into @t(Reference_VC,DateTime_DT,Value_DEC) values
('a.b.c','2000-07-29T00:43:30',0.2236),
('a.b.c','2000-07-29T00:43:31',0.2045),
('a.b.c','2000-07-29T00:43:35',0.2674),
('a.b.c','2000-07-29T00:43:36',0.2806),
('a.b.c','2000-07-29T00:43:40',0.3716),
('d.e.f','2000-07-29T00:42:35',0.2001),
('d.e.f','2000-07-29T00:42:36',0.2231),
('d.e.f','2000-07-29T00:42:37',0.2604),
('d.e.f','2000-07-29T00:42:38',0.3729),
('d.e.f','2000-07-29T00:42:39',0.2358),
('d.e.f','2000-07-29T00:42:45',0.2599),
('d.e.f','2000-07-29T00:42:46',0.2099),
('g.h.i','2000-07-29T01:13:42',0.3129),
('g.h.i','2000-07-29T01:13:42',0.2313),
('g.h.i','2000-07-29T01:13:42',0.2966),
('g.h.i','2000-07-29T01:13:42',0.3611),
('g.h.i','2000-07-29T01:13:42',0.2293),
('g.h.i','2000-07-29T01:13:42',0.3889)
Licensed under: CC-BY-SA with attribution