Question

I know this is a problem others must have solved in the past, but in my limited knowledge, have yet to get over the hump. I have data which is ordered by datetime, which needs to be grouped by a combination of two fields (status and queue). In instances where the status and queue are the same within a given time frame, they should be considered part of the same group, and thus have the same id.

To accomplish this, I've attempted to implement DENSE_RANK(), and for all intents and purposes, it has been successful - with exception of the ordering of the groups. Below is an example:

WITH TEMP1 (EVENT_DATE, PRV_EVENT_DATE, STATUS, PRV_STATUS, QUEUE, PRV_QUEUE) AS
   (VALUES ('2012-09-04 11:40:19.936141', '', 'CREATED', '', 'SYSTEM', '')
          ,('2012-09-04 11:40:21.207140', '2012-09-04 11:40:19.936141', 'CREATED', 'CREATED', 'SYSTEM', 'SYSTEM')
          ,('2012-09-04 11:40:27.771140', '2012-09-04 11:40:21.207140', 'PROCESS', 'CREATED', 'PROCESS', 'SYSTEM')
          ,('2012-09-05 00:01:20.384180', '2012-09-04 11:40:27.771140', 'SUSPEND', 'PROCESS', 'SYSTEM', 'SYSTEM')
          ,('2012-09-05 00:02:14.042180', '2012-09-05 00:01:20.384180', 'SUSPEND', 'SUSPEND', 'PEND', 'SYSTEM')
          ,('2012-09-06 00:02:14.642180', '2012-09-05 00:02:14.042180', 'SUSPEND', 'SUSPEND', 'SYSTEM', 'SYSTEM')
          ,('2012-09-06 00:02:33.433180', '2012-09-06 00:02:14.642180', 'SUSPEND', 'SUSPEND', 'SYSTEM', 'SYSTEM')
   )  
SELECT 
ROW_NUMBER() OVER (ORDER BY EVENT_DATE) AS "RN",
DENSE_RANK() OVER ( ORDER BY status, queue, date(event_date)) AS "GRP",
EVENT_DATE, PRV_EVENT_DATE, STATUS, PRV_STATUS, QUEUE, PRV_QUEUE
FROM TEMP1
ORDER BY EVENT_DATE

The results are such:

RN GRP EVENT_DATE                  PRV_EVENT_DATE             STATUS  PRV_STATUS QUEUE
1   1  2012-09-04 11:40:19.936141                             CREATED            SYSTEM
2   1  2012-09-04 11:40:21.207140  2012-09-04 11:40:19.936141 CREATED CREATED    SYSTEM
3   2  2012-09-04 11:40:27.771140  2012-09-04 11:40:21.207140 PROCESS CREATED    PROCESS 
4   4  2012-09-05 00:01:20.384180  2012-09-04 11:40:27.771140 SUSPEND PROCESS    SYSTEM
5   3  2012-09-05 00:02:14.042180  2012-09-05 00:01:20.384180 SUSPEND SUSPEND    PEND
6   5  2012-09-06 00:02:14.642180  2012-09-05 00:02:14.042180 SUSPEND SUSPEND    SYSTEM

As you can tell, the "GRP" is out of order (and I also know using date(EVENT_DATE) isn't the solution).

Was it helpful?

Solution

It' not clear (at least to me), what you actually want. A new group whenever there's a change of "STATUS" or "QUEUE" compared to the previous one? Or are there more complex rules?

Looks like your data is already the result of a query where you calculate the previous value using a MIN(status/queue) OVER (ROW BETWEEN 1 PRECEDING and 1 PRECEDING)

You'll never get the right order when you cast to a DATE, try a calculation like this:

SUM(CASE WHEN status = prv_status AND queue = prv_queue THEN 0 ELSE 1 END)
OVER (ORDER BY event_date 
      ROWS UNBOUNDED PRECEDING)

Edit: Without SUM OVER you have to do use a Scalar Subquery as input to the DENSE_RANK, this should work:

SELECT 
   (SELECT MAX(event_date) 
    FROM TEMP1 AS t2 
    WHERE t2.event_date < t1.event_date
    AND t1.status <> t2.status 
    AND t1.QUEUE <> t2.queue) AS x,

    DENSE_RANK() OVER ( ORDER BY x) AS "GRP",

Of course performance might be horrible.

Maybe you better keep the "wrong" order, at least it's the same wrong value for all rows of a group :-)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top