Question

I have this data in a table

FIELD_A   FIELD_B     FIELD_D
249052903   10/15/2011 N
249052903   11/15/2011 P ------------- VALUE CHANGED
249052903   12/15/2011 P
249052903   1/15/2012   N ------------- VALUE CHANGED
249052903   2/15/2012   N
249052903   3/15/2012   N
249052903   4/15/2012   N
249052903   5/15/2012   N
249052903   6/15/2012   N
249052903   7/15/2012   N
249052903   8/15/2012   N
249052903   9/15/2012   N

When ever the value in FIELD_D changes it forms a group and I need the min and max dates in that group. The query shoud return

FIELD_A   GROUP_START   GROUP_END
249052903   10/15/2011  10/15/2011
249052903   11/15/2011  12/15/2011
249052903   1/15/2012              9/15/2012

The examples that I have seen so far have the data in Field_D being unique. Here the data can repeat as shown, First it is "N" then it changes to "P" and then back to "N".

Any help will be appreciated

Thanks

Was it helpful?

Solution

You can use analytic functions - LAG, LEAD, and COUNT() OVER to your advantage, if they are supported by your SQL implementation. SQL Fiddle here.

WITH EndsMarked AS (
  SELECT
    FIELD_A,
    FIELD_B,
    CASE WHEN FIELD_D = LAG(FIELD_D,1) OVER (ORDER BY FIELD_B)
         THEN 0 ELSE 1 END AS IS_START,
    CASE WHEN FIELD_D = LEAD(FIELD_D,1) OVER (ORDER BY FIELD_B)
         THEN 0 ELSE 1 END AS IS_END
  FROM T
), GroupsNumbered AS (
  SELECT
    FIELD_A,
    FIELD_B,
    IS_START,
    IS_END,
    COUNT(CASE WHEN IS_START = 1 THEN 1 END)
      OVER (ORDER BY FIELD_B) AS GroupNum
  FROM EndsMarked
  WHERE IS_START=1 OR IS_END=1
)
  SELECT
    FIELD_A,
    MIN(FIELD_B) AS GROUP_START,
    MAX(FIELD_B) AS GROUP_END
    FROM GroupsNumbered
    GROUP BY FIELD_A, GroupNum;

OTHER TIPS

This is fairly easy to express in SQL using subqueries:

select Field_A, Field_D, min(Field_B) as Group_Start, max(Field_B) as Group_End
from (select t.*,
             (select min(field_B)
              from t t2
              where t2.field_A = t.field_A and
                    t2.field_B > t.field_B and
                    t2.Field_D <> t.field_D
             ) as TheGroup
      from t
     ) t
group by Field_A, Field_D, TheGroup

This is assigning a group identifier using a correlated subquery. The identifier is the first value of Field_B where Field_D changes.

You don't mention the database you are using, so this uses standard SQL.

Don't use SQL for this problem because it is not possible to do it in SQL with a single table scan since it requires comparison between records. It would need a full table scan plus at least a join with itself. It is trivial to implement a solution in a imperative language and it only requires a single table scan. Edit: a stored procedure would be best.

I modified the answers a bit where you have multiple Field_A's. This should always work :-)

WITH EndsMarked 
AS 
(
    SELECT 

         [Field_A]
        ,[Field_B]
        ,CASE 
            WHEN LAG([Field_D],1) OVER (PARTITION BY [Field_A] ORDER BY [Field_A],[Field_B]) IS NULL
             AND ROW_NUMBER() OVER (PARTITION BY [Field_A] ORDER BY [Field_B]) = 1 
            THEN 1
            WHEN LAG([Field_D],1) OVER (PARTITION BY [Field_A] ORDER BY [Field_A],[Field_B]) > 0
              <> LAG([Field_D],0) OVER (PARTITION BY [Field_A] ORDER BY [Field_A],[Field_B]) > 0
            THEN 1 
            ELSE 0 
        END AS IS_START
       ,CASE 
            WHEN LEAD([Field_D],1) OVER (PARTITION BY [Field_A] ORDER BY [Field_A],[Field_B]) IS NULL
             AND ROW_NUMBER() OVER (PARTITION BY [Field_A] ORDER BY [Field_B] DESC) = 1 
            THEN 1
            WHEN LEAD([Field_D],0) OVER (PARTITION BY [Field_A] ORDER BY [Field_A],[Field_B]) 
              <> LEAD([Field_D],1) OVER (PARTITION BY [Field_A] ORDER BY [Field_A],[Field_B]) 
            THEN 1          
            ELSE 0 
        END                 AS IS_END

    FROM 
    (
        SELECT

            [Field_A]
           ,[Field_B]
           ,[Field_D]
           ,[Aantal Facturen]

        FROM [T]

    )   F
    
)
,GroupsNumbered 
AS 
(
  SELECT
     [Field_A]
    ,[Field_B]
    ,IS_START
    ,IS_END
    ,COUNT(CASE
               WHEN IS_START = 1 
               THEN 1 
           END)                     OVER (ORDER BY [Field_A]
                                                  ,[Field_B]) AS GroupNum
  FROM      EndsMarked
  WHERE     IS_START        = 1 
     OR     IS_END          = 1
)

    SELECT

        [Field_A]
        ,MIN([Field_B]) AS GROUP_START
        ,MAX([Field_B]) AS GROUP_END

    FROM GroupsNumbered

    GROUP BY [Field_A], GroupNum
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top