Question

I have an issue in Teradata where I am trying to build a historical contract table that lists a system, it's corresponding contracts and the start and end dates of each contract. This table would then be queried for reporting as a point in time table. Here is some code to better explain.

CREATE TABLE TMP_WORK_DB.SOLD_SYSTEMS 
(
SYSTEM_ID varchar(5),
CONTRACT_TYPE varchar(10),
CONTRACT_RANK int,
CONTRACT_STRT_DT date,
CONTRACT_END_DT date
);

INSERT INTO TMP_WORK_DB.SOLD_SYSTEMS  VALUES ('AAA', 'BEST', 10, '2012-01-01', '2012-06-30');
INSERT INTO TMP_WORK_DB.SOLD_SYSTEMS  VALUES ('AAA', 'BEST', 9, '2012-01-01', '2012-06-30');
INSERT INTO TMP_WORK_DB.SOLD_SYSTEMS  VALUES ('AAA', 'OK', 1, '2012-08-01', '2012-12-30');
INSERT INTO TMP_WORK_DB.SOLD_SYSTEMS  VALUES ('BBB', 'BEST', 10, '2013-12-01', '2014-03-02');
INSERT INTO TMP_WORK_DB.SOLD_SYSTEMS  VALUES ('BBB', 'BETTER', 7, '2013-12-01', '2017-03-02');
INSERT INTO TMP_WORK_DB.SOLD_SYSTEMS  VALUES ('BBB', 'GOOD', 4, '2016-12-02', '2017-12-02');
INSERT INTO TMP_WORK_DB.SOLD_SYSTEMS  VALUES ('CCC', 'BEST', 10, '2009-10-13', '2014-10-14');
INSERT INTO TMP_WORK_DB.SOLD_SYSTEMS  VALUES ('CCC', 'BETTER', 7, '2009-10-13', '2016-10-14');
INSERT INTO TMP_WORK_DB.SOLD_SYSTEMS  VALUES ('CCC', 'OK', 2, '2008-10-13', '2017-10-14');

The required output would be:

SYSTEM_ID   CONTRACT_TYPE   CONTRACT_STRT_DT    CONTARCT_END_DT     CONTRACT_RANK
  AAA          BEST            01/01/2012          06/30/2012           10
  AAA          OK              08/01/2012          12/30/2012           1
  BBB          BEST            12/01/2013          03/02/2014           10
  BBB          BETTER          03/03/2014          03/02/2017           7
  BBB          GOOD            03/03/2017          12/02/2017           4
  CCC          OK              10/13/2008          10/12/2009           2
  CCC          BEST            10/13/2009          10/14/2014           10
  CCC          BETTER          10/15/2014          10/14/2016           7
  CCC          OK              10/15/2016          10/14/2017           2

I'm not necessarily looking to reduce rows but am looking to get the correct state of the system_id at any given point in time. Note that when a higher ranked contract ends and a lower ranked contract is still active the lower ranked picks up where the higher one left off.

We are using TD 14 and I have been able to get the easy records where the dates flow sequentially and are of higher rank but am having trouble with the overlaps where two different ranked contracts cover multiple date spans.

I found this blog post (Sharpening Stones) and got it working for the most part but am still having trouble setting the new start dates for the overlapping contracts.

Any help would be appreciated. Thanks.


*UPDATE 04/04/2014 *

I came up with the following code which gives me exactly what I want but I'm not sure of the performance. It works on smaller data sets of a few hundred rows but I havent tested it on several million:

*UPDATE 04/07/2014 * Updated the date subquery due to spool issues. This query explodes all days where the contract is possibly active and then uses the ROW_NUMBER function to get the highest ranked CONTRACT_TYPE per day. The MIN/MAX functions are then partitioned over the system and contract type to pick up when the highest ranked contract type changes.

*UPDATE - 2 - 04/07/2014 * I cleaned up the query and it seems to be perform a little better.

SELECT 
    SYSTEM_ID
,   CONTRACT_TYPE
,   MIN(CALENDAR_DATE) NEW_START_DATE
,   MAX(CALENDAR_DATE) NEW_END_DATE
,   CONTRACT_RANK
FROM (
SELECT 
    CALENDAR_DATE
,   SYSTEM_ID
,   CONTRACT_TYPE
,   CONTRACT_RANK
,   ROW_NUMBER() OVER (PARTITION BY SYSTEM_ID, CALENDAR_DATE ORDER BY CONTRACT_RANK DESC, CONTRACT_STRT_DT DESC, CONTRACT_END_DT DESC) AS RNK
FROM SOLD_SYSTEMS t1
JOIN (
    SELECT CALENDAR_DATE
    FROM FULL_CALENDAR_TABLE ia     
    WHERE CALENDAR_DATE > DATE'2013-01-01'
    )dt
ON CALENDAR_DATE BETWEEN CONTRACT_STRT_DT AND CONTRACT_END_DT
QUALIFY RNK = 1
)z1
GROUP BY 1,2,5
Was it helpful?

Solution

Following approach uses the new PERIOD functions in TD13.10.

-- 1. TD_SEQUENCED_COUNT can't be used in joins, so create a Volatile Table
-- 2. TD_SEQUENCED_COUNT can't use additional columns (e.g. CONTRACT_RANK),
--    so simply create a new row whenever a period starts or ends without
--    considering CONTRACT_RANK
CREATE VOLATILE TABLE vt AS
 (
   WITH cte
    (
      SYSTEM_ID
     ,pd
    )
   AS
    (
      SELECT
         SYSTEM_ID
-- PERIODs can easily be constructed on-the-fly, but the end date is not inclusive,
-- so I had to adjust to your implementation, CONTRACT_END_DT +/- 1:
        ,PERIOD(CONTRACT_STRT_DT, CONTRACT_END_DT + 1) AS pd
      FROM SOLD_SYSTEMS
     )
   SELECT
      SYSTEM_ID
     ,BEGIN(pd) AS CONTRACT_STRT_DT
     ,END(pd) - 1 AS CONTRACT_END_DT
   FROM
      TABLE (TD_SEQUENCED_COUNT
            (NEW VARIANT_TYPE(cte.SYSTEM_ID) 
            ,cte.pd) 
      RETURNS (SYSTEM_ID VARCHAR(5)
              ,Policy_Count INTEGER 
              ,pd PERIOD(DATE))
      HASH BY SYSTEM_ID 
      LOCAL ORDER BY SYSTEM_ID ,pd) AS dt
 )
WITH DATA 
PRIMARY INDEX (SYSTEM_ID)
ON COMMIT PRESERVE ROWS
;

-- Find the matching CONTRACT_RANK
SELECT
   vt.SYSTEM_ID
  ,t.CONTRACT_TYPE
  ,vt.CONTRACT_STRT_DT
  ,vt.CONTRACT_END_DT
  ,t.CONTRACT_RANK
FROM vt
-- If both vt and SOLD_SYSTEMS have a NUPI on SYSTEM_ID this join should be
-- quite efficient
JOIN SOLD_SYSTEMS AS t
  ON vt.SYSTEM_ID = t.SYSTEM_ID
 AND      ( t.CONTRACT_STRT_DT,  t.CONTRACT_END_DT)
 OVERLAPS (vt.CONTRACT_STRT_DT, vt.CONTRACT_END_DT)
QUALIFY 
-- As multiple contracts for the same period are possible:
-- find the row with the highest rank
   ROW_NUMBER() 
   OVER (PARTITION BY vt.SYSTEM_ID,vt.CONTRACT_STRT_DT
         ORDER BY t.CONTRACT_RANK DESC, vt.CONTRACT_END_DT DESC) = 1
ORDER BY 1,3
;

-- Previous query might return consecutive rows with the same CONTRACT_RANK, e.g.
-- BBB  BETTER  2014-03-03  2016-12-01  7
-- BBB  BETTER  2016-12-02  2017-03-02  7

-- If you don't want that you have to normalize the data:
WITH cte
 (
   SYSTEM_ID
  ,CONTRACT_STRT_DT
  ,CONTRACT_END_DT
  ,CONTRACT_RANK
  ,CONTRACT_TYPE
  ,pd
 )
AS
 (
   SELECT
      vt.SYSTEM_ID
     ,vt.CONTRACT_STRT_DT
     ,vt.CONTRACT_END_DT
     ,t.CONTRACT_RANK
     ,t.CONTRACT_TYPE
     ,PERIOD(vt.CONTRACT_STRT_DT, vt.CONTRACT_END_DT + 1) AS pd
   FROM vt
   JOIN SOLD_SYSTEMS AS t
     ON vt.SYSTEM_ID = t.SYSTEM_ID
        AND       ( t.CONTRACT_STRT_DT,  t.CONTRACT_END_DT)
         OVERLAPS (vt.CONTRACT_STRT_DT, vt.CONTRACT_END_DT)
   QUALIFY
      ROW_NUMBER() 
      OVER (PARTITION BY vt.SYSTEM_ID,vt.CONTRACT_STRT_DT
            ORDER BY t.CONTRACT_RANK DESC, vt.CONTRACT_END_DT DESC) = 1
 )
SELECT
   SYSTEM_ID
  ,CONTRACT_TYPE
  ,BEGIN(pd) AS CONTRACT_STRT_DT
  ,END(pd) - 1 AS CONTRACT_END_DT
  ,CONTRACT_RANK
FROM
   TABLE (TD_NORMALIZE_MEET
         (NEW VARIANT_TYPE(cte.SYSTEM_ID
                          ,cte.CONTRACT_RANK
                          ,cte.CONTRACT_TYPE)
         ,cte.pd) 
   RETURNS (SYSTEM_ID VARCHAR(5)
           ,CONTRACT_RANK INT
           ,CONTRACT_TYPE VARCHAR(10)
           ,pd PERIOD(DATE))
   HASH BY SYSTEM_ID 
   LOCAL ORDER BY SYSTEM_ID, CONTRACT_RANK, CONTRACT_TYPE, pd ) A
ORDER BY 1, 3;

Edit: This is another way to get the result of the 2nd query without Volatile Table and TD_SEQUENCED_COUNT:

SELECT
   t.SYSTEM_ID
  ,t.CONTRACT_TYPE
  ,BEGIN(CONTRACT_PERIOD)  AS CONTRACT_STRT_DT
  ,END(CONTRACT_PERIOD)- 1 AS CONTRACT_END_DT
  ,t.CONTRACT_RANK
  ,dt.p P_INTERSECT PERIOD(t.CONTRACT_STRT_DT,t.CONTRACT_END_DT + 1) AS CONTRACT_PERIOD
FROM
 (
   SELECT
      dt.SYSTEM_ID
     ,PERIOD(d, MIN(d)
                OVER (PARTITION BY dt.SYSTEM_ID
                      ORDER BY d
                      ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)) AS p
   FROM
    (
      SELECT
         SYSTEM_ID
        ,CONTRACT_STRT_DT AS d
      FROM SOLD_SYSTEMS
      UNION
      SELECT
         SYSTEM_ID
        ,CONTRACT_END_DT + 1 AS d
      FROM SOLD_SYSTEMS
    ) AS dt 
    QUALIFY p IS NOT NULL
 ) AS dt
JOIN SOLD_SYSTEMS AS t
  ON dt.SYSTEM_ID = t.SYSTEM_ID
WHERE CONTRACT_PERIOD IS NOT NULL
QUALIFY 
   ROW_NUMBER() 
   OVER (PARTITION BY dt.SYSTEM_ID,p
         ORDER BY t.CONTRACT_RANK DESC, t.CONTRACT_END_DT DESC) = 1
ORDER BY 1,3

And based on that you can also include the normalization in a single query:

WITH cte
 (
   SYSTEM_ID
  ,CONTRACT_TYPE
  ,CONTRACT_STRT_DT
  ,CONTRACT_END_DT
  ,CONTRACT_RANK
  ,pd
 )
AS
 (
   SELECT
      t.SYSTEM_ID
     ,t.CONTRACT_TYPE
     ,BEGIN(CONTRACT_PERIOD)  AS CONTRACT_STRT_DT
     ,END(CONTRACT_PERIOD)- 1 AS CONTRACT_END_DT
     ,t.CONTRACT_RANK
     ,dt.p P_INTERSECT PERIOD(t.CONTRACT_STRT_DT,t.CONTRACT_END_DT + 1) AS CONTRACT_PERIOD
   FROM
    (
      SELECT
         dt.SYSTEM_ID
        ,PERIOD(d, MIN(d)
                   OVER (PARTITION BY dt.SYSTEM_ID
                         ORDER BY d
                         ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)) AS p
      FROM
       (
         SELECT
            SYSTEM_ID
           ,CONTRACT_STRT_DT AS d
         FROM SOLD_SYSTEMS
         UNION
         SELECT
            SYSTEM_ID
           ,CONTRACT_END_DT + 1 AS d
         FROM SOLD_SYSTEMS
       ) AS dt 
       QUALIFY p IS NOT NULL
    ) AS dt
   JOIN SOLD_SYSTEMS AS t
     ON dt.SYSTEM_ID = t.SYSTEM_ID
   WHERE CONTRACT_PERIOD IS NOT NULL
   QUALIFY 
      ROW_NUMBER() 
      OVER (PARTITION BY dt.SYSTEM_ID,p
            ORDER BY t.CONTRACT_RANK DESC, t.CONTRACT_END_DT DESC) = 1
 )
SELECT
   SYSTEM_ID
  ,CONTRACT_TYPE
  ,BEGIN(pd) AS CONTRACT_STRT_DT
  ,END(pd) - 1 AS CONTRACT_END_DT
  ,CONTRACT_RANK
FROM 
   TABLE (TD_NORMALIZE_MEET
         (NEW VARIANT_TYPE(cte.SYSTEM_ID
                          ,cte.CONTRACT_RANK
                          ,cte.CONTRACT_TYPE)
         ,cte.pd) 
   RETURNS (SYSTEM_ID VARCHAR(5)
           ,CONTRACT_RANK INT
           ,CONTRACT_TYPE VARCHAR(10)
           ,pd PERIOD(DATE))
   HASH BY SYSTEM_ID 
   LOCAL ORDER BY SYSTEM_ID, CONTRACT_RANK, CONTRACT_TYPE, pd ) A
ORDER BY 1, 3;

OTHER TIPS

SEL system_id,contract_type,MAX(contract_rank),
CASE    WHEN contract_strt_dt<prev_end_dt THEN prev_end_dt+1 
ELSE    contract_strt_dt 
END AS new_start ,contract_strt_dt,contract_end_dt,
MIN(contract_end_dt) OVER (PARTITION BY system_id  
ORDER   BY contract_strt_dt,contract_end_dt ROWS BETWEEN 1 PRECEDING 
AND 1 PRECEDING) prev_end_dt

FROM sold_systems
GROUP BY system_id,contract_type,contract_strt_dt,contract_end_dt
ORDER   BY contract_strt_dt,contract_end_dt,prev_end_dt

I think I'v got it.... try this

select  SYSTEM_ID, CONTRACT_TYPE,CONTRACT_RANK,
case
    when CONTRACT_STRT_DT<NEW_START_DATE then NEW_START_DATE  /*if new_star_date overlap startdate then get new_Start_date */
    else CONTRACT_STRT_DT
    end as new_contract_str_dt,
     CONTRACT_END_DT
from
(select t1.SYSTEM_ID,t1.CONTRACT_TYPE,t1.CONTRACT_RANK,t1.CONTRACT_STRT_DT,t1.CONTRACT_END_DT,
 coalesce(max(t1.CONTRACT_END_DT) over (partition by t1.SYSTEM_ID  order by t1.CONTRACT_RANK desc rows between UNBOUNDED PRECEDING and 1 preceding  ), t1.CONTRACT_STRT_DT)  NEW_START_DATE
from SOLD_SYSTEMS t1 

) as  temp1
/*you may remove fully overlapped contracts*/
where  NEW_START_DATE<=CONTRACT_END_DT

It's simpler and have a good execution plan... You can work with large tables (don't forget to collect statistics )

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top