Question

I have a query, which returns the following, EXCEPT for the last column, which is what I need to figure out how to create. For each given ObservationID I need to return the date on which the status changes; something like a LEAD() function that would take conditions and not just offsets. Can it be done?

I need to calculate the Change Date column: for each row, it should be the next date on which the status differs from that row's status.

+---------------+--------+-----------+--------+-------------+
| ObservationID | Region |   Date    | Status | Change Date | <-This field
+---------------+--------+-----------+--------+-------------+
|             1 |     10 | 1/3/2012  | Ice    | 1/4/2012    |
|             2 |     10 | 1/4/2012  | Water  | 1/6/2012    |
|             3 |     10 | 1/5/2012  | Water  | 1/6/2012    |
|             4 |     10 | 1/6/2012  | Gas    | 1/7/2012    |
|             5 |     10 | 1/7/2012  | Ice    |             |
|             6 |     20 | 2/6/2012  | Water  | 2/10/2012   |
|             7 |     20 | 2/7/2012  | Water  | 2/10/2012   |
|             8 |     20 | 2/8/2012  | Water  | 2/10/2012   |
|             9 |     20 | 2/9/2012  | Water  | 2/10/2012   |
|            10 |     20 | 2/10/2012 | Ice    |             |
+---------------+--------+-----------+--------+-------------+

Solution

A MODEL clause (Oracle 10g and later) can do this compactly:

SQL> create table observation(ObservationID ,  Region  ,obs_date,  Status)
  2  as
  3  select  1, 10, date '2012-03-01', 'Ice' from dual union all
  4  select  2, 10, date '2012-04-01', 'Water' from dual union all
  5  select  3, 10, date '2012-05-01', 'Water' from dual union all
  6  select  4, 10, date '2012-06-01', 'Gas' from dual union all
  7  select  5, 10, date '2012-07-01', 'Ice' from dual union all
  8  select  6, 20, date '2012-06-02', 'Water' from dual union all
  9  select  7, 20, date '2012-07-02', 'Water' from dual union all
 10  select  8, 20, date '2012-08-02', 'Water' from dual union all
 11  select  9, 20, date '2012-09-02', 'Water' from dual union all
 12  select 10, 20, date '2012-10-02', 'Ice' from dual ;

Table created.

SQL> select ObservationID, obs_date, Status, status_change
  2            from observation
  3          model
  4          dimension by (Region, obs_date, Status)
  5          measures ( ObservationID, obs_date obs_date2, cast(null as date) status_change)
  6          rules (
  7            status_change[any,any,any] = min(obs_date2)[cv(Region), obs_date > cv(obs_date), status != cv(status)]
  8          )
  9   order by 1;

OBSERVATIONID OBS_DATE  STATU STATUS_CH
------------- --------- ----- ---------
            1 01-MAR-12 Ice   01-APR-12
            2 01-APR-12 Water 01-JUN-12
            3 01-MAY-12 Water 01-JUN-12
            4 01-JUN-12 Gas   01-JUL-12
            5 01-JUL-12 Ice
            6 02-JUN-12 Water 02-OCT-12
            7 02-JUL-12 Water 02-OCT-12
            8 02-AUG-12 Water 02-OCT-12
            9 02-SEP-12 Water 02-OCT-12
           10 02-OCT-12 Ice

fiddle: http://sqlfiddle.com/#!4/f6687/1

That is, we dimension on region, date, and status, because we want to look only at cells with the same region and find the first later date on which the status differs.

We also need the date as a measure, so I created the alias obs_date2 for it, and we add a new column, status_change, to hold the date on which the status changed.

This rule does all the work:

status_change[any,any,any] = min(obs_date2)[cv(Region), obs_date > cv(obs_date), status != cv(status)]

It says: for our three dimensions, look only at rows with the same region (cv(Region)), whose date follows the date of the current row (obs_date > cv(obs_date)), and whose status differs from the current row's status (status != cv(status)). Then take the minimum date that satisfies these conditions (min(obs_date2)) and assign it to status_change. The any,any,any part on the left means the calculation applies to all rows.
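
The same rule can also be read as a correlated subquery, which may be easier to follow if the MODEL syntax is unfamiliar. This is only a sketch for comparison, run against the observation table created above; it is not the method used in this answer and has not been tuned for performance:

```sql
-- Sketch: the MODEL rule rewritten as a correlated scalar subquery.
select o.ObservationID,
       o.obs_date,
       o.Status,
       (select min(o2.obs_date)              -- earliest qualifying date ...
          from observation o2
         where o2.Region   = o.Region        -- ... in the same region
           and o2.obs_date > o.obs_date      -- ... after the current row's date
           and o2.Status  != o.Status        -- ... with a different status
       ) as status_change
  from observation o
 order by o.ObservationID;
```

Rows with no later, different status in their region (observations 5 and 10 in this data) get a null status_change, as in the output above.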

OTHER TIPS

I've tried many times to understand the MODEL clause and never quite managed it, so I thought I would add another solution.

This solution takes some of what Ronnis has done but uses the IGNORE NULLS clause of the LEAD function instead. I think this clause is new in Oracle 11, but you could probably replace it with the FIRST_VALUE function on Oracle 10 if necessary.

select
  observation_id,
  region,
  observation_date,
  status,
  lead(case when is_change = 'Y' then observation_date end) ignore nulls 
    over (partition by region order by observation_date) as change_observation_date
from (
  select
    a.observation_id,
    a.region,
    a.observation_date,
    a.status,
    case 
      when status = lag(status) over (partition by region order by observation_date) 
        then null
        else 'Y' end as is_change
  from observations a
)
order by 1;
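
The FIRST_VALUE substitution suggested above might look like the following. This is an untested sketch, assuming the same observations table and a release where FIRST_VALUE accepts IGNORE NULLS; the window starts at the next row so that the current row's own date is never returned:

```sql
-- Sketch: FIRST_VALUE ... IGNORE NULLS in place of LEAD ... IGNORE NULLS.
select
  observation_id,
  region,
  observation_date,
  status,
  -- first non-null change date among the rows that follow the current one
  first_value(case when is_change = 'Y' then observation_date end ignore nulls)
    over (partition by region
          order by observation_date
          rows between 1 following and unbounded following) as change_observation_date
from (
  select
    a.observation_id,
    a.region,
    a.observation_date,
    a.status,
    -- 'Y' when the status differs from the previous recording in the region
    case
      when status = lag(status) over (partition by region order by observation_date)
        then null
        else 'Y' end as is_change
  from observations a
)
order by 1;
```

For the last row of each region the window is empty, so change_observation_date is null.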

I frequently do this when cleaning up overlapping from/to-dates and duplicate rows. Your case is much simpler though, since you only have the "from-date" :)

Setting up the test data

create table observations(
   observation_id   number       not null
  ,region           number       not null
  ,observation_date date         not null
  ,status           varchar2(10) not null
);


insert 
  into observations(observation_id, region, observation_date, status)
   select 1,  10, date '2012-03-01', 'Ice'   from dual union all
   select 2,  10, date '2012-04-01', 'Water' from dual union all
   select 3,  10, date '2012-05-01', 'Water' from dual union all
   select 4,  10, date '2012-06-01', 'Gas'   from dual union all
   select 5,  10, date '2012-07-01', 'Ice'   from dual union all
   select 6,  20, date '2012-06-02', 'Water' from dual union all
   select 7,  20, date '2012-07-02', 'Water' from dual union all
   select 8,  20, date '2012-08-02', 'Water' from dual union all
   select 9,  20, date '2012-09-02', 'Water' from dual union all
   select 10, 20, date '2012-10-02', 'Ice'   from dual;

commit;

The query below has three points of interest:

  1. Identifying repeated information (the recording shows the same status as the previous recording)
  2. Ignoring the repeated recordings
  3. Determining the date from the "next" change

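with lagged as(
   select a.*
         -- (1) flag a row only when its status differs from the previous
         --     recording in the same region; repeated recordings get null
         ,case when status = lag(status, 1) over(partition by region
                                                     order by observation_date)
               then null
               else rownum
           end as change_flag
     from observations a
)
select observation_id
      ,region
      ,observation_date
      ,status
      -- (3) date of the next change within the region
      ,lead(observation_date, 1) over(
         partition by region
             order by observation_date
      ) as change_date
      -- days until the next change; the last status in a region runs to sysdate
      ,lead(observation_date, 1, sysdate) over(
         partition by region
             order by observation_date
      ) - observation_date as duration
  from lagged
 where change_flag is not null -- (2) ignore the repeated recordings
 ;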
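
The query above returns one row per run of identical statuses, so the repeated observations (3, 7, 8 and 9 in the test data) drop out of the result. If a change date is needed for every ObservationID, as in the question, one possible way is to join the runs back to the full table. This is a sketch only: the runs subquery and its run_start column are names introduced here for illustration, and it assumes at most one observation per region and date:

```sql
-- Sketch: map every observation to the run it falls in.
with lagged as (
   select a.*
         ,case when status = lag(status) over(partition by region
                                                  order by observation_date)
               then null
               else 'Y'
           end as change_flag
     from observations a
), runs as (
   -- one row per run: where it starts and when the next run begins
   select region
         ,observation_date as run_start
         ,lead(observation_date) over(partition by region
                                          order by observation_date) as change_date
     from lagged
    where change_flag is not null
)
select o.observation_id
      ,o.region
      ,o.observation_date
      ,o.status
      ,r.change_date
  from observations o
  join runs r
    on r.region = o.region
   and o.observation_date >= r.run_start
   and (o.observation_date < r.change_date or r.change_date is null)
 order by o.observation_id;
```

Each observation matches exactly one run, because the runs within a region cover consecutive, non-overlapping date ranges.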
Licensed under: CC-BY-SA with attribution