Question

Suppose there are records as follows:

Employee_id, work_start_date, work_end_date

1, 01-jan-2014, 07-jan-2014
1, 03-jan-2014, 12-jan-2014
1, 23-jan-2014, 25-jan-2014
2, 15-jan-2014, 25-jan-2014
2, 07-jan-2014, 15-jan-2014
2, 09-jan-2014, 12-jan-2014

The requirement is to write an SQL SELECT statement that sums the worked days per employee_id but counts overlapping periods only once.

The desired output would be:

Employee_id, worked_days

1, 13
2, 18

Worked days for a single period are calculated as end minus start: if work_start_date = 5 and work_end_date = 9, then worked_days = 4 (9 - 5).
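To see what "count overlaps only once" changes, here is a small Python sketch. Python is used only as a neutral way to check the arithmetic and is not part of any answer; ROWS, worked_days and naive_totals are illustrative names. It applies the end-minus-start rule and then the naive per-employee sum that a plain GROUP BY would return.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (employee_id, work_start_date, work_end_date)
ROWS = [
    (1, date(2014, 1, 1), date(2014, 1, 7)),
    (1, date(2014, 1, 3), date(2014, 1, 12)),
    (1, date(2014, 1, 23), date(2014, 1, 25)),
    (2, date(2014, 1, 15), date(2014, 1, 25)),
    (2, date(2014, 1, 7), date(2014, 1, 15)),
    (2, date(2014, 1, 9), date(2014, 1, 12)),
]


def worked_days(start: date, end: date) -> int:
    """Days in one period under the question's rule: end minus start."""
    return (end - start).days


def naive_totals(rows):
    """Plain per-employee sum, like SUM(end - start) with GROUP BY employee_id.

    A day covered by two periods is counted twice.
    """
    totals = defaultdict(int)
    for emp, start, end in rows:
        totals[emp] += worked_days(start, end)
    return dict(totals)


print(worked_days(date(2014, 1, 5), date(2014, 1, 9)))  # 4
print(naive_totals(ROWS))  # {1: 17, 2: 21}
```

The naive totals, 17 and 21, exceed the desired 13 and 18 by exactly the overlapping days: 2014-01-03 to 2014-01-07 (4 days) for employee 1 and 2014-01-09 to 2014-01-12 (3 days) for employee 2.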

I could write a PL/SQL function that solves this (manually iterating over the records and doing the calculation), but I'm sure it can be done in plain SQL, with better performance.

Can someone please point me in the right direction?

Thanks!


Solution

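This is a slightly modified query from a similar question, "compute sum of values associated with overlapping date ranges". It splits each employee's timeline at every start and end date, pairs each boundary with the next one using lead() to form elementary segments, keeps only the segments covered by one of that employee's periods, and sums their lengths: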

SELECT "Employee_id",
       SUM( "work_end_date" - "work_start_date" )
FROM(
  SELECT "Employee_id",
         "work_start_date" ,
         lead( "work_start_date" ) 
             over (Partition by "Employee_id"
                  Order by "Employee_id", "work_start_date" ) 
         As "work_end_date"
  FROM (
     SELECT "Employee_id", "work_start_date"
     FROM Table1
     UNION
     SELECT "Employee_id","work_end_date"
     FROM Table1
  ) x
) x
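WHERE EXISTS (
   -- keep a segment only if one of the SAME employee's periods overlaps it;
   -- a segment lies between two adjacent boundaries, so overlapping means full coverage.
   -- The last segment per employee has a NULL end (no lead() value) and is dropped.
   SELECT 1 FROM Table1 t
   WHERE t."Employee_id" = x."Employee_id"
     AND t."work_start_date" < x."work_end_date"
     AND t."work_end_date" > x."work_start_date"
)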
GROUP BY "Employee_id"
;

Demo: http://sqlfiddle.com/#!4/4fcce/2
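The steps of this query can be mirrored in a few lines of Python. This is an illustration only, not Oracle code, and ROWS and covered_days are hypothetical names. It collects the distinct start and end dates, pairs each boundary with the next one as lead() does, keeps a segment only when one of the same employee's periods overlaps it, and adds up the segment lengths.

```python
from datetime import date

# Sample rows from the question: (employee_id, work_start_date, work_end_date)
ROWS = [
    (1, date(2014, 1, 1), date(2014, 1, 7)),
    (1, date(2014, 1, 3), date(2014, 1, 12)),
    (1, date(2014, 1, 23), date(2014, 1, 25)),
    (2, date(2014, 1, 15), date(2014, 1, 25)),
    (2, date(2014, 1, 7), date(2014, 1, 15)),
    (2, date(2014, 1, 9), date(2014, 1, 12)),
]


def covered_days(rows, employee_id):
    """Sum the elementary segments that lie inside one of the employee's periods."""
    periods = [(s, e) for emp, s, e in rows if emp == employee_id]
    # like the UNION of start and end dates: distinct boundaries, sorted
    boundaries = sorted({d for period in periods for d in period})
    total = 0
    # zip pairs each boundary with the next one, like lead(); the last boundary
    # has no successor and is skipped, like the row whose lead() is NULL
    for seg_start, seg_end in zip(boundaries, boundaries[1:]):
        # overlap test against this employee's periods only
        if any(s < seg_end and e > seg_start for s, e in periods):
            total += (seg_end - seg_start).days
    return total


print({emp: covered_days(ROWS, emp) for emp in (1, 2)})  # {1: 13, 2: 18}
```

Filtering on the employee matters: a gap between two of one employee's periods must not be counted just because some other employee's period covers it.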

OTHER TIPS

This is a tricky problem. For instance, you can't use lag(), because the overlapping period may not be the "previous" one. Also, different periods can start and/or end on the same day.
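The point about lag() can be checked with a small counterexample. The Python sketch below uses day numbers as plain integers, like the 5-to-9 example in the question, and runs_total is a hypothetical helper. It groups periods sorted by start into runs in two ways: by comparing each start with the running maximum end of the current run, or only with the previous row's end, which is all a lag()-based comparison sees.

```python
def runs_total(periods, use_running_max):
    """Group periods (sorted by start) into runs and sum max(end) - min(start)
    per run. A new run begins when a period starts after the reference end:
    the running maximum end of the current run (correct), or just the previous
    row's end (what a lag()-based comparison sees)."""
    total = 0
    run_start = run_end = ref_end = None
    for s, e in sorted(periods):
        if ref_end is not None and s <= ref_end:
            run_end = max(run_end, e)  # same run
        else:
            if run_end is not None:
                total += run_end - run_start  # close the previous run
            run_start, run_end = s, e
        ref_end = run_end if use_running_max else e
    if run_end is not None:
        total += run_end - run_start
    return total


periods = [(1, 10), (2, 3), (5, 8)]
print(runs_total(periods, use_running_max=True))   # 9
print(runs_total(periods, use_running_max=False))  # 12
```

Here [1, 10] covers [5, 8], but the row just before [5, 8] is [2, 3], so the lag-only version opens a new run and counts days 5 to 8 twice.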

The idea is to reconstruct the merged periods. How to do this? Find the records where a new period starts, that is, rows whose start date is not covered by any other period of the same employee that began earlier. Use this as a flag and sum it cumulatively (ordered by start date) to number the groups of overlapping periods. Then getting the working days is just aggregation from there:

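with ps as (
      -- flag a row as the start of a new merged period unless another period
      -- of the same employee starts earlier and is still running on its start date
      select e.*,
             (case when exists (select 1
                                from emps e2
                                where e2.employee_id = e.employee_id and
                                      e2.work_start_date < e.work_start_date and
                                      e2.work_end_date >= e.work_start_date
                               )
                   then 0 else 1
              end) as IsPeriodStart
      from emps e
     )
select employee_id, sum(work_end_date - work_start_date) as Days_Worked
from (-- collapse each group of overlapping periods into one merged period
      select employee_id, min(work_start_date) as work_start_date,
             max(work_end_date) as work_end_date
      from (-- the running total of the start flags numbers the groups; rows with
            -- the same start date get the same value, because the default window
            -- frame (RANGE ... CURRENT ROW) includes ties
            select ps.*,
                   sum(IsPeriodStart) over (partition by employee_id
                                            order by work_start_date
                                           ) as grp
            from ps
           ) ps
      group by employee_id, grp
     ) ps
group by employee_id;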
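The same idea can be traced in Python for one employee's periods. This is a sketch with day numbers as plain integers, and days_by_period_start is a hypothetical name. It flags the rows that start a new period, numbers the groups with a running total of the flags, then takes min(start) and max(end) per group.

```python
def days_by_period_start(periods):
    """Start-flag approach for one employee's (start, end) day numbers."""
    # IsPeriodStart: 0 if some other period starts earlier and is still running
    # on this period's start day, otherwise 1
    flagged = [
        (s, e, 0 if any(s2 < s <= e2 for s2, e2 in periods) else 1)
        for s, e in periods
    ]
    merged = {}
    for s, e, _flag in flagged:
        # grp: running total of the flags over all rows starting on or before s;
        # rows that share a start day get the same value, like SQL's default
        # RANGE window frame
        grp = sum(f for s2, _e2, f in flagged if s2 <= s)
        lo, hi = merged.get(grp, (s, e))
        merged[grp] = (min(lo, s), max(hi, e))
    # one merged period per group: max(end) - min(start)
    return sum(hi - lo for lo, hi in merged.values())


print(days_by_period_start([(1, 7), (3, 12), (23, 25)]))   # 13
print(days_by_period_start([(15, 25), (7, 15), (9, 12)]))  # 18
```

Because the flag only looks at periods that start strictly earlier, a row never counts itself as its own overlap, and periods that merely touch (one ends on the day the next starts) fall into the same group.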

This approach expands every period into its individual days. First, the date_tbl collection type:

create or replace package RG_TYPE is
  type date_tbl is table of date;
end;

Next, a pipelined function that returns every date between its two parameters (inclusive) as a table:

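create or replace function dates
(
    p_from date,
    p_to date
) return rg_type.date_tbl pipelined
is
  l_idx date := p_from;
begin
  -- without a start date there is nothing to return (and the loop below
  -- would never terminate, because comparisons with NULL are never true)
  if p_from is null then
    return;
  end if;
  loop
    -- stop after p_to; a NULL p_to returns only p_from
    if l_idx > nvl(p_to, p_from) then
      exit;
    end if;
    pipe row(l_idx);   -- emit one row per date
    l_idx := l_idx + 1;
  end loop;
  return;
end;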
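A Python generator behaves much like a pipelined function: it hands rows to the caller one at a time instead of building the whole collection first. The sketch below is an analogy, not Oracle code; the name dates is reused only for readability, and returning nothing when p_from is missing is a choice made for this sketch.

```python
from datetime import date, timedelta
from typing import Iterator, Optional


def dates(p_from: Optional[date], p_to: Optional[date]) -> Iterator[date]:
    """Yield each date from p_from through p_to, one at a time.

    A missing p_to falls back to p_from, like nvl(p_to, p_from);
    a missing p_from yields nothing.
    """
    if p_from is None:
        return
    last = p_to if p_to is not None else p_from
    day = p_from
    while day <= last:
        yield day
        day += timedelta(days=1)


print([d.isoformat() for d in dates(date(2014, 1, 5), date(2014, 1, 7))])
# ['2014-01-05', '2014-01-06', '2014-01-07']
```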

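Finally, the query. A day d counts for a period when work_start_date <= d < work_end_date, which matches the end-minus-start rule, and grouping by employee and day counts each day only once even when several periods cover it: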

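select employee_id, sum(c)
from
  -- one row per employee and calendar day covered by any of that employee's periods;
  -- c is always 1 (one employee per group), so sum(c) counts distinct days
  (select e.employee_id, d.column_value, count(distinct w.employee_id) as c
  from   (select distinct employee_id from works) e,
         -- every date from the earliest start to the latest end in the table
         table(dates((select min(work_start_date) from works),
                     (select max(work_end_date) from works))) d,
         works w
  where e.employee_id = w.employee_id
        -- half-open range: the end date itself is not a worked day
        and d.column_value >= w.work_start_date
        and d.column_value <  w.work_end_date
  group by e.employee_id, d.column_value) Sub
group by employee_id
order by 1, 2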
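For comparison, the day-expansion idea in Python (a sketch; ROWS and worked_days_by_expansion are hypothetical names): each period is expanded into the days d with start <= d < end, and a set keeps each (employee, day) pair only once.

```python
from datetime import date, timedelta

# Sample rows from the question: (employee_id, work_start_date, work_end_date)
ROWS = [
    (1, date(2014, 1, 1), date(2014, 1, 7)),
    (1, date(2014, 1, 3), date(2014, 1, 12)),
    (1, date(2014, 1, 23), date(2014, 1, 25)),
    (2, date(2014, 1, 15), date(2014, 1, 25)),
    (2, date(2014, 1, 7), date(2014, 1, 15)),
    (2, date(2014, 1, 9), date(2014, 1, 12)),
]


def worked_days_by_expansion(rows):
    """Count distinct covered days per employee by expanding every period."""
    covered = set()
    for emp, start, end in rows:
        day = start
        while day < end:  # half-open: the end date is excluded
            covered.add((emp, day))
            day += timedelta(days=1)
    totals = {}
    for emp, _day in covered:
        totals[emp] = totals.get(emp, 0) + 1
    return dict(sorted(totals.items()))


print(worked_days_by_expansion(ROWS))  # {1: 13, 2: 18}
```

The work grows with the number of days spanned rather than the number of periods, which is the main cost of this approach compared with the boundary-based queries.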
Licensed under: CC-BY-SA with attribution