Question

I have two tables: a schedule table that contains information about how an employee is scheduled and a numbers table in which each number corresponds to a date.

The tables look like:

[Employee Schedule]

ID          Employee ID Project ID  Day ID
----------- ----------- ----------- -----------
1           64          2           168
2           64          2           169
3           64          2           170
4           64          2           171
5           64          1           169
6           64          1           170
7           64          1           171
8           64          1           172
9           64          2           182
10          64          2           183
11          64          2           184

and

[Day Numbers]

ID          Day
----------- ----------
168         2009-06-18
169         2009-06-19
170         2009-06-20
171         2009-06-21
172         2009-06-22
173         2009-06-23
174         2009-06-24
175         2009-06-25
176         2009-06-26
177         2009-06-27
178         2009-06-28
179         2009-06-29
180         2009-06-30
181         2009-07-01
182         2009-07-02
183         2009-07-03
184         2009-07-04

As you can see, Employee 64 is scheduled on project 1 from 2009-06-19 to 2009-06-22 and project 2 from 2009-06-18 to 2009-06-21 and again from 2009-07-02 to 2009-07-04.

My question is: what algorithm can I use to quickly determine the spans of the employee's schedule in a fashion such that I can display it as follows?

 Employee ID Project ID Duration
 ----------- ---------- ------------
 64          1          2009-06-19 to 2009-06-22
 64          2          2009-06-18 to 2009-06-21
 64          2          2009-07-02 to 2009-07-04

I can do this on the SQL side or the code side. I have Linq at my disposal if I need it. The table doesn't need to be compiled by SQL. This will happen dynamically on a website and should be as efficient as possible. I don't want to have to iterate through each and look for breaks in contiguous days if I don't have to.

Was it helpful?

Solution

Assuming the Day IDs are always sequential for a partial solution...

select *
  from employee_schedule a                    
 where not exists( select *                          
                     from employee_schedule b        
                    where a.employeeid = b.employeeid
                      and a.projectid  = b.projectid 
                      and (a.dayid - 1) = b.dayid )

lists the start day IDs:

 ID      EMPLOYEEID       PROJECTID           DAYID 
 1              64               2             168 
 5              64               1             169 
 9              64               2             182 



select *
  from employee_schedule a                   
 where not exists( select *                         
                     from employee_schedule b       
                    where a.employeeid = b.employeei
                      and a.projectid  = b.projectid
                      and (a.dayid + 1) = b.dayid )

lists the end day IDs:

  ID      EMPLOYEEID       PROJECTID           DAYID 
  4              64               2             171 
  8              64               1             172 
 11              64               2             184 

OTHER TIPS

Lets make a view to make things easier:

create view EmployeeProjectDates
as
select
    e.[Employee ID], e.[Project ID], d.Day
from
    [Employee Scchedule] e
    join [Day Numbers] d on e.[Day Id] = d.ID

You can do a query like this to get all the start dates:

select
    one.[Employee ID], one.[Project ID], one.Day as StartDate
from
    EmployeeProjectDays one
    left join EmployeeProjectDays two on one.[Employee ID] = two.[Employee ID] and one.[Project ID] = two.[Project ID] and one.Day = DATEADD(DAY, 1, two.Day)
where
    two.Day is null

And then do a similar query to get the end dates and match them up. I think that something like this would get you both.

select
    one.[Employee ID], one.[Project ID], one.Day as StartDate,
    (select
        min(two_end.Day)
    from
        EmployeeProjectDays one_end
        join EmployeeProjectDays two_end on one_end.[Employee ID] = two_end.[Employee ID] and one_end.[Project ID] = two_end.[Project ID] and one.Day = DATEADD(DAY, 1, two.Day)
     where
        one_end.Day is null
        and two_end.Day > one.Day) as EndDate
from
    EmployeeProjectDays one
    left join EmployeeProjectDays two on one.[Employee ID] = two.[Employee ID] and one.[Project ID] = two.[Project ID] and one.Day = DATEADD(DAY, 1, two.Day)
where
    two.Day is null

I haven't tested any of these queries, but something similar should work. I had to use a similar query before we implemented something in our application code to find the start and end dates.

This one works with oracle, and starting from that it should be possible in SQL Server as well. (including testscript)

create table schedule (id number, employee_id number, project_id number, day_id number);

insert into schedule (id, employee_id, project_id, day_id)
values(1,64,2,168);
insert into schedule (id, employee_id, project_id, day_id)
values(2,64,2,169);
insert into schedule (id, employee_id, project_id, day_id)
values(3,64,2,170);
insert into schedule (id, employee_id, project_id, day_id)
values(4,64,2,171);
insert into schedule (id, employee_id, project_id, day_id)
values(5,64,1,169);
insert into schedule (id, employee_id, project_id, day_id)
values(6,64,1,170);
insert into schedule (id, employee_id, project_id, day_id)
values(7,64,1,171);
insert into schedule (id, employee_id, project_id, day_id)
values(8,64,1,172);
insert into schedule (id, employee_id, project_id, day_id)
values(9,64,2,182);
insert into schedule (id, employee_id, project_id, day_id)
values(10,64,2,183);
insert into schedule (id, employee_id, project_id, day_id)
values(11,64,2,184);
insert into schedule (id, employee_id, project_id, day_id)
values(11,65,3,184);

select * 
FROM (
    select  
        employee_id,
        project_id,
        first_day,
        nvl(last_day, 
            lead(last_day) over (
                partition by employee_id, project_id 
                order by nvl(first_day, last_day)
            )
        ) last_day
    from (
        select -- this identifies start and end rows of an interval
            employee_id,
            project_id,
            decode (day_id - prev_day, 1, null, day_id) first_day, -- uses day_id, if prev_day is not really the previous day, i.e. a gap or null
            decode (day_id - next_day, -1, null, day_id) last_day
        from (
            select -- this select adds columns for the previous and next day, in order to identify the boundaries of intervals 
                employee_id, 
                project_id, 
                day_id, 
                lead(day_id) over ( 
                    partition by employee_id, project_id 
                    order by day_id
                ) next_day,
                lag(day_id) over ( 
                    partition by employee_id, project_id 
                    order by day_id
                ) prev_day
            from schedule
        )
    )
    where first_day is not null 
    or last_day is not null-- just filter the rows, that represent start or end dates
) 
where first_day is not null

produces this output:

64  1   169 172
64  2   168 171
64  2   182 184
65  3   184 184

I haven't tested, but try:

select [Employee ID], [Project ID], start + ' to ' + end
from (
    select s.[Employee ID], s.[Project ID], min(d.Day) start, max(d.Day) end
    from [Employee Schedule] s
    inner join [Day Numbers] d on s.[Day ID] = d.[Day ID]
    group by s.[Employee ID], s.[Project ID]
) a

Edit: corrected some column names

For easier querying, I recommend you refactor the schema to:

[EmployeeSchedule]

ID 
EmployeeID 
ProjectID  
StartDate 
EndDate

and get rid of Day Numbers completely. That will make your queries simpler, more efficient, and will allow you to have records with NULL StartDates or EndDates if you wish.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top