Finding Start and End Dates from Date Numbers Table (Date Durations)
Question
I have two tables: a schedule table that contains information about how an employee is scheduled and a numbers table in which each number corresponds to a date.
The tables look like:
[Employee Schedule]
ID          Employee ID Project ID  Day ID
----------- ----------- ----------- -----------
1           64          2           168
2           64          2           169
3           64          2           170
4           64          2           171
5           64          1           169
6           64          1           170
7           64          1           171
8           64          1           172
9           64          2           182
10          64          2           183
11          64          2           184
and
[Day Numbers]
ID Day
----------- ----------
168 2009-06-18
169 2009-06-19
170 2009-06-20
171 2009-06-21
172 2009-06-22
173 2009-06-23
174 2009-06-24
175 2009-06-25
176 2009-06-26
177 2009-06-27
178 2009-06-28
179 2009-06-29
180 2009-06-30
181 2009-07-01
182 2009-07-02
183 2009-07-03
184 2009-07-04
As you can see, Employee 64 is scheduled on project 1 from 2009-06-19 to 2009-06-22 and project 2 from 2009-06-18 to 2009-06-21 and again from 2009-07-02 to 2009-07-04.
My question is: what algorithm can I use to quickly determine the spans of the employee's schedule in a fashion such that I can display it as follows?
Employee ID Project ID Duration
----------- ---------- ------------------------
64          1          2009-06-19 to 2009-06-22
64          2          2009-06-18 to 2009-06-21
64          2          2009-07-02 to 2009-07-04
I can do this on the SQL side or the code side, and I have LINQ at my disposal if I need it. The output table doesn't have to be assembled in SQL. This will happen dynamically on a website and should be as efficient as possible. I'd rather not iterate through each row looking for breaks in contiguous days if I can avoid it.
Solution
As a partial solution, assuming the Day IDs are always sequential (consecutive dates get consecutive IDs):
select *
from employee_schedule a
where not exists( select *
                  from employee_schedule b
                  where a.employeeid = b.employeeid
                  and a.projectid = b.projectid
                  and (a.dayid - 1) = b.dayid )
lists the start day IDs:
ID EMPLOYEEID PROJECTID DAYID
1 64 2 168
5 64 1 169
9 64 2 182
select *
from employee_schedule a
where not exists( select *
                  from employee_schedule b
                  where a.employeeid = b.employeeid
                  and a.projectid = b.projectid
                  and (a.dayid + 1) = b.dayid )
lists the end day IDs:
ID EMPLOYEEID PROJECTID DAYID
4 64 2 171
8 64 1 172
11 64 2 184
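One way to finish this partial solution is to pair the two result sets: if both are sorted by employee, project and day ID, the n-th start row and the n-th end row belong to the same span, because every run of consecutive days has exactly one start and one end. Below is a minimal sketch of that pairing in Python, using an in-memory SQLite database as a stand-in for the real one. The names `find_spans` and `demo_spans` are made up for illustration, and the two queries are folded into one parameterized statement.

```python
import sqlite3

# Sample rows from the question: (id, employeeid, projectid, dayid).
ROWS = [
    (1, 64, 2, 168), (2, 64, 2, 169), (3, 64, 2, 170), (4, 64, 2, 171),
    (5, 64, 1, 169), (6, 64, 1, 170), (7, 64, 1, 171), (8, 64, 1, 172),
    (9, 64, 2, 182), (10, 64, 2, 183), (11, 64, 2, 184),
]

# Boundary query from the answer. The parameter is -1 to find start rows
# (no row for the previous day) and +1 to find end rows (no row for the next day).
BOUNDARY_SQL = """
select a.employeeid, a.projectid, a.dayid
from employee_schedule a
where not exists (select 1
                  from employee_schedule b
                  where a.employeeid = b.employeeid
                    and a.projectid = b.projectid
                    and a.dayid + ? = b.dayid)
order by a.employeeid, a.projectid, a.dayid
"""


def find_spans(conn):
    """Return (employeeid, projectid, start_dayid, end_dayid) tuples."""
    starts = conn.execute(BOUNDARY_SQL, (-1,)).fetchall()
    ends = conn.execute(BOUNDARY_SQL, (1,)).fetchall()
    # Both lists share the same ordering, so they line up position by position.
    return [(emp, proj, start, end)
            for (emp, proj, start), (_, _, end) in zip(starts, ends)]


def demo_spans():
    """Load the sample rows into a throwaway database and compute the spans."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table employee_schedule "
        "(id integer, employeeid integer, projectid integer, dayid integer)"
    )
    conn.executemany("insert into employee_schedule values (?, ?, ?, ?)", ROWS)
    try:
        return find_spans(conn)
    finally:
        conn.close()
```

Calling `demo_spans()` yields one tuple per span, with the start and end still expressed as day IDs; joining to the day numbers table turns them into dates.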
OTHER TIPS
Let's make a view to make things easier:
create view EmployeeProjectDays
as
select
    e.[Employee ID], e.[Project ID], d.Day
from
    [Employee Schedule] e
    join [Day Numbers] d on e.[Day ID] = d.ID
You can do a query like this to get all the start dates:
select
one.[Employee ID], one.[Project ID], one.Day as StartDate
from
EmployeeProjectDays one
left join EmployeeProjectDays two on one.[Employee ID] = two.[Employee ID] and one.[Project ID] = two.[Project ID] and one.Day = DATEADD(DAY, 1, two.Day)
where
two.Day is null
And then do a similar query to get the end dates and match them up. I think that something like this would get you both.
select
    one.[Employee ID], one.[Project ID], one.Day as StartDate,
    (select
        min(one_end.Day)
    from
        EmployeeProjectDays one_end
        left join EmployeeProjectDays two_end on one_end.[Employee ID] = two_end.[Employee ID] and one_end.[Project ID] = two_end.[Project ID] and two_end.Day = DATEADD(DAY, 1, one_end.Day)
    where
        two_end.Day is null
        and one_end.[Employee ID] = one.[Employee ID]
        and one_end.[Project ID] = one.[Project ID]
        and one_end.Day >= one.Day) as EndDate
from
    EmployeeProjectDays one
    left join EmployeeProjectDays two on one.[Employee ID] = two.[Employee ID] and one.[Project ID] = two.[Project ID] and one.Day = DATEADD(DAY, 1, two.Day)
where
    two.Day is null
I haven't tested any of these queries, but something similar should work. I had to use a similar query before we implemented something in our application code to find the start and end dates.
This one works with Oracle, and starting from it, the same should be possible in SQL Server as well (test script included).
create table schedule (id number, employee_id number, project_id number, day_id number);
insert into schedule (id, employee_id, project_id, day_id)
values(1,64,2,168);
insert into schedule (id, employee_id, project_id, day_id)
values(2,64,2,169);
insert into schedule (id, employee_id, project_id, day_id)
values(3,64,2,170);
insert into schedule (id, employee_id, project_id, day_id)
values(4,64,2,171);
insert into schedule (id, employee_id, project_id, day_id)
values(5,64,1,169);
insert into schedule (id, employee_id, project_id, day_id)
values(6,64,1,170);
insert into schedule (id, employee_id, project_id, day_id)
values(7,64,1,171);
insert into schedule (id, employee_id, project_id, day_id)
values(8,64,1,172);
insert into schedule (id, employee_id, project_id, day_id)
values(9,64,2,182);
insert into schedule (id, employee_id, project_id, day_id)
values(10,64,2,183);
insert into schedule (id, employee_id, project_id, day_id)
values(11,64,2,184);
insert into schedule (id, employee_id, project_id, day_id)
values(12,65,3,184);
select *
from (
    select
        employee_id,
        project_id,
        first_day,
        nvl(last_day,
            lead(last_day) over (
                partition by employee_id, project_id
                order by nvl(first_day, last_day)
            )
        ) last_day
    from (
        select -- this identifies the start and end rows of an interval
            employee_id,
            project_id,
            decode(day_id - prev_day, 1, null, day_id) first_day, -- uses day_id if prev_day is not really the previous day, i.e. a gap or null
            decode(day_id - next_day, -1, null, day_id) last_day -- uses day_id if next_day is not really the next day
        from (
            select -- this select adds columns for the previous and next day, in order to identify the boundaries of intervals
                employee_id,
                project_id,
                day_id,
                lead(day_id) over (
                    partition by employee_id, project_id
                    order by day_id
                ) next_day,
                lag(day_id) over (
                    partition by employee_id, project_id
                    order by day_id
                ) prev_day
            from schedule
        )
    )
    where first_day is not null
       or last_day is not null -- keep only the rows that represent start or end dates
)
where first_day is not null
produces this output:
64 1 169 172
64 2 168 171
64 2 182 184
65 3 184 184
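If the rows are pulled to the code side instead, the same previous-day comparison that `lag` performs can be done in one pass over sorted rows. This is a rough sketch in Python rather than a translation of the Oracle query; `collapse_runs` is a made-up name, and it assumes the rows arrive as (employee_id, project_id, day_id) tuples.

```python
def collapse_runs(rows):
    """Collapse (employee_id, project_id, day_id) rows into
    (employee_id, project_id, first_day, last_day) spans."""
    spans = []
    # set() drops duplicate rows; sorting groups each employee/project
    # together with its day IDs in ascending order.
    for emp, proj, day in sorted(set(rows)):
        last = spans[-1] if spans else None
        if last and last[0] == emp and last[1] == proj and last[3] == day - 1:
            # The previous row is the preceding day: extend the current span.
            last[3] = day
        else:
            # A gap, or a new employee/project: start a new span.
            spans.append([emp, proj, day, day])
    return [tuple(span) for span in spans]
```

Fed the rows from the test script above, this should return the same four spans as the Oracle query, including the single-day span for employee 65. The sort dominates the cost, so it is worth having the database return the rows already ordered if the data set is large.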
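I haven't tested this, but try the query below. Note that it returns a single range per employee and project, so non-contiguous spans (such as project 2 in the question) are merged into one:
select [Employee ID], [Project ID], convert(varchar(10), StartDate, 120) + ' to ' + convert(varchar(10), EndDate, 120) as Duration
from (
    select s.[Employee ID], s.[Project ID], min(d.Day) StartDate, max(d.Day) EndDate
    from [Employee Schedule] s
    inner join [Day Numbers] d on s.[Day ID] = d.ID
    group by s.[Employee ID], s.[Project ID]
) a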
Edit: corrected some column names
For easier querying, I recommend you refactor the schema to:
[EmployeeSchedule]
ID
EmployeeID
ProjectID
StartDate
EndDate
and get rid of Day Numbers completely. That will make your queries simpler and more efficient, and will allow you to have records with NULL StartDates or EndDates if you wish.
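If you go this route, existing data has to be converted once from day-ID spans into StartDate/EndDate rows. The sketch below shows that translation in Python under some assumptions: the spans have already been found as (employee, project, first day ID, last day ID) tuples, and `DAY_NUMBERS` is a hand-built stand-in for the sample [Day Numbers] table. The function names are hypothetical.

```python
from datetime import date, timedelta

# Stand-in for the sample [Day Numbers] table: ID 168 is 2009-06-18 and
# each following ID is the next calendar day, up to ID 184.
DAY_NUMBERS = {168 + i: date(2009, 6, 18) + timedelta(days=i) for i in range(17)}


def to_date_ranges(spans, day_numbers=DAY_NUMBERS):
    """Map (emp, proj, first_day_id, last_day_id) to (emp, proj, StartDate, EndDate)."""
    return [(emp, proj, day_numbers[first], day_numbers[last])
            for emp, proj, first, last in spans]


def format_duration(start, end):
    """Render a range the way the question displays it."""
    return f"{start.isoformat()} to {end.isoformat()}"
```

For example, the span (64, 1, 169, 172) maps to a row starting 2009-06-19 and ending 2009-06-22, which `format_duration` renders in the "start to end" form the question asks for.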