Join Fact table to SCD Type 2 … how to write query? (SQL Server)
08-10-2020
Question
I can't seem to find this simple fact anywhere.
I have a fact table like this in SQL Server:
fact_picked
Emp_Name | Date     | Apples_Picked
-------- | -------- | -------------
John     | May 1 17 | 100
And a type 2 dimension table like this
dim_company
Emp_Name | Company       | Effective_Since
-------- | ------------- | ---------------
John     | Blue_Apples   | June 1 2015
John     | Apple_N_Stuff | Jan 1 2016
John     | Da_Big_Apple  | March 17 2017
John     | Big_Tech      | October 20 2017
How would I join the fact to the dimension table so I know which "company" picked the 100 apples?
In this case, logically, given the data, it's 'Da_Big_Apple', since John worked there from March 17, 2017 until October 20, 2017, and his apple picking (May 1, 2017) falls within that period.
How do I do a join of these tables though? (assuming thousands of records).
I just get stuck. I know I should do something like
Select fp.emp_name, fp.date, fp.apples_picked
from fact_picked fp
left join dim_company dc
on fp.emp_name = dc.emp_name
and fp.date > dc.effective_since .... ???
I guess I'm not really sure. I could use a view to convert the type 2 to a type 4 (with an end date in the table), which makes it simpler: check that the fact date is on or after the start date and before the end date. But is that the most elegant solution?
Solution
Set up some sample data:
create table fact_picked
(Emp_Name      varchar(30)
,[Date]        date
,Apples_Picked int);
-- yyyy-mm-dd literals are unambiguous for the date type,
-- unlike mm/dd/yy, which depends on the session's DATEFORMAT
insert into fact_picked values
('John','2015-10-17',175),
('John','2017-05-01',100);
create table dim_company
(Emp_Name        varchar(30)
,Company         varchar(30)
,Effective_Since date);
insert into dim_company values
('John','Blue_Apples',  '2015-06-01'),
('John','Apple_N_Stuff','2016-01-01'),
('John','Da_Big_Apple', '2017-03-17'),
('John','Big_Tech',     '2017-10-20');
Proposed solution:
select fp.Emp_Name,
       fp.[Date],
       fp.Apples_Picked,
       dc.Company,
       dc.Effective_Since
from fact_picked fp
left join dim_company dc
  on dc.Emp_Name = fp.Emp_Name
 and dc.Effective_Since = (select max(Effective_Since)
                           from dim_company dc2
                           where dc2.Emp_Name = fp.Emp_Name
                             and dc2.Effective_Since <= fp.[Date])
order by 1,2,5;
Emp_Name | Date | Apples_Picked | Company | Effective_Since
-------- | ------------------- | ------------- | ------------ | -------------------
John | 17/10/2015 00:00:00 | 175 | Blue_Apples | 01/06/2015 00:00:00
John | 01/05/2017 00:00:00 | 100 | Da_Big_Apple | 17/03/2017 00:00:00
- The sub-query finds the dim_company record with the latest (i.e., max()) Effective_Since date that is on or before a given fact_picked.[Date].
- The top-level query uses the sub-query result to determine which dim_company record to join with.
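The same "latest row on or before the fact date" lookup can also be sketched with OUTER APPLY. This is a swapped-in alternative, not the query proposed above, and it assumes the same fact_picked and dim_company tables; the aliases d and dc are only illustrative.

```sql
-- Sketch: for each fact row, pick the single most recent dimension row
-- that took effect on or before the fact date.
select fp.Emp_Name,
       fp.[Date],
       fp.Apples_Picked,
       dc.Company,
       dc.Effective_Since
from fact_picked fp
outer apply (
    select top (1) d.Company, d.Effective_Since
    from dim_company d
    where d.Emp_Name = fp.Emp_Name
      and d.Effective_Since <= fp.[Date]
    order by d.Effective_Since desc   -- latest qualifying row first
) dc
order by fp.Emp_Name, fp.[Date];
```

Like the left join, OUTER APPLY keeps fact rows that have no qualifying dimension row and returns null for the dimension columns. One difference to be aware of: if two dim_company rows share the same Emp_Name and Effective_Since, the max() version returns both, while top (1) returns only one of them.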
To make sure the left join is working properly, assume we have only the following rows in the dim_company table:
truncate table dim_company;
-- yyyy-mm-dd literals do not depend on the session's DATEFORMAT for the date type
insert into dim_company values
('John','Da_Big_Apple','2017-03-17'),
('John','Big_Tech',    '2017-10-20');
Run the proposed query again:
select fp.Emp_Name,
       fp.[Date],
       fp.Apples_Picked,
       dc.Company,
       dc.Effective_Since
from fact_picked fp
left join dim_company dc
  on dc.Emp_Name = fp.Emp_Name
 and dc.Effective_Since = (select max(Effective_Since)
                           from dim_company dc2
                           where dc2.Emp_Name = fp.Emp_Name
                             and dc2.Effective_Since <= fp.[Date])
order by 1,2,5;
Emp_Name | Date | Apples_Picked | Company | Effective_Since
-------- | ------------------- | ------------- | ------------ | -------------------
John | 17/10/2015 00:00:00 | 175 | null | null
John | 01/05/2017 00:00:00 | 100 | Da_Big_Apple | 17/03/2017 00:00:00
And here's a dbfiddle for the above.
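For comparison, the view-based idea raised in the question (give each dimension row an end date, then join on a date range) might look roughly like the sketch below. It assumes SQL Server 2012 or later for lead(); the view name dim_company_ranged and the column Effective_Until are hypothetical names introduced here for illustration.

```sql
-- Hypothetical view: each row's end date is the next row's start date
-- for the same employee; the current (latest) row gets null.
create view dim_company_ranged as
select Emp_Name,
       Company,
       Effective_Since,
       lead(Effective_Since) over (partition by Emp_Name
                                   order by Effective_Since) as Effective_Until
from dim_company;
go  -- batch separator for SSMS/sqlcmd; create view must start its own batch

-- Range join: fact date on or after the start and strictly before the end.
select fp.Emp_Name,
       fp.[Date],
       fp.Apples_Picked,
       dc.Company,
       dc.Effective_Since
from fact_picked fp
left join dim_company_ranged dc
  on dc.Emp_Name = fp.Emp_Name
 and fp.[Date] >= dc.Effective_Since
 and (fp.[Date] < dc.Effective_Until or dc.Effective_Until is null)
order by fp.Emp_Name, fp.[Date];
```

With the sample data this should pick the same companies as the correlated sub-query, including null for facts dated before the first dimension row. Whether it is more elegant is a judgment call: the range logic becomes reusable across queries, at the cost of an extra object to maintain.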