Question

I can't seem to find this simple fact anywhere.

I have a fact table like this in SQL Server.

fact_picked
Emp_Name     Date     Apples_Picked
John        May 1 17    100

And a type 2 dimension table like this

dim_company
Emp_Name   Company     Effective_Since
John      Blue_Apples      June 1 2015
John      Apple_N_Stuff    Jan 1 2016
John      Da_Big_Apple     March 17 2017
John      Big_Tech         October 20 2017

How would I join the fact to the dimension table so I know which "company" picked the 100 apples?

In this case, logically, given the data, it's 'Da_Big_Apple', since John worked there from March 17, 2017 until October 20, 2017, and his apple picking took place within that period.

How do I do a join of these tables though? (assuming thousands of records).

I just get stuck. I know I should do something like

Select fp.emp_name, fp.date, fp.apples_picked
from fact_picked fp
left join dim_company dc
on fp.emp_name = dc.emp_name
and fp.date > dc.effective_since .... ???

I'm not really sure. I guess I could use a view to convert the type 2 table into one that also carries an end date on each row. Then it's simpler: make sure the fact date is on or after the start date, yet before the end date. But is that the most elegant solution?


Solution

Set up some sample data:

create table fact_picked
(Emp_Name         varchar(30)
,[Date]           date
,Apples_Picked    int);

insert into fact_picked values
('John','10/17/15',175),
('John','05/01/17',100);

create table dim_company
(Emp_Name         varchar(30)
,Company          varchar(30)
,Effective_Since  date);

insert into dim_company values
('John','Blue_Apples',  '06/01/2015'),
('John','Apple_N_Stuff','01/01/2016'),
('John','Da_Big_Apple', '03/17/2017'),
('John','Big_Tech',     '10/20/2017');

Proposed solution:

select fp.Emp_Name, 
       fp.[Date],
       fp.Apples_Picked,
       dc.Company,
       dc.Effective_Since

from   fact_picked fp

left
join   dim_company dc
on     dc.Emp_Name        = fp.Emp_Name
and    dc.Effective_Since = (select max(Effective_Since)
                             from   dim_company dc2
                             where  dc2.Emp_Name         = fp.Emp_Name
                             and    dc2.Effective_Since <= fp.[Date])

order by 1,2,5;

Emp_Name | Date                | Apples_Picked | Company      | Effective_Since    
-------- | ------------------- | ------------- | ------------ | -------------------
John     | 17/10/2015 00:00:00 |           175 | Blue_Apples  | 01/06/2015 00:00:00
John     | 01/05/2017 00:00:00 |           100 | Da_Big_Apple | 17/03/2017 00:00:00
  • the sub-query finds the dim_company record with the latest (i.e., max()) Effective_Since date that is less than or equal to a given fact_picked.[Date]
  • the sub-query result determines which dim_company record the top-level query joins to
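
The same "latest row on or before the fact date" lookup can also be sketched with OUTER APPLY and TOP (1). This is only an alternative formulation under the same sample tables, not a claim that it is faster; one difference to be aware of is that if two dim_company rows ever shared the same Emp_Name and Effective_Since, this version would return a single row per fact, whereas the join above would return both.

```sql
-- Alternative sketch: for each fact row, fetch the single dimension row
-- with the latest Effective_Since that is on or before the fact date.
-- OUTER APPLY keeps fact rows that have no matching dimension row,
-- so it behaves like the LEFT JOIN in the proposed solution.
select fp.Emp_Name,
       fp.[Date],
       fp.Apples_Picked,
       dc.Company,
       dc.Effective_Since
from   fact_picked fp
outer apply (select top (1)
                    d.Company,
                    d.Effective_Since
             from   dim_company d
             where  d.Emp_Name         = fp.Emp_Name
             and    d.Effective_Since <= fp.[Date]
             order by d.Effective_Since desc) dc
order by fp.Emp_Name, fp.[Date];
```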

To make sure the left join is working properly, assume we have only the following rows in the dim_company table:

truncate table dim_company;
insert into dim_company values
('John','Da_Big_Apple','03/17/2017'),
('John','Big_Tech',    '10/20/2017');

Run the proposed query again:

select fp.Emp_Name, 
       fp.[Date],
       fp.Apples_Picked,
       dc.Company,
       dc.Effective_Since

from   fact_picked fp

left
join   dim_company dc
on     dc.Emp_Name        = fp.Emp_Name
and    dc.Effective_Since = (select max(Effective_Since)
                             from   dim_company dc2
                             where  dc2.Emp_Name         = fp.Emp_Name
                             and    dc2.Effective_Since <= fp.[Date])

order by 1,2,5;

Emp_Name | Date                | Apples_Picked | Company      | Effective_Since    
-------- | ------------------- | ------------- | ------------ | -------------------
John     | 17/10/2015 00:00:00 |           175 | null         | null               
John     | 01/05/2017 00:00:00 |           100 | Da_Big_Apple | 17/03/2017 00:00:00

And here's a dbfiddle for the above.
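
The question's idea of deriving an end date can be sketched without changing the table, assuming SQL Server 2012 or later so that LEAD() is available. The CTE name dim_company_range and the column Effective_Until are made up for this illustration; the same SELECT could equally be wrapped in a view.

```sql
-- Sketch: derive an end date for each dimension row from the next row's
-- Effective_Since, then join on the resulting date range.
-- dim_company_range and Effective_Until are illustrative names only.
with dim_company_range as (
    select Emp_Name,
           Company,
           Effective_Since,
           -- NULL for the most recent row, meaning "still in effect"
           lead(Effective_Since) over (partition by Emp_Name
                                       order by Effective_Since) as Effective_Until
    from   dim_company
)
select fp.Emp_Name,
       fp.[Date],
       fp.Apples_Picked,
       dc.Company,
       dc.Effective_Since,
       dc.Effective_Until
from   fact_picked fp
left
join   dim_company_range dc
on     dc.Emp_Name        = fp.Emp_Name
and    dc.Effective_Since <= fp.[Date]
and    (dc.Effective_Until > fp.[Date] or dc.Effective_Until is null)
order by fp.Emp_Name, fp.[Date];
```

The end of each range is treated as exclusive, so a fact dated exactly on a change-over day is attributed to the new company, matching the <= comparison in the sub-query version.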

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange