Whenever you design a fact table, the first set questions to ask yourself is:
- What is the business process you're analysing?
- What are relevant facts?
- What are the dimensions you'd like to analyse the facts by?
- What does the lowest (least aggregated) level of detail in the fact table represent, i.e. what is the grain of the fact table?
The process seems to be Human Resources (HR).
You already know the fact, length of employment, which you can calculate easily: EndDate - StartDate
. The obvious dimensions are Department, Employee, Date (two role-playing dimensions for Start and End).
In this case, since you're looking for 'average length of employment' as a measure, it seems that the grain should be individual Employees by Department (your transactional data may have the same EmployeeID
listed under a different DeptID
when an employee has transferred).
Your star schema will then look something like this:
Fact_HR
DeptKey EmployeeKey StartDateKey EndDateKey EmploymentLengthInDays
-------------------------------------------------------------------------
10001 000321 20100101 20120101 730
10001 000421 20100505 20130101 972
Dim_Department
DeptKey DeptID Name ... (other suitable columns)
------------------------- ...
10001 001 Sales ...
Dim_Employee
EmployeeKey EmployeeID FirstName LastName ... (other suitable columns)
---------------------------------------------- ...
000321 123 Alison Smith ...
000421 124 Anakin Skywalker ...
Dim_Date
DateKey DateValue Year Quarter Month Day ... (other suitable columns)
00000000 N/A 0 0 0 0 ...
20100101 2010-01-01 2010 1 1 1 ...
20100102 2010-01-02 2010 1 1 2 ...
... ... ... ... ... ...
(so on for every date you want to represent)
Every column that ends in Key
is a surrogate key. The fact you're interested in is EmploymentLengthInDays
, you can derive a measure Avg. Employment Length
and you would aggregate using the average across all dimensions.
Now you can ask questions like:
- Average employment length by department.
- Average employment length for employees starting in 2011, or ending in September 2010.
- Average employment length for a given employee (across each department he/she worked for).
BONUS: You can also add another measure to your cube that uses the same column, but instead has a SUM aggregator, this may be called Total Employment Length
. Across a given employee this will tell you how long the employee worked for the company, but across a department, it will tell you the total man-days that were available to that department. Just an example of how a single fact can become multiple measures.