Question

My goal is to build a fact table that will be used to derive measures in SSAS. The measure I am building is 'average length of employment'. It will be deployed in a dashboard where users can select a calendar period and drill down to month, week and day.

This is what the transactional data looks like:

DeptID     EmployeeID   StartDate   EndDate
--------------------------------------------
001        123          20100101    20120101
001        124          20100505    20130101

What fields should my fact table have? Which fields should I aggregate on, and how should I compute the average? Any help is appreciated.

Solution

Whenever you design a fact table, the first questions to ask yourself are:

  1. What is the business process you're analysing?
  2. What are relevant facts?
  3. What are the dimensions you'd like to analyse the facts by?
  4. What does the lowest (least aggregated) level of detail in the fact table represent, i.e. what is the grain of the fact table?

The process seems to be Human Resources (HR).

You already know the fact: length of employment, which you can calculate easily as EndDate - StartDate. The obvious dimensions are Department, Employee and Date (used as two role-playing dimensions, one for Start and one for End).
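
As a rough illustration of the EndDate - StartDate calculation, here is a minimal Python sketch. It assumes the dates arrive as YYYYMMDD integers, as in the sample data; the function name employment_length_in_days is made up for this example.

```python
from datetime import datetime

def employment_length_in_days(start_key: int, end_key: int) -> int:
    """Return the number of days between two YYYYMMDD integer dates."""
    start = datetime.strptime(str(start_key), "%Y%m%d").date()
    end = datetime.strptime(str(end_key), "%Y%m%d").date()
    return (end - start).days

# The two rows from the question's transactional data.
lengths = [
    employment_length_in_days(20100101, 20120101),
    employment_length_in_days(20100505, 20130101),
]
print(lengths)  # [730, 972]
```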

In this case, since you're looking for 'average length of employment' as a measure, it seems that the grain should be individual Employees by Department (your transactional data may have the same EmployeeID listed under a different DeptID when an employee has transferred).

Your star schema will then look something like this:

Fact_HR
DeptKey   EmployeeKey   StartDateKey   EndDateKey   EmploymentLengthInDays
--------------------------------------------------------------------------
10001     000321        20100101       20120101     730
10001     000421        20100505       20130101     972

Dim_Department
DeptKey   DeptID   Name   ... (other suitable columns)
------------------------- ...
10001     001      Sales  ...

Dim_Employee
EmployeeKey   EmployeeID  FirstName  LastName  ... (other suitable columns)
---------------------------------------------- ...
000321        123         Alison     Smith     ...
000421        124         Anakin     Skywalker ...

Dim_Date
DateKey    DateValue    Year   Quarter   Month   Day   ... (other suitable columns)
-------------------------------------------------- ...
00000000   N/A          0      0         0       0     ...
20100101   2010-01-01   2010   1         1       1     ...
20100102   2010-01-02   2010   1         1       2     ...
...        ...          ...    ...       ...     ...   ...
(and so on for every date you want to represent)
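
A date dimension like this is usually generated rather than typed in. The sketch below is one way to do it in Python, under a few assumptions: the function name build_dim_date is hypothetical, the key is stored as an integer (so 00000000 becomes 0), and the N/A row is kept as a placeholder for dates that are missing.

```python
from datetime import date, timedelta

def build_dim_date(first: date, last: date) -> list:
    """Build Dim_Date rows for every day from first to last, inclusive."""
    # Placeholder row for facts whose date is missing (DateKey 00000000 above).
    rows = [{"DateKey": 0, "DateValue": None,
             "Year": 0, "Quarter": 0, "Month": 0, "Day": 0}]
    current = first
    while current <= last:
        rows.append({
            # YYYYMMDD as an integer, e.g. 20100101
            "DateKey": current.year * 10000 + current.month * 100 + current.day,
            "DateValue": current.isoformat(),
            "Year": current.year,
            "Quarter": (current.month - 1) // 3 + 1,
            "Month": current.month,
            "Day": current.day,
        })
        current += timedelta(days=1)
    return rows

dim_date = build_dim_date(date(2010, 1, 1), date(2013, 1, 1))
print(dim_date[1])
```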

Every column that ends in Key is a surrogate key. The fact you're interested in is EmploymentLengthInDays. From it you can derive a measure, Avg. Employment Length, which aggregates using the average across all dimensions.

Now you can ask questions like:

  • Average employment length by department.
  • Average employment length for employees starting in 2011, or ending in September 2010.
  • Average employment length for a given employee (across each department they worked for).
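
To make these questions concrete, here is a small sketch that loads the sample rows into an in-memory SQLite database through Python's sqlite3 module and answers two of them with plain SQL. This is only an illustration of the star-schema joins, not how SSAS itself evaluates measures. The table layouts are trimmed down, the keys are stored as integers, and only the four dates used by the sample facts are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dim_Department (DeptKey INTEGER PRIMARY KEY, DeptID TEXT, Name TEXT);
CREATE TABLE Dim_Date (DateKey INTEGER PRIMARY KEY, Year INTEGER, Month INTEGER, Day INTEGER);
CREATE TABLE Fact_HR (
    DeptKey INTEGER, EmployeeKey INTEGER,
    StartDateKey INTEGER, EndDateKey INTEGER,
    EmploymentLengthInDays INTEGER
);
INSERT INTO Dim_Department VALUES (10001, '001', 'Sales');
INSERT INTO Dim_Date VALUES (20100101, 2010, 1, 1), (20100505, 2010, 5, 5),
                            (20120101, 2012, 1, 1), (20130101, 2013, 1, 1);
INSERT INTO Fact_HR VALUES (10001, 321, 20100101, 20120101, 730),
                           (10001, 421, 20100505, 20130101, 972);
""")

# Average employment length by department.
avg_by_dept = conn.execute("""
    SELECT d.Name, AVG(f.EmploymentLengthInDays)
    FROM Fact_HR AS f
    JOIN Dim_Department AS d ON d.DeptKey = f.DeptKey
    GROUP BY d.Name
""").fetchall()

# Average employment length for employment ending in 2012.
# Dim_Date is joined in its "End" role through EndDateKey.
avg_ending_2012 = conn.execute("""
    SELECT AVG(f.EmploymentLengthInDays)
    FROM Fact_HR AS f
    JOIN Dim_Date AS end_date ON end_date.DateKey = f.EndDateKey
    WHERE end_date.Year = 2012
""").fetchone()[0]

print(avg_by_dept)      # [('Sales', 851.0)]
print(avg_ending_2012)  # 730.0
```

Joining Dim_Date a second time through StartDateKey would give the "Start" role in the same way.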

BONUS: You can also add another measure to your cube that uses the same column but has a SUM aggregator; this might be called Total Employment Length. For a given employee it tells you how long the employee worked for the company, while for a department it tells you the total man-days that were available to that department. This is just one example of how a single fact can become multiple measures.
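
The sketch below shows how the total and the average come out of the same column, in plain Python rather than SSAS. The third fact row (employee 321 in a second department, 10002) is hypothetical and only there to show a transfer. It also shows why an average is best computed as a sum divided by a row count at whatever level you are looking at: averaging the per-department averages gives a different answer. As far as I know, SSAS multidimensional has no plain Average aggregate function, so an average measure is commonly defined as a calculated member that divides a Sum measure by a Count measure.

```python
from collections import defaultdict

# (DeptKey, EmployeeKey, EmploymentLengthInDays)
# The 10002 row is hypothetical, added to illustrate a transfer.
fact_rows = [
    (10001, 321, 730),
    (10001, 421, 972),
    (10002, 321, 100),
]

totals = defaultdict(int)   # Total Employment Length per department
counts = defaultdict(int)   # number of fact rows per department
for dept_key, _employee_key, days in fact_rows:
    totals[dept_key] += days
    counts[dept_key] += 1

total_by_dept = dict(totals)
avg_by_dept = {key: totals[key] / counts[key] for key in totals}

# Correct overall average: total days divided by total rows.
overall_avg = sum(totals.values()) / sum(counts.values())
# Averaging the department averages weights departments, not rows.
avg_of_avgs = sum(avg_by_dept.values()) / len(avg_by_dept)

print(total_by_dept)  # {10001: 1702, 10002: 100}
print(avg_by_dept)    # {10001: 851.0, 10002: 100.0}
print(overall_avg, avg_of_avgs)
```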

Licensed under: CC-BY-SA with attribution