Question

I have a data mart which only needs to capture a serial number of a product, the date of the activity, and where the activity took place (which account).

There are five possible activities. The issue I have is this: two of the activities take place at the warehouse level, and the remaining three take place at the account level (WH does not apply). Ultimately, however, every warehouse rolls up to a master account.

So if I had one fact table, I would essentially need two FKs, and you would have to traverse the fact table to build the WH > Account hierarchy, which seems hard to maintain. I'd like one dimension table.

Or is it recommended that I split this into two fact tables, even though the only difference between the tables is whether the activity took place at a warehouse or not?

The reporting will be at the account level, but having the WH information may be useful at some point. I also need to check for duplicates, etc., which is why I was leaning towards the first option, but I don't know how to handle the hierarchies appropriately.

Single Fact Table Design

  • Item: 1
  • Account: 14
  • Warehouse: 2
  • ActivityType: 3
  • Date: 20130204
  • SerialNumber: 123456
  • Count: 1

Dual Fact Table Design

Table 1

  • Item: 1
  • Warehouse: 2
  • ActivityType: 3
  • Date: 20130204
  • SerialNumber: 123456
  • Count: 1

Table 2

  • Item: 1
  • Account: 2
  • ActivityType: 3
  • Date: 20130204
  • SerialNumber: 123456
  • Count: 1

Solution

I've interpreted your situation as:

  • ALL activities require an account
  • Some activities involve a warehouse.
  • The selection of a warehouse implies an account. The accounts mentioned in the two points above are of the same type (there is only one account dimension table).

In which case you should be OK with the single FACT table design:

[ACTIVITY_FACT]
SK                    (Optional; I find unique surrogate PKs useful)
ITEM_SK               (Link to your ITEM_DIM table)
ACCOUNT_SK            (Link to your ACCOUNT_DIM table)
WAREHOUSE_SK          (Link to your WAREHOUSE_DIM table, -1 for activities with no warehouse)
ACTIVITY_TYPE_SK      (Link to your ACTIVITY_TYPE_DIM table) 
ACTIVITY_DATE_SK      (Link to your DATE_DIM table)
ITEM_SERIAL_NUMBER
ITEM_COUNT

Have a record in your WAREHOUSE dimension for NONE or NOT APPLICABLE and allocate it a nice obvious special condition SK value of -1 or -9 or whatever your shop is using for such things.

For activity records that reference a warehouse, populate the appropriate warehouse SK AND the SK of the account that the warehouse belongs to.

For activities that do not involve a warehouse, set the warehouse SK to the NONE / NOT APPLICABLE warehouse dimension record and populate the appropriate account SK.
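
As a rough sketch of the two population rules above, the key lookup at load time could look something like this. The `WAREHOUSE_TO_ACCOUNT` mapping, its values, the `resolve_keys` helper and the -1 sentinel are all made up for illustration; in practice the mapping would come from wherever your warehouse-to-account relationship is stored.

```python
# Assumed sentinel key for the NONE / NOT APPLICABLE warehouse dimension row.
NO_WAREHOUSE_SK = -1

# Hypothetical lookup: warehouse SK -> SK of the master account it rolls up to.
WAREHOUSE_TO_ACCOUNT = {2: 14, 3: 14, 5: 20}


def resolve_keys(account_sk=None, warehouse_sk=None):
    """Return the (account_sk, warehouse_sk) pair to store on a fact row."""
    if warehouse_sk is not None:
        # Warehouse-level activity: the account is implied by the warehouse.
        owner = WAREHOUSE_TO_ACCOUNT[warehouse_sk]
        if account_sk is not None and account_sk != owner:
            raise ValueError(
                f"warehouse {warehouse_sk} belongs to account {owner}, not {account_sk}"
            )
        return owner, warehouse_sk
    if account_sk is None:
        raise ValueError("an activity needs an account or a warehouse")
    # Account-level activity: point at the NOT APPLICABLE warehouse row.
    return account_sk, NO_WAREHOUSE_SK


warehouse_row = resolve_keys(warehouse_sk=2)  # warehouse-level activity
account_row = resolve_keys(account_sk=14)     # account-level activity
```

Deriving the account from the warehouse in one place keeps the two keys on a fact row from drifting apart.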

Now your fact table can be joined to your Account and Warehouse dimension tables without having to worry about outer joins or null handling. This should allow you and your users to play about with warehouse dimension data as required, and you're not having to faff about with managing two tables that contain essentially the same data.
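
To make that concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table and column names follow the layout above, but the dimension attributes (`account_name`, `warehouse_name`), the sample rows, and the definition of a "duplicate" (same serial number, activity type and date) are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account_dim (account_sk INTEGER PRIMARY KEY, account_name TEXT);
CREATE TABLE warehouse_dim (warehouse_sk INTEGER PRIMARY KEY, warehouse_name TEXT);
CREATE TABLE activity_fact (
    item_sk            INTEGER,
    account_sk         INTEGER REFERENCES account_dim(account_sk),
    warehouse_sk       INTEGER REFERENCES warehouse_dim(warehouse_sk),
    activity_type_sk   INTEGER,
    activity_date_sk   INTEGER,
    item_serial_number TEXT,
    item_count         INTEGER
);
INSERT INTO account_dim VALUES (14, 'Account 14');
-- The -1 row is the NOT APPLICABLE warehouse, so every fact row has a match.
INSERT INTO warehouse_dim VALUES (-1, 'NOT APPLICABLE'), (2, 'Warehouse 2');
INSERT INTO activity_fact VALUES
    (1, 14,  2, 3, 20130204, '123456', 1),  -- warehouse-level activity
    (1, 14, -1, 1, 20130205, '123456', 1);  -- account-level activity
""")

# Plain inner joins: no outer join or NULL handling is needed.
by_warehouse = conn.execute("""
    SELECT a.account_name, w.warehouse_name, SUM(f.item_count)
    FROM activity_fact f
    JOIN account_dim a ON a.account_sk = f.account_sk
    JOIN warehouse_dim w ON w.warehouse_sk = f.warehouse_sk
    GROUP BY a.account_name, w.warehouse_name
    ORDER BY w.warehouse_name
""").fetchall()

# Account-level reporting ignores the warehouse key entirely.
by_account = conn.execute("""
    SELECT a.account_name, SUM(f.item_count)
    FROM activity_fact f
    JOIN account_dim a ON a.account_sk = f.account_sk
    GROUP BY a.account_name
""").fetchall()

# Duplicate check across all activity types in one table (assumed definition).
duplicates = conn.execute("""
    SELECT item_serial_number, activity_type_sk, activity_date_sk, COUNT(*)
    FROM activity_fact
    GROUP BY item_serial_number, activity_type_sk, activity_date_sk
    HAVING COUNT(*) > 1
""").fetchall()
```

Because both kinds of activity live in one table, the duplicate check is a single query rather than a union over two fact tables.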

OTHER TIPS

A possibility is to define the hierarchy in a single dimension table. Guessing at what you’re dealing with, I came up with the following.

Outline of dimension table:

TABLE: Account

Account_ID  (surrogate key)
Account     (Account name, identifier)
Warehouse   (Warehouse name, identifier)

Sample data:

Account_ID   Account   Warehouse
    1           A        n/a
    2           B        n/a
    3           C        n/a
    4           W        wh1
    5           W        wh2
    6           Z        wh3
    7           Z        n/a

Account_ID is just a surrogate key, with no intrinsic meaning or value.

Account lists the accounts. Here, I show five: A, B, C, W and Z. Select distinct to get the list of accounts; joining to a fact table by Account_ID where Account = “W” returns all data for that account (for however many warehouses, if applicable).

Warehouse lists all warehouses and the account they are associated with; here, “W” is the account for two separate warehouses (wh1, wh2); Z is associated with warehouse wh3, but could also be used by a fact table row with “no” warehouse. Joining to a fact table by Account_ID where Warehouse = “wh1” returns all data for that warehouse.

Using this, with Account_ID in a fact table you could drill down for all entries for any given Account or for a specific warehouse (or for no warehouse, if there is value in that).
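
A quick sketch of that drill-down, again with Python's sqlite3 and an in-memory database. The dimension rows are the sample data above; the `activity_fact` table, its columns and its rows are invented purely to have something to join to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (
    account_id INTEGER PRIMARY KEY,  -- surrogate key
    account    TEXT,
    warehouse  TEXT
);
INSERT INTO account VALUES
    (1, 'A', 'n/a'), (2, 'B', 'n/a'), (3, 'C', 'n/a'),
    (4, 'W', 'wh1'), (5, 'W', 'wh2'),
    (6, 'Z', 'wh3'), (7, 'Z', 'n/a');

-- Hypothetical fact table: one key covers both hierarchy levels.
CREATE TABLE activity_fact (
    account_id    INTEGER REFERENCES account(account_id),
    serial_number TEXT,
    item_count    INTEGER
);
INSERT INTO activity_fact VALUES
    (4, '123456', 1), (5, '123457', 1), (6, '123458', 1), (7, '123459', 1);
""")

# List of accounts: select distinct on the account column.
accounts = [row[0] for row in conn.execute(
    "SELECT DISTINCT account FROM account ORDER BY account")]

# All activity for account W, across however many warehouses it has.
account_w_total = conn.execute("""
    SELECT SUM(f.item_count)
    FROM activity_fact f
    JOIN account a ON a.account_id = f.account_id
    WHERE a.account = ?
""", ("W",)).fetchone()[0]

# Activity for one specific warehouse.
wh1_total = conn.execute("""
    SELECT SUM(f.item_count)
    FROM activity_fact f
    JOIN account a ON a.account_id = f.account_id
    WHERE a.warehouse = ?
""", ("wh1",)).fetchone()[0]
```

The trade-off in this sketch is that the account name is repeated on every warehouse row, so account attributes have to be kept in sync across those rows.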

There are lots of variations and permutations possible with this kind of approach.

Licensed under: CC-BY-SA with attribution