What would be the benefits of having a time dimension in a star schema over having the time attributes in the fact table itself?

For example:

I have transaction data with user information for each transaction, the country where the transaction took place, and the date when it occurred.

Option 1: Correct me if I am wrong, but this is probably the most widely used approach and the one most often recommended:

  • A transaction fact table containing transaction_ID (PK), user_id (FK), country_id (FK) and date_id (FK).

  • User dimension containing user_id (PK) and the other user attributes, let's say name & phone_number.

  • Dates dimension that consists of date_id (PK), date, day, month, year, quarter.
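Written out as DDL, Option 1 might look roughly like the sketch below. The table names (`fact_transaction`, `dim_user`, `dim_date`), the column types, and the YYYYMMDD-style integer `date_id` are assumptions; `country_id` has no FK target here because no country dimension is listed.

```sql
-- Option 1 (sketch): the fact table holds only keys; date attributes live in dim_date.
-- Table names and types are hypothetical.
CREATE TABLE dim_date (
    date_id INT PRIMARY KEY,   -- e.g. 20240115 for 2024-01-15 (assumed format)
    [date]  DATE NOT NULL,
    [day]   INT NOT NULL,
    [month] INT NOT NULL,
    [year]  INT NOT NULL,
    quarter INT NOT NULL
);

CREATE TABLE dim_user (
    user_id      INT PRIMARY KEY,
    name         VARCHAR(100),
    phone_number VARCHAR(20)
);

CREATE TABLE fact_transaction (
    transaction_id INT PRIMARY KEY,
    user_id        INT NOT NULL REFERENCES dim_user (user_id),
    country_id     INT NOT NULL,   -- FK to a country dimension (not shown)
    date_id        INT NOT NULL REFERENCES dim_date (date_id)
);
```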

Option 2: An alternative I just thought of instead of Option 1, but I am unsure about it:

  • A transaction fact table containing transaction_ID (PK), user_id (FK), country_id (FK), date, day, month, year and quarter.

  • User dimension containing user_id (PK) and the other user attributes, let's say name & phone_number.
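Option 2 as a comparable sketch, under the same assumptions about names and types (`fact_transaction_wide` is a hypothetical name):

```sql
-- Option 2 (sketch): the date attributes are copied onto every fact row.
CREATE TABLE dim_user (
    user_id      INT PRIMARY KEY,
    name         VARCHAR(100),
    phone_number VARCHAR(20)
);

CREATE TABLE fact_transaction_wide (
    transaction_id INT PRIMARY KEY,
    user_id        INT NOT NULL REFERENCES dim_user (user_id),
    country_id     INT NOT NULL,   -- FK to a country dimension (not shown)
    [date]         DATE NOT NULL,
    [day]          INT NOT NULL,   -- the next four columns are all derivable
    [month]        INT NOT NULL,   -- from [date], yet they are stored again
    [year]         INT NOT NULL,   -- on every single fact row
    quarter        INT NOT NULL
);
```

Nothing in this schema stops two rows with the same `[date]` from disagreeing about `quarter`; keeping them in sync is left to every process that writes to the table.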

What are the benefits of Option 1 over Option 2? I don't understand why joining to a separate Date dimension would be the better option, even though it is the most widely used approach. Thanks a lot!


Solution

Let me answer this question with a scenario, starting with a simple Transactions table. When our business started, management wanted to know the 'name' of the month, so I included that information in the table.

CREATE TABLE Transactions (
    TransactionId INT
    ,UserId VARCHAR(10)
    ,CountryId INT
    ,TransactionDate DATE
    ,[MonthName] VARCHAR(20)
    ,SalesAmount DECIMAL(18, 2)
    )

Business has been good and we already have 1 million rows in our Transactions table. In fact, business is so good that management is now asking more in-depth questions about our sales. They want to know in which 'quarter' each sale was made.

ALTER TABLE Transactions ADD [QuarterName] VARCHAR(10)
UPDATE Transactions SET QuarterName = ... 

We just updated 1 million rows.

As time goes by, management starts asking more and more questions about our sales.

  • What DayOfTheWeek was that sale made?
  • Was that a holiday?
  • Was the moon full on that day?

ALTER TABLE Transactions ADD ...

UPDATE Transactions SET ...

Hopefully you can see where this is going: every new question means another ALTER TABLE and another update of every row in the Transactions table. Additionally, all of that redundant data on each and every Transactions row can reduce performance and increase resource utilization (memory, disk space, etc.). Our databases get bigger, take longer to back up, and need more memory to cache the same number of transactions.

With a Date Dimension table, all of that information is stored in one place. A Date Dimension table covering 2000-01-01 up to (but not including) 2100-01-01 contains just 36525 rows (100 years × 365 days, plus 25 leap days). Any time we want to track a new attribute of a date, we only have to add that attribute to the Date Dimension table and update its 36525 rows, no matter how many rows the Transactions table holds.
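A minimal sketch of that change, assuming a hypothetical `DateDimension` table keyed by a YYYYMMDD integer `DateKey` with a `MonthNumber` column (only a few sample rows are inserted here):

```sql
-- Hypothetical date dimension with a few rows; a full one would hold one row
-- per calendar day (36525 rows for 2000-01-01 up to 2100-01-01).
CREATE TABLE DateDimension (
    DateKey     INT PRIMARY KEY,   -- e.g. 20240115 for 2024-01-15
    [Date]      DATE NOT NULL,
    MonthNumber INT NOT NULL
);

INSERT INTO DateDimension (DateKey, [Date], MonthNumber) VALUES
    (20240115, '2024-01-15', 1),
    (20240420, '2024-04-20', 4),
    (20241231, '2024-12-31', 12);

-- New requirement: a quarter name. Only this small table is altered and
-- updated; the Transactions table is left alone.
-- (In SQL Server, if the UPDATE fails with "Invalid column name", run the
-- ALTER and the UPDATE as separate batches, e.g. with GO between them.)
ALTER TABLE DateDimension ADD QuarterName VARCHAR(2);

UPDATE DateDimension
SET QuarterName = CASE
        WHEN MonthNumber <= 3 THEN 'Q1'
        WHEN MonthNumber <= 6 THEN 'Q2'
        WHEN MonthNumber <= 9 THEN 'Q3'
        ELSE 'Q4'
    END;
```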

When we want specific information about the 'Date' attributes of a sale, we simply join the Transactions table to the Date Dimension table.
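A minimal sketch of that join, assuming the Transactions table stores a hypothetical integer `DateKey` instead of the date attributes themselves (the names `DateDimension`, `DateKey` and `QuarterName` are illustrative, and the data is made up):

```sql
CREATE TABLE DateDimension (
    DateKey     INT PRIMARY KEY,
    [Date]      DATE NOT NULL,
    [MonthName] VARCHAR(20) NOT NULL,
    QuarterName VARCHAR(2) NOT NULL
);

CREATE TABLE Transactions (
    TransactionId INT PRIMARY KEY,
    UserId        VARCHAR(10),
    CountryId     INT,
    DateKey       INT NOT NULL REFERENCES DateDimension (DateKey),
    SalesAmount   DECIMAL(18, 2)
);

INSERT INTO DateDimension VALUES
    (20240115, '2024-01-15', 'January', 'Q1'),
    (20240420, '2024-04-20', 'April', 'Q2');

INSERT INTO Transactions VALUES
    (1, 'u1', 1, 20240115, 100.00),
    (2, 'u2', 1, 20240115, 50.00),
    (3, 'u1', 2, 20240420, 25.00);

-- Sales per quarter: the quarter comes from the dimension, not the fact row.
SELECT d.QuarterName, SUM(t.SalesAmount) AS TotalSales
FROM Transactions AS t
JOIN DateDimension AS d ON d.DateKey = t.DateKey
GROUP BY d.QuarterName
ORDER BY d.QuarterName;
```

When a new date attribute is added later, only `DateDimension` changes; queries keep the same join and simply reference the new column.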

Additionally, the data in a Date Dimension is consistent: because it is populated once, in one place, 'January' and 'Saturday' are spelled the same way everywhere. Storing this kind of data in the Transactions table can lead to all kinds of discrepancies, such as misspelled month or day names.
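To illustrate the kind of discrepancy meant here, a small sketch with made-up data: month names typed onto each fact row, including one deliberate misspelling. The table name `TransactionsWide` is hypothetical.

```sql
-- Month names are copied onto every fact row, so each row can differ.
CREATE TABLE TransactionsWide (
    TransactionId   INT PRIMARY KEY,
    TransactionDate DATE NOT NULL,
    [MonthName]     VARCHAR(20) NOT NULL,
    SalesAmount     DECIMAL(18, 2) NOT NULL
);

INSERT INTO TransactionsWide VALUES
    (1, '2024-01-05', 'January', 10.00),
    (2, '2024-01-20', 'Januray', 20.00);  -- a typo slips in on one row

-- Grouping by the copied name splits January's sales into two groups.
SELECT [MonthName], SUM(SalesAmount) AS TotalSales
FROM TransactionsWide
GROUP BY [MonthName];
```

With a Date Dimension, the month name for each date is written by a single loading process, so a misspelling, if one ever occurred, would be fixed in one place rather than across millions of fact rows.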

For more information on creating a Date Dimension table, check out "Creating a date dimension or calendar table in SQL Server".

Other answers

There are several benefits to having a time dimension in your star schema. These benefits do not apply to all use cases, but they may apply to yours.

First, many star schemas benefit from keeping the fact table skinny. If most of the dimensions have a few hundred rows but the fact table has billions, anything you can do to keep the fact table skinny will not only reduce your storage requirements but may also speed up many of your queries. Queries that don't use any of the time attributes will generally run faster, because they read fewer bytes per fact row. Even queries that do use those attributes may run about as fast, in spite of the extra join with the (small) time dimension.

The big advantage is flexibility and ease of management. A date attribute like day of the week probably duplicates a function built into the DBMS, but an automated report generator may be simpler if it can treat weekday as an ordinary attribute, like any other.

Company-specific attributes, like fiscal quarter or working day, can really benefit from being stored as attributes, depending on how quirky the company's calendar is. I once had to build a reporting database for a company that had truly strange algorithms for determining where its fiscal quarters began and ended. Coding all the calendar quirks into a single program, and using that program to populate the time dimension, made the rest of the system a lot simpler and easier to understand.
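As a much simpler stand-in for such a program, the sketch below assumes a fiscal year starting on 1 July (a hypothetical rule, not the quirky algorithms described above) and writes `FiscalQuarter` into a hypothetical `DateDimension` once, so reports only need to group by that column.

```sql
CREATE TABLE DateDimension (
    DateKey       INT PRIMARY KEY,
    [Date]        DATE NOT NULL,
    MonthNumber   INT NOT NULL,
    FiscalQuarter INT NULL   -- filled in by the single calendar routine below
);

INSERT INTO DateDimension (DateKey, [Date], MonthNumber) VALUES
    (20240115, '2024-01-15', 1),
    (20240415, '2024-04-15', 4),
    (20240701, '2024-07-01', 7),
    (20241001, '2024-10-01', 10);

-- Hypothetical rule: the fiscal year starts on 1 July, so July-September is
-- fiscal quarter 1. All calendar logic lives here, not in the reports.
UPDATE DateDimension
SET FiscalQuarter = CASE
        WHEN MonthNumber BETWEEN 7 AND 9 THEN 1
        WHEN MonthNumber BETWEEN 10 AND 12 THEN 2
        WHEN MonthNumber BETWEEN 1 AND 3 THEN 3
        ELSE 4
    END;
```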

You can choose a granularity other than the day if you want. In my case, we chose the work shift: a shift was 8 hours long, and there were 3 shifts in a day. Some of the fact tables needed only the date, while others needed the date and the shift within that date.
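A sketch of a date-and-shift dimension for three 8-hour shifts per day, built with a cross join. The tables `Shifts`, `CalendarDates` and `DateShiftDimension`, the shift start hours, and the use of `ROW_NUMBER()` for the surrogate key are all assumptions for illustration.

```sql
-- Three 8-hour shifts per day (start hours are assumed).
CREATE TABLE Shifts (
    ShiftNumber INT PRIMARY KEY,
    StartHour   INT NOT NULL
);
INSERT INTO Shifts VALUES (1, 0), (2, 8), (3, 16);

-- The days to cover; a real table would hold every day in the range.
CREATE TABLE CalendarDates (
    [Date] DATE PRIMARY KEY
);
INSERT INTO CalendarDates VALUES ('2024-01-15'), ('2024-01-16');

-- One row per (date, shift): the grain used by shift-level fact tables.
CREATE TABLE DateShiftDimension (
    DateShiftKey INT PRIMARY KEY,
    [Date]       DATE NOT NULL,
    ShiftNumber  INT NOT NULL,
    StartHour    INT NOT NULL
);

INSERT INTO DateShiftDimension (DateShiftKey, [Date], ShiftNumber, StartHour)
SELECT ROW_NUMBER() OVER (ORDER BY c.[Date], s.ShiftNumber),
       c.[Date], s.ShiftNumber, s.StartHour
FROM CalendarDates AS c
CROSS JOIN Shifts AS s;
```

Fact tables at day grain can keep referencing a plain date dimension, while shift-level fact tables reference `DateShiftKey`.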

This answer duplicates my previous answer, found here

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange