Question

I'm designing the Fact and Dimension tables for a data warehouse currently using SQL Server, SSIS, and SSAS. Will I get any real benefit from programming the relationships between the dimensions and the fact tables into SQL? Or am I better off just defining the relationships manually when it comes time to create the cubes?

It seems to be easier to load and transform the data if I do not have any constraints upon my data insertion into the tables and therefore leave out the relationships.

Was it helpful?

Solution

I am interpreting "programming the relationships" as meaning to put foreign key constraints on the tables.

No, in a data warehouse you should not impose primary key or foreign key constraints on the fact tables.

You've mentioned some issues, and another problem is that these constraints place a performance overhead when inserting rows, which will make the ETL process more expensive.

To someone only experienced with transactional database design, this might go against everything they've learnt and experienced. Foreign key constraints are vital for databases where you have multiple processes modifying data at the same time. There's a definite risk of two processes screwing up the data somehow, in spite of the best efforts of developers. The constraints are a fundamentally vital safety net.

In a dimensional model, the database is only ever populated by the one ETL process, and in a highly controlled way. This significantly decreases the risk of the data getting corrupted, to the point where the extra cost of constraints just aren't worth it.

OTHER TIPS

I think we need to have FK Constraints, since updates to DW is 'mostly' controlled but not always. For example, manual data fixes happen in case of any data issues and such. [Ideally this must not happen, but....:)]

To ensure, the Keys dont impact performance, we can disable them before load and enable them again. This may give us the confidence that the data is right and also remove any performance issues during the load. Another thing to remember is, processing times are not a major constraint for most Data Warehouses.

If you consider the time needed to fix potential data integrity issues, having FK is well worth.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top