Question

I'm working on a data warehouse and am looking for suggestions on having numerous dimensions versus one large dimension with attributes.

We currently have DimEntity, DimStation, DimZone, DimGroup, and DimCompany, and multiple fact tables that contain the keys from each of these dimensions. Is this the best way, or would it be better to have just one dimension, DimEntity, and include station, zone, group and company as attributes of the entity?

We have already gone the route of separate dimensions with our ETL, so the work to populate and build out the star schema isn't an issue. Performance and maintainability are important. These dimensions do not change often, so I'm looking for guidance on the best way to handle them.

Fact tables have over 100 million records. The entity dimension has around 1000 records and the others listed have under 200 each.

Solution

Without knowing your star schema table definitions, data cardinality, etc., it's tough to give a yes or no. It's going to be a balancing act.

For read performance, the fact table should be as skinny as possible and the dimension should be as short (low row count) as possible. Consolidating dimensions typically means that the fact table gets skinnier while the dimension record count increases.
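To put rough numbers on that trade-off, here is a back-of-the-envelope sketch using the figures from the question (100 million fact rows, five dimension keys, about 1000 entities, under 200 rows in each of the other dimensions). The 4-byte key width and the function names are assumptions made for illustration, not something taken from the actual schema.

```python
# Rough estimate of the trade-off: fact table bytes saved versus
# the worst-case row count of a consolidated dimension.
# Assumption: surrogate keys are 4-byte integers.
FACT_ROWS = 100_000_000
KEY_BYTES = 4


def key_bytes_saved(keys_before, keys_after,
                    fact_rows=FACT_ROWS, key_bytes=KEY_BYTES):
    """Bytes removed from one fact table by dropping key columns."""
    return (keys_before - keys_after) * key_bytes * fact_rows


def consolidated_rows_upper_bound(cardinalities, fact_rows=FACT_ROWS):
    """Worst-case rows in a consolidated dimension.

    If the attributes vary independently, the dimension can hold up to
    the cross product of their cardinalities, but never more distinct
    combinations than there are fact rows to reference them.
    """
    product = 1
    for cardinality in cardinalities:
        product *= cardinality
    return min(product, fact_rows)


# Five keys collapsed into one: 4 keys * 4 bytes * 100M rows.
saved = key_bytes_saved(keys_before=5, keys_after=1)

# Entity (~1000 rows) plus four dimensions of at most 200 rows each.
worst_case = consolidated_rows_upper_bound([1000, 200, 200, 200, 200])
```

If station, zone, group and company are each fully determined by the entity, the consolidated dimension stays at roughly the entity row count and the worst case above never materializes. The upper bound only matters when the attributes vary independently of one another.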

If you can consolidate dimensions without adding a significant number of rows to the consolidated dimension, it may be worth looking into. It may be that you can combine the low cardinality dimensions into a junk dimension and achieve a nice balance. Dimensions with high cardinality attributes shouldn't be consolidated.
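A junk dimension can be sketched as one row per distinct combination of the low cardinality attributes that actually occurs, with the fact rows carrying a single key in place of one key per attribute. The snippet below is a minimal in-memory illustration; the function `build_junk_dimension`, the `junk_key` column and the sample rows are all hypothetical, and a real ETL would do this in the warehouse rather than in Python lists.

```python
def build_junk_dimension(fact_rows, attrs):
    """Collapse the given attributes into one junk dimension.

    Returns (junk_dim, keyed_facts): junk_dim has one row per distinct
    observed combination of attrs, and keyed_facts are the fact rows
    with those attributes replaced by a single junk_key.
    """
    combo_to_key = {}
    keyed_facts = []
    for row in fact_rows:
        combo = tuple(row[a] for a in attrs)
        # Assign the next surrogate key the first time a combination is seen.
        key = combo_to_key.setdefault(combo, len(combo_to_key) + 1)
        slim = {k: v for k, v in row.items() if k not in attrs}
        slim["junk_key"] = key
        keyed_facts.append(slim)
    junk_dim = [
        dict(zip(attrs, combo), junk_key=key)
        for combo, key in combo_to_key.items()
    ]
    return junk_dim, keyed_facts


# Hypothetical sample data: zone and group are the low cardinality attributes.
facts = [
    {"entity_key": 1, "zone": "North", "group": "A", "amount": 10.0},
    {"entity_key": 2, "zone": "North", "group": "A", "amount": 5.0},
    {"entity_key": 3, "zone": "South", "group": "B", "amount": 7.5},
]

junk_dim, keyed_facts = build_junk_dimension(facts, ["zone", "group"])
```

Here three fact rows share two combinations, so the junk dimension has two rows and each fact row carries one `junk_key` instead of separate zone and group columns.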

Here's a good Kimball University article on dimensional modeling. Look specifically at where he addresses centipede fact tables and recommends using junk dimensions.

Licensed under: CC-BY-SA with attribution