Snowflake or Star for OLAP database design

https://stackoverflow.com/questions/19846208

29-07-2022

Question

I'm using Saiku 2.5 and I'm not sure how to model the following situation:

Dimensions:

Category (~20 rows)
SubCategory (~100 rows)
SubSubCategory (~1200 rows)
SubSubSubCategory (~8000 rows)
Other1 (~100000 rows)
Other2 (~50000 rows)
Other3 (~500 rows)
Other4 (~500 rows)
Other5 (~200 rows)
Other6 (~200 rows)
Other7 (~100 rows)
Other8 (~10 rows)

Measurements:

Facts (~20000000 rows)

Relationships

Fact has Other[\d]
Fact has SubSubSubCategory
SubSubSubCategory has SubSubCategory
SubSubCategory has SubCategory
SubCategory has Category

I'd like to know if it's better performance-wise, to de-normalize all categories into one table or leave it as it is. Each "category-like" table has one VARCHAR(8) column and two TEXT columns.

Solution

I'd like to know if it's better performance-wise, to de-normalize all categories into one table or leave it as it is.

You're optimizing a data warehouse for read performance, so I'd opt for denormalizing the category tables. The flattened table would have roughly 8,000 rows (one per SubSubSubCategory), which is tiny by relational database standards, so you could keep it entirely in memory.
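As a rough sketch of what that denormalization could look like, here is a small Python example using the standard-library sqlite3 module. The table and column names (category, sub_category, id, parent_id, name, dim_category) are made up for illustration, since the question doesn't give the real schema, and your warehouse is probably not SQLite, so treat the SQL as a pattern rather than something to paste in.

```python
import sqlite3

def build_flat_category_dim(conn: sqlite3.Connection) -> None:
    """Collapse the four-level category snowflake into one dimension table.

    All table and column names here are hypothetical.
    """
    conn.executescript("""
        CREATE TABLE dim_category AS
        SELECT sss.id   AS category_key,   -- leaf key the fact table points at
               c.name   AS category,
               s.name   AS sub_category,
               ss.name  AS sub_sub_category,
               sss.name AS sub_sub_sub_category
        FROM sub_sub_sub_category AS sss
        JOIN sub_sub_category AS ss ON ss.id = sss.parent_id
        JOIN sub_category     AS s  ON s.id  = ss.parent_id
        JOIN category         AS c  ON c.id  = s.parent_id;
        CREATE UNIQUE INDEX idx_dim_category_key ON dim_category (category_key);
    """)

# Tiny in-memory stand-in for the normalized category tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category             (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sub_category         (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
    CREATE TABLE sub_sub_category     (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
    CREATE TABLE sub_sub_sub_category (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);

    INSERT INTO category VALUES (1, 'Food'), (2, 'Tools');
    INSERT INTO sub_category VALUES (10, 1, 'Fruit'), (20, 2, 'Hand tools');
    INSERT INTO sub_sub_category VALUES (100, 10, 'Citrus'), (200, 20, 'Hammers');
    INSERT INTO sub_sub_sub_category VALUES
        (1000, 100, 'Lemon'), (1001, 100, 'Lime'), (2000, 200, 'Claw hammer');
""")

build_flat_category_dim(conn)

# One row per leaf category, with every ancestor name repeated on the row.
rows = conn.execute(
    "SELECT * FROM dim_category ORDER BY category_key"
).fetchall()
for row in rows:
    print(row)
```

The fact table keeps pointing at the leaf key, so rolling up to any category level needs a single join to dim_category instead of a chain of four.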

A star schema seems like it would work for you. A snowflake schema would only be warranted if the other dimensions had relationships among themselves.

OTHER TIPS

Your categories should be in a single table, since joins against an 8k-row table are cheap.

The Other dimensions should be split into a few tables. This allows Mondrian to perform the join at the higher (low-cardinality) levels when possible, and thus perform better.
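One way to read the tip about splitting the large Other dimensions: if a dimension like Other1 has a coarser grouping level, keeping that level in its own small table means the high-level members can be read from a handful of rows instead of being derived from the full leaf table. The sketch below assumes such a grouping exists (the question doesn't say so) and uses hypothetical names (other1_flat, other1_group, group_name) with row counts scaled down. It is not the SQL Mondrian actually generates, just an illustration of the cardinality difference, again with Python's sqlite3.

```python
import sqlite3

GROUPS = 10    # hypothetical number of high-level members
LEAVES = 1000  # scaled down from the ~100,000 rows in the question

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Flat variant: the group name is repeated on every leaf row.
    CREATE TABLE other1_flat (id INTEGER PRIMARY KEY, name TEXT, group_name TEXT);

    -- Split variant: groups live in their own small table.
    CREATE TABLE other1_group (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE other1 (id INTEGER PRIMARY KEY, name TEXT,
                         group_id INTEGER REFERENCES other1_group(id));
""")

conn.executemany(
    "INSERT INTO other1_group VALUES (?, ?)",
    [(g, f"group-{g}") for g in range(GROUPS)],
)
for i in range(LEAVES):
    g = i % GROUPS
    conn.execute("INSERT INTO other1_flat VALUES (?, ?, ?)",
                 (i, f"item-{i}", f"group-{g}"))
    conn.execute("INSERT INTO other1 VALUES (?, ?, ?)",
                 (i, f"item-{i}", g))

# Listing the high-level members: DISTINCT over every leaf row in the flat
# variant, versus a plain read of the small group table in the split variant.
flat_members = [r[0] for r in conn.execute(
    "SELECT DISTINCT group_name FROM other1_flat ORDER BY group_name")]
split_members = [r[0] for r in conn.execute(
    "SELECT name FROM other1_group ORDER BY name")]

flat_table_rows = conn.execute("SELECT COUNT(*) FROM other1_flat").fetchone()[0]
group_table_rows = conn.execute("SELECT COUNT(*) FROM other1_group").fetchone()[0]

print(flat_members == split_members)      # same member list either way
print(flat_table_rows, group_table_rows)  # size of the table each query touches
```

Both variants return the same members; the difference is only in how many rows sit behind the high-level query, which matters more as the leaf table grows.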

Mondrian plays well with both scenarios.

Licensed under: CC-BY-SA with attribution
