Question

I'm using Saiku 2.5 and I'm not sure how to model the following situation:

Dimensions:

  • Category (~20 rows)
  • SubCategory (~100 rows)
  • SubSubCategory (~1200 rows)
  • SubSubSubCategory (~8000 rows)
  • Other1 (~100000 rows)
  • Other2 (~50000 rows)
  • Other3 (~500 rows)
  • Other4 (~500 rows)
  • Other5 (~200 rows)
  • Other6 (~200 rows)
  • Other7 (~100 rows)
  • Other8 (~10 rows)

Measurements:

  • Facts (~20000000 rows)

Relationships

  • Fact has Other[\d]
  • Fact has SubSubSubCategory
  • SubSubSubCategory has SubSubCategory
  • SubSubCategory has SubCategory
  • SubCategory has Category

I'd like to know if it's better performance-wise, to de-normalize all categories into one table or leave it as it is. Each "category-like" table has one VARCHAR(8) column and two TEXT columns.


Solution

I'd like to know if it's better performance-wise, to de-normalize all categories into one table or leave it as it is.

You're optimizing a data warehouse for read performance, so I'd opt for denormalizing the category tables. The flattened table would hold roughly 8,000 rows (one per SubSubSubCategory), which is small enough for most relational databases to keep entirely in memory.

A star schema seems like a good fit for you. If the Other dimensions turn out to have relationships among themselves, a snowflake schema would be warranted for those.
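
The denormalization suggested above can be sketched as follows. This is only an illustration using Python's built-in sqlite3 module, not the asker's actual schema: the table and column names (category, sub_category, dim_category, category_key, amount and so on) and the sample rows are all made up. It flattens the four category tables into one dimension table with one row per leaf category, so the fact table needs a single join to reach any category level.

```python
import sqlite3

# In-memory database; all table/column names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized hierarchy, as described in the question.
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sub_category (id INTEGER PRIMARY KEY, name TEXT,
                               category_id INTEGER);
    CREATE TABLE sub_sub_category (id INTEGER PRIMARY KEY, name TEXT,
                                   sub_category_id INTEGER);
    CREATE TABLE sub_sub_sub_category (id INTEGER PRIMARY KEY, name TEXT,
                                       sub_sub_category_id INTEGER);

    INSERT INTO category VALUES (1, 'Food');
    INSERT INTO sub_category VALUES (10, 'Fruit', 1);
    INSERT INTO sub_sub_category VALUES (100, 'Citrus', 10);
    INSERT INTO sub_sub_sub_category VALUES (1000, 'Orange', 100),
                                            (1001, 'Lemon', 100);

    -- Denormalized dimension: one row per leaf category,
    -- carrying every ancestor level as a plain column.
    CREATE TABLE dim_category AS
    SELECT l4.id   AS category_key,
           l1.name AS category,
           l2.name AS sub_category,
           l3.name AS sub_sub_category,
           l4.name AS sub_sub_sub_category
    FROM sub_sub_sub_category l4
    JOIN sub_sub_category l3 ON l3.id = l4.sub_sub_category_id
    JOIN sub_category l2     ON l2.id = l3.sub_category_id
    JOIN category l1         ON l1.id = l2.category_id;

    -- Tiny stand-in for the fact table.
    CREATE TABLE fact (category_key INTEGER, amount REAL);
    INSERT INTO fact VALUES (1000, 2.0), (1000, 3.0), (1001, 5.0);
""")

rows = conn.execute(
    "SELECT * FROM dim_category ORDER BY category_key"
).fetchall()

# Aggregating at any category level now takes one join instead of four.
totals = conn.execute("""
    SELECT d.category, d.sub_sub_sub_category, SUM(f.amount)
    FROM fact f
    JOIN dim_category d ON d.category_key = f.category_key
    GROUP BY d.category, d.sub_sub_sub_category
    ORDER BY d.sub_sub_sub_category
""").fetchall()

for row in rows:
    print(row)
for row in totals:
    print(row)
```

In a real warehouse the flattened table would typically be rebuilt by the ETL process whenever the category tables change, rather than created ad hoc like this.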

OTHER TIPS

Your categories should be in a single table, since joins against an 8k-row table are cheap.

The Other dimensions should be split into a few tables. This allows Mondrian to perform the join at the higher (low-cardinality) levels when possible, and thus perform better.

Mondrian plays well with both scenarios.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow