Should all dimension values be used in the fact table?

https://stackoverflow.com/questions/16529472

29-05-2022

Question

I'm modeling a data warehouse which has 6 dimensions. One of these dimensions is client, which has around 600k rows, and there are others such as accounts and products. I estimated the number of rows in the fact table by multiplying the cardinalities of the dimension tables, which gives 1*10^12 rows. My question is: if a client doesn't have a certain product, is there going to be a row for that product (with a zero value in the fact table), or won't there be a row at all? I need this information to know whether my approximation is an upper bound on the number of rows or the exact number of rows.

Solution

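You do not need an entry for each dimension combination. A row is stored only for combinations that actually occur, so your estimate is an upper bound, not the exact number of rows.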

Typically a fact table (or cube) will be very small compared to the theoretical size (i.e., the product of the dimensions' cardinalities). This theoretical number of rows (or cells) can be very large even with a relatively small number of dimensions (e.g., time, products, geography, customers, sales, etc.). This is known as the cube's sparsity; OLAP engines (e.g., icCube, SSAS, etc.) are typically built to handle this sparsity efficiently.
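To make the difference between the theoretical size and the stored size concrete, here is a minimal Python sketch. The dimension names, cardinalities and fact rows are all made up for illustration (they are not the figures from the question), and the dictionary simply stands in for a fact table keyed by dimension ids.

```python
from math import prod

# Hypothetical dimension cardinalities, chosen small for illustration.
dimension_sizes = {"client": 4, "product": 3, "month": 2}

# Only combinations where something actually happened get a fact row.
# Key: (client_id, product_id, month_id); value: the measure.
fact_rows = {
    (1, 1, 1): 120.0,
    (1, 2, 1): 35.5,
    (2, 1, 2): 80.0,
    (4, 3, 2): 10.0,
}


def theoretical_size(sizes):
    """Upper bound on fact rows: the product of the dimension cardinalities."""
    return prod(sizes.values())


def density(sizes, facts):
    """Fraction of the theoretical cells that are actually stored."""
    return len(facts) / theoretical_size(sizes)


def measure(facts, key):
    """A missing combination is read as zero instead of being stored as a row."""
    return facts.get(key, 0.0)


print("upper bound:", theoretical_size(dimension_sizes))
print("stored rows:", len(fact_rows))
print("density:", density(dimension_sizes, fact_rows))
print("client 3, product 2, month 1:", measure(fact_rows, (3, 2, 1)))
```

The same idea applies in a relational fact table: a client without a given product simply has no row for it, and a query that needs the zero can produce it at read time (for example with an outer join against the dimension).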

Licensed under: CC-BY-SA with attribution
